Accessing Databases

The DataExtensions class contains extension methods for reading data from a database into a data frame, and for converting between data tables and data frames.

Reading data frames from databases

The ReadDataFrame method reads a data frame from an IDataReader. It has four overloads. Because it is an extension method, the first argument is always the IDataReader to read from.

For the first overload, the second argument is optional. It is an integer that specifies the initial capacity of the data frame. For best performance, this value should equal the number of rows that will be read into the data frame.

The second overload has three arguments in all. Here, the second argument is a sequence of column names. Only the columns in the specified sequence will be loaded into the data frame. The third argument is once again the initial capacity of the data frame.
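The first two overloads can be sketched as follows. To avoid needing a database connection, the fragment builds a small in-memory DataTable and obtains an IDataReader from it with DataTable.CreateDataReader. The sample table, its column names, and the `using Extreme.Data;` directive are assumptions made for illustration; substitute the namespace that contains DataExtensions in your version of the library.

```csharp
using System.Data;
using Extreme.Data; // ASSUMED namespace of DataExtensions; adjust for your library version

// Hypothetical sample data standing in for a database query result.
var table = new DataTable("Products");
table.Columns.Add("Id", typeof(int));
table.Columns.Add("Name", typeof(string));
table.Columns.Add("Price", typeof(double));
table.Rows.Add(1, "Widget", 9.99);
table.Rows.Add(2, "Gadget", 24.50);
table.Rows.Add(3, "Gizmo", 4.75);

// First overload: load every column, passing the known row count
// as the initial capacity.
using (IDataReader reader = table.CreateDataReader())
{
    var allColumns = reader.ReadDataFrame(table.Rows.Count);
}

// Second overload: load only the named columns, again with an
// initial capacity.
using (IDataReader reader = table.CreateDataReader())
{
    var selected = reader.ReadDataFrame(new[] { "Name", "Price" }, table.Rows.Count);
}
```

With a real database, the reader would typically come from a command's ExecuteReader method instead, and the row count may not be known in advance, in which case the capacity is only an estimate.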

The third and fourth overloads take a type argument that specifies the type of the row index of the data frame. The second argument (after the IDataReader) is the name of the column that contains the row index. This is optionally followed by the sequence of column names to load into the data frame and the initial capacity. The last argument is a boolean value that specifies whether the key column should be dropped from the data frame. The default is true.
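A minimal sketch of the keyed overloads, under the same assumptions as any example on this page: the sample table and column names are hypothetical, the `using Extreme.Data;` directive is an assumed namespace, and the positional argument order in the second call simply follows the description above rather than a verified signature.

```csharp
using System.Data;
using Extreme.Data; // ASSUMED namespace of DataExtensions; adjust for your library version

// Hypothetical sample data standing in for a database query result.
var table = new DataTable("Products");
table.Columns.Add("Id", typeof(int));
table.Columns.Add("Name", typeof(string));
table.Columns.Add("Price", typeof(double));
table.Rows.Add(1, "Widget", 9.99);
table.Rows.Add(2, "Gadget", 24.50);

// Keyed overload: the type argument gives the row index type, and
// "Id" names the column that supplies the row keys. The key column
// is dropped from the data frame by default.
using (IDataReader reader = table.CreateDataReader())
{
    var byId = reader.ReadDataFrame<int>("Id");
}

// Keyed overload with a column list, an initial capacity, and the
// drop flag spelled out (argument order assumed from the text).
using (IDataReader reader = table.CreateDataReader())
{
    var byIdSelected = reader.ReadDataFrame<int>(
        "Id", new[] { "Name", "Price" }, table.Rows.Count, true);
}
```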

Converting between data tables and data frames

The ToDataFrame method converts a DataTable to a data frame. The method is once again defined as an extension method. The first argument is always the data table to convert. The overloads of this method parallel those for reading data frames.

The first overload takes no additional arguments. It simply converts all columns in the data table to columns in a new data frame. The second overload takes one additional argument: a sequence of strings that specify the columns to include in the data frame.

The third and fourth overloads let you specify the column to use as the row index. These overloads take the type of the row keys as a generic type argument. The second actual argument for both overloads is the name of the key column. This can optionally be followed by a sequence of column names to include in the data frame.
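The four ToDataFrame overloads might be called as in the sketch below. The sample table and its column names are hypothetical, and the `using Extreme.Data;` directive is an assumed namespace for DataExtensions.

```csharp
using System.Data;
using Extreme.Data; // ASSUMED namespace of DataExtensions; adjust for your library version

// Hypothetical sample table.
var table = new DataTable("Products");
table.Columns.Add("Id", typeof(int));
table.Columns.Add("Name", typeof(string));
table.Columns.Add("Price", typeof(double));
table.Rows.Add(1, "Widget", 9.99);
table.Rows.Add(2, "Gadget", 24.50);

// First overload: convert all columns.
var all = table.ToDataFrame();

// Second overload: convert only the listed columns.
var selected = table.ToDataFrame(new[] { "Name", "Price" });

// Third overload: use the "Id" column as the row index, with int row keys.
var byId = table.ToDataFrame<int>("Id");

// Fourth overload: key column plus an explicit list of columns to include.
var byIdSelected = table.ToDataFrame<int>("Id", new[] { "Name", "Price" });
```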

The ToDataTable method performs the conversion in the other direction. It converts a data frame to a data table. This is also an extension method, with the first argument of type IDataFrame. This means that the method works equally well for data frames, matrices, and vectors.

The method has four overloads that parallel the overloads of the ToDataFrame method. The first overload takes no additional arguments. It simply converts all columns in the data frame to columns in a new data table. The second overload takes one additional argument: a sequence of column keys that specify the columns to include in the data table. This overload takes the type of the column keys as a generic type argument, which can usually be inferred.

The third and fourth overloads let you specify a column name for the row index of the data frame. The second actual argument for both overloads is the name of the key column. This can optionally be followed by a sequence of column keys to include in the data table, in which case the method also takes the type of the column keys of the data frame as a generic type argument.
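As a rough sketch of the reverse conversion, the fragment below first creates a keyed data frame from a hypothetical sample table and then converts it back. The `using Extreme.Data;` directive is an assumed namespace for DataExtensions, and the calls follow the overload descriptions above rather than a verified signature list.

```csharp
using System.Data;
using Extreme.Data; // ASSUMED namespace of DataExtensions; adjust for your library version

// Hypothetical sample table.
var table = new DataTable("Products");
table.Columns.Add("Id", typeof(int));
table.Columns.Add("Name", typeof(string));
table.Columns.Add("Price", typeof(double));
table.Rows.Add(1, "Widget", 9.99);
table.Rows.Add(2, "Gadget", 24.50);

// Data frame whose row index comes from the "Id" column.
var frame = table.ToDataFrame<int>("Id");

// First overload: convert all columns of the data frame.
DataTable plain = frame.ToDataTable();

// Second overload: convert only the listed column keys
// (the column key type, string here, is inferred).
DataTable selected = frame.ToDataTable(new[] { "Name", "Price" });

// Third overload: also write the row index to a column named "Id".
DataTable withKey = frame.ToDataTable("Id");
```

Naming the row index column is what allows a keyed data frame to be written back without losing its keys.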