Data Files and Data Streams

The API for the data access classes is modeled after the File and Stream classes in the System.IO namespace of the .NET Base Class Libraries. The File class contains static methods for reading from and writing to files in a single call. It also has static methods that open files for reading or writing. These methods return streams. The Stream classes contain methods for reading and writing individual values, and allow more fine-grained control.
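As a reminder of the System.IO pattern that this design follows, here is a minimal sketch using only standard .NET calls (File.WriteAllLines, File.ReadAllLines, File.OpenRead, StreamReader). The file name and the sample values are made up for illustration; nothing here uses the data access classes themselves.

```csharp
using System;
using System.Globalization;
using System.IO;

class FileVersusStream
{
    static void Main()
    {
        // Illustrative temporary file; the name is arbitrary.
        string path = Path.Combine(Path.GetTempPath(), "values.txt");

        // File class: static methods that do the whole job in a single call.
        File.WriteAllLines(path, new[] { "1.5", "2.5", "4.0" });
        string[] lines = File.ReadAllLines(path);
        Console.WriteLine(lines.Length);

        // File.OpenRead returns a stream; values are then read one at a time,
        // which gives finer control over how much is read and when.
        double sum = 0.0;
        using (FileStream stream = File.OpenRead(path))
        using (var reader = new StreamReader(stream))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                sum += double.Parse(line, CultureInfo.InvariantCulture);
            }
        }
        Console.WriteLine(sum);

        File.Delete(path);
    }
}
```

The single-call methods are convenient when the whole file is wanted at once. The stream form is the better fit when values need to be processed individually.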

Likewise, for each file format we have a File-like class (DelimitedTextFile, RdataFile, MatlabFile, and so on) and a corresponding Stream-like class (DelimitedTextStream, RdataStream, MatlabStream, and so on). The table below lists the classes for each format:

File format               "File" class         "Stream" class
Delimited text files      DelimitedTextFile    DelimitedTextStream
Fixed width text files    FixedWidthTextFile   FixedWidthTextStream
Matrix Market files       MatrixMarketFile     MatrixMarketStream
JSON files                JsonFile             JsonStream
Matlab® files             MatlabFile           MatlabStream
R files (.rdata)          RdataFile            RdataStream
R files (.rds)            RdsFile              RdsStream
Stata® files              StataFile            StataStream

Data file classes

Each file format has a corresponding class that contains static methods that perform an operation in a single call. We will use R files (with extension .rdata or .rda) as an example.

The methods defined by these classes fall into three general categories: reading objects, writing objects, and opening files or streams. For example, the ReadDataFrame method reads the item stored in a .rdata file into a data frame.
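The three categories can be sketched with a toy file format that stores one number per line. ToyVectorFile and ToyVectorStream are hypothetical names invented for this illustration. They are not part of the library, and the actual method names and signatures of classes such as RdataFile may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical stream-like class: reads one value at a time.
sealed class ToyVectorStream : IDisposable
{
    private readonly StreamReader _reader;

    public ToyVectorStream(Stream stream) { _reader = new StreamReader(stream); }

    // Returns false once there are no more values to read.
    public bool TryReadValue(out double value)
    {
        string line = _reader.ReadLine();
        if (line == null) { value = 0.0; return false; }
        value = double.Parse(line, CultureInfo.InvariantCulture);
        return true;
    }

    public void Dispose() { _reader.Dispose(); }
}

// Hypothetical file-like class: static methods only.
static class ToyVectorFile
{
    // Category 1: read a whole object in a single call.
    public static double[] ReadVector(string path)
    {
        var values = new List<double>();
        using (ToyVectorStream stream = Open(path))
        {
            double value;
            while (stream.TryReadValue(out value)) values.Add(value);
        }
        return values.ToArray();
    }

    // Category 2: write a whole object in a single call.
    public static void WriteVector(string path, double[] values)
    {
        using (var writer = new StreamWriter(path))
        {
            foreach (double v in values)
                writer.WriteLine(v.ToString("R", CultureInfo.InvariantCulture));
        }
    }

    // Category 3: open a file and return a stream for finer control.
    public static ToyVectorStream Open(string path)
    {
        return new ToyVectorStream(File.OpenRead(path));
    }
}

class Program
{
    static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "toy-vector.txt");
        ToyVectorFile.WriteVector(path, new[] { 1.0, 2.5, -3.0 });
        double[] roundTrip = ToyVectorFile.ReadVector(path);
        Console.WriteLine(roundTrip.Length);
        File.Delete(path);
    }
}
```

In this sketch the single-call ReadVector method is itself built on the stream returned by Open, which is one natural way to keep the two levels of the API consistent.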

Data streams

The stream classes all inherit from a common base class, DataStream. There may also be an Options class that lets you specify details for a specific file format. Streams are created using one of the methods of the corresponding File class.

Some file formats support one object per file, while others may contain multiple named objects. R files and Matlab files are examples of the latter. For these file formats, the stream class inherits from a specialized class: CompositeDataStream<TObject>. This class takes one generic type argument: the type of the objects that are stored in the file.