Stata Files

Stata® is a general-purpose data analysis and statistical software package. The software has its own binary data format, usually with the extension .dta. A Stata file contains a single data frame. Only reading from Stata files is supported, and the only objects that can be read are data frames.

Reading Stata files

The StataFile class contains static methods for reading a data frame from a file in .dta format.

The ReadDataFrame method reads a data frame from a file. The method takes a single argument, which may be a string containing the path to the file or a Stream that has been opened for reading. If a string is given, it may be the path to a local file or the URI of a resource on the Internet. The method returns the first data frame found in the file.

Data frames read in this way always have a column index of strings (the column names) and a row index of row numbers (64-bit signed integers). The row index stored in the Stata file is essentially lost. To keep the stored index information, pass the types of the row and column keys as generic type arguments to the ReadDataFrame method. The stored indexes are then converted to the requested types as needed.

The example below reads a data frame from a local Stata file. Its row index is of type DateTime. It then reads a second data frame, with the default indexes, from a fictitious URL:

C#
var df1 = StataFile.ReadDataFrame<DateTime, string>(@"c:\data.dta");
var df2 = StataFile.ReadDataFrame("http://www.example.com/sample.dta");
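
ReadDataFrame also accepts a Stream that has been opened for reading, as described above. The following is a minimal sketch of that case, not a definitive implementation: the namespace in the using directive and the file path are assumptions and may need to be adjusted for the installed version of the library.

```csharp
using System;
using System.IO;
using Extreme.Data.Stata; // assumed namespace for StataFile; may differ by library version

class ReadFromStreamSketch
{
    static void Main()
    {
        // Hypothetical local path; the stream only needs to be open for reading.
        using (var stream = File.OpenRead(@"c:\data.dta"))
        {
            // No generic type arguments: the result has string column keys
            // and 64-bit integer row numbers as its row index.
            var df = StataFile.ReadDataFrame(stream);
            Console.WriteLine(df);
        }
    }
}
```

Passing a stream can be useful when the data does not live in a plain file, for example when it is held in memory or extracted from an archive.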

Reading vectors or matrices is not supported. A data frame can be readily converted to a matrix using the ToMatrix<T>(Boolean, Boolean) method. Vectors can be obtained as columns of the data frame.

Using Stata Data Streams

Stata data streams are implemented by the StataStream class. This class has no public constructors. Instead, use one of the methods of the StataFile class. Streams can be opened for reading only.

Opening streams for reading

The Open(String) method opens a file or stream for reading. The only argument is a string or a stream. If it is a string, it is the path to the file that should be opened, or the URI of a network or Internet resource. If it is a stream, then it specifies the data stream that the objects should be read from.

The methods for reading objects from streams are similar to those of the StataFile class, but without the argument that specifies the source.
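
The sketch below illustrates opening a StataStream over an existing Stream rather than a path. It relies on the statement above that Open accepts either a string or a stream. The namespace and the file path are assumptions, so treat it as an illustration rather than a reference implementation.

```csharp
using System;
using System.IO;
using Extreme.Data.Stata; // assumed namespace for StataFile; may differ by library version

class OpenFromStreamSketch
{
    static void Main()
    {
        // Hypothetical local path, opened for reading only.
        using (var file = File.OpenRead(@"c:\data.dta"))
        // Wrap the raw stream in a Stata data stream.
        using (var stata = StataFile.Open(file))
        {
            // No source argument here: the data stream already knows where to read from.
            var df = stata.ReadDataFrame();
            Console.WriteLine(df);
        }
    }
}
```

Nesting the two using statements disposes the Stata data stream before the underlying file stream.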

Reading data frames

The ReadDataFrame method reads a data frame from the stream. It returns the next data frame found in the stream.

Data frames read in this way always have a column index of strings (the column names) and a row index of row numbers (64-bit signed integers). The row index stored in the Stata file is essentially lost. To keep the stored index information, pass the types of the row and column keys as generic type arguments to the ReadDataFrame method. The stored indexes are then converted to the requested types as needed.

The example below opens a Stata file at a fictitious URL and reads a data frame from it. Its row index is of type DateTime:

C#
using (var s1 = StataFile.Open("http://www.example.com/sample.dta"))
{
    var df1 = s1.ReadDataFrame<DateTime, string>();
}