Fixed-width Text Files

In a fixed-width text file, each line corresponds to a record. Each column has a fixed width, so the same field always occupies the same range of characters in every line. The fixed-width text format is largely a legacy format. It was commonly used in the 1960s and 1970s, but faded in popularity as more flexible formats like CSV became common.
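
To make the layout concrete, the following is a minimal sketch of how one line can be cut into fields at fixed character positions. It does not use the library and is not its implementation. The helper SplitFixedWidth and the sample record are hypothetical, and the sketch assumes that each column break is the zero-based start of a column and that the last column runs to the end of the line.

```csharp
using System;

// Hypothetical helper, not part of the library.
// Assumption: columnBreaks holds ascending, zero-based start positions,
// and the last column extends to the end of the line.
static string[] SplitFixedWidth(string line, int[] columnBreaks)
{
    var result = new string[columnBreaks.Length];
    for (int i = 0; i < columnBreaks.Length; i++)
    {
        // Clamp to the line length so short lines yield empty fields.
        int start = Math.Min(columnBreaks[i], line.Length);
        int end = i + 1 < columnBreaks.Length
            ? Math.Min(columnBreaks[i + 1], line.Length)
            : line.Length;
        // Padding is only there to fill the column, so trim it away.
        result[i] = line.Substring(start, end - start).Trim();
    }
    return result;
}

// A made-up record: a date (10 characters), a name (10), and an amount (8).
var breaks = new[] { 0, 10, 20 };
string line = "2020-01-01" + "Widget".PadRight(10) + "12.50".PadLeft(8);
string[] fields = SplitFixedWidth(line, breaks);
Console.WriteLine(string.Join("|", fields)); // prints 2020-01-01|Widget|12.50
```

Because the positions are the same for every line, no delimiter or quoting is needed, but every writer and reader of the file must agree on the column breaks.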

Reading fixed-width text files is implemented by the FixedWidthTextFile and FixedWidthTextStream classes.

Fixed-width text options

The FixedWidthTextOptions class defines the options available when reading from fixed-width text files. It inherits from TextOptions. It has one constructor with one required argument: an integer array containing the positions of the column breaks. The remaining arguments are all optional and correspond to the properties of the TextOptions class.

The array of column breaks can be retrieved using the GetColumnBreaks() method, which returns a copy of the array. The WithColumnBreaks(Int32[]) method returns a new options object with different column breaks and leaves the other properties unchanged.

Reading fixed-width text files

The FixedWidthTextFile class contains static methods for reading data frames, vectors, and matrices from a file in fixed-width text format.

The ReadDataFrame method reads a data frame from a file. The method takes two arguments. The first argument specifies the source of the data. This may be a string containing the path to the file, or a Stream that has been opened for reading. If a string is given, it may be the path to a local file, or the URI of a resource on the Internet.

The second argument specifies the options used to read the data in the file. This may be a FixedWidthTextOptions object, or an integer array containing the positions of the column breaks. In the latter case, default values are used for the other options.

Data frames read in this way always have a column index of strings (the column names) and a row index of row numbers (64-bit signed integers). Any row index stored in the file is essentially lost. To keep the stored index information, the types of the row and the column keys can be passed as generic type arguments to the ReadDataFrame method. This will convert the stored indexes to the requested types as needed.

The example below reads a data frame with a row index of type DateTime from a local file. It then reads a second data frame, df2, from a fictitious URL, passing only the column breaks:

C#
var columnBreaks = new[] { 0, 10, 20, 30 };
var options = new FixedWidthTextOptions(columnBreaks);
var df1 = FixedWidthTextFile.ReadDataFrame<DateTime, string>(
    @"c:\data.txt", options);
var df2 = FixedWidthTextFile.ReadDataFrame(
    "http://www.example.com/sample.txt", columnBreaks);

Similar methods exist for reading vectors and matrices. The ReadVector method reads a vector from the file. It takes one type argument that is required: the element type of the vector to read. The first actual argument is once again the path to the file or Internet resource, or a stream. The second argument is either a FixedWidthTextOptions object, or an integer array that contains the positions of the column breaks.

The ReadMatrix method reads a matrix from the file. It has the same arguments and overloads as ReadVector. The element type must be supplied as a generic type argument. The actual arguments are the path to the file or resource, or the stream to read from, followed by a FixedWidthTextOptions object or an integer array that contains the positions of the column breaks.

C#
var vector1 = FixedWidthTextFile.ReadVector<double>(
    @"c:\vector.txt", options);
var complexBreaks = new int[] { 0, 10, 20, 30, 40, 50 };
var matrix1 = FixedWidthTextFile.ReadComplexMatrix<double>(
    "http://www.example.com/matrix.txt", complexBreaks);

The ReadComplexVector and ReadComplexMatrix methods read a complex vector and matrix from the file, respectively. These methods are identical to their real counterparts, except that the number of columns in the file must be twice the number of columns in the final object. This is because the real and imaginary parts of the complex values are stored in separate columns. So, a file storing a complex vector should have two columns, while a file storing a complex matrix with 5 columns should have 10 columns total.
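
The column doubling can be illustrated with a small sketch that turns one line of text into complex values. It does not use the library. The helpers ParseComplexRow and ParseField are hypothetical, and the sketch assumes that the real and imaginary parts of each value sit in adjacent columns, that column breaks are zero-based start positions, and that every line is padded to its full width.

```csharp
using System;
using System.Globalization;
using System.Numerics;

// Hypothetical helper, not part of the library.
// Assumption: columns come in (real, imaginary) pairs, side by side.
static Complex[] ParseComplexRow(string line, int[] columnBreaks)
{
    if (columnBreaks.Length % 2 != 0)
        throw new ArgumentException("An even number of columns is required.");

    var values = new Complex[columnBreaks.Length / 2];
    for (int i = 0; i < columnBreaks.Length; i += 2)
    {
        double re = ParseField(line, columnBreaks, i);
        double im = ParseField(line, columnBreaks, i + 1);
        values[i / 2] = new Complex(re, im);
    }
    return values;
}

// Reads the text of one column and parses it as a culture-invariant double.
static double ParseField(string line, int[] columnBreaks, int index)
{
    int start = columnBreaks[index];
    int end = index + 1 < columnBreaks.Length ? columnBreaks[index + 1] : line.Length;
    return double.Parse(line.Substring(start, end - start).Trim(),
        CultureInfo.InvariantCulture);
}

// Four text columns of width 10 produce two complex values.
var pairBreaks = new[] { 0, 10, 20, 30 };
string row = "1.5".PadLeft(10) + "-2.0".PadLeft(10)
    + "0.25".PadLeft(10) + "4".PadLeft(10);
Complex[] parsed = ParseComplexRow(row, pairBreaks);
Console.WriteLine(parsed.Length); // prints 2
```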

Using Fixed-Width Data Streams

Fixed-width data streams are implemented by the FixedWidthTextStream class. This class has no constructors. Instead, use one of the methods of the FixedWidthTextFile class. Streams can be opened for reading only.

Opening files for reading

The Open(String, FixedWidthTextOptions) method opens a file or stream for reading. This method has 4 overloads that all take two arguments. The first is a string or a stream. If it is a string, it is the path to the file that should be opened, or the URI of a network or Internet resource. If it is a stream, then it specifies the data stream that the objects should be read from. The second argument specifies the options used to read the data in the file. This may be a FixedWidthTextOptions object, or an integer array containing the positions of the column breaks. In the latter case, default values are used for the other options.
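
When the data does not live in a file, for example in tests, the text can be wrapped in a stream. The sketch below builds a MemoryStream over two made-up records and reads them back line by line to show what such a stream contains. It does not call the library. The sample text, the UTF-8 encoding, and the two-column layout with a break at position 10 are assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Two made-up records: a name (10 characters) and a count (5 characters).
string text =
    "alpha".PadRight(10) + "1".PadLeft(5) + "\n" +
    "beta".PadRight(10) + "22".PadLeft(5) + "\n";

var records = new List<string[]>();
using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(text)))
using (var reader = new StreamReader(stream, Encoding.UTF8))
{
    string current;
    while ((current = reader.ReadLine()) != null)
    {
        // Assumed layout: column 1 is [0, 10), column 2 is the rest of the line.
        records.Add(new[]
        {
            current.Substring(0, 10).Trim(),
            current.Substring(10).Trim()
        });
    }
}
Console.WriteLine(records.Count); // prints 2
```

A stream created this way is the kind of object that can be supplied as the first argument in place of a path.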

The methods for reading objects from streams are similar to those of the FixedWidthTextFile class, but with fewer arguments.

Reading from streams

The ReadDataFrame method reads a data frame from the stream.

Data frames read in this way always have a column index of strings (the column names) and a row index of row numbers (64-bit signed integers). Any row index stored in the file is essentially lost. To keep the stored index information, the types of the row and the column keys can be passed as generic type arguments to the ReadDataFrame method. This will convert the stored indexes to the requested types as needed.

The example below opens a fixed-width text stream on a fictitious URL and reads a data frame from it. Its row index is of type DateTime:

C#
var options = new FixedWidthTextOptions(new[] { 0, 10, 20, 30 });
using (var s1 = FixedWidthTextFile.Open("http://www.example.com/sample.txt", options))
{
    var df1 = s1.ReadDataFrame<DateTime, string>();
}

Similar methods exist for reading vectors and matrices. The ReadVector<T> method reads a vector from the stream. It takes one required type argument: the element type of the vector to read. It takes one optional argument: a boolean value that specifies whether the element type of the stored vector should match the specified element type exactly. The default is false, which means that the read operation will succeed as long as the stored element type can be cast to the requested element type.

The ReadMatrix<T> method reads a matrix from the stream. It has the same arguments and overloads as ReadVector<T>. The element type must be supplied as a generic type argument. The one actual argument is optional. It specifies whether the element type should match exactly.

C#
var breaks = new int[] { 0, 10 };
using (var s2 = FixedWidthTextFile.Open(@"c:\vector.txt", breaks))
{
    var vector1 = s2.ReadComplexVector<double>();
}