In a fixed-width text file, each line corresponds to a record.
Each column has a fixed width, so the same field always
occupies the same range of characters in a line.
The fixed-width text format is largely a legacy format.
It was commonly used in the 1960s and 1970s,
but faded in popularity as more flexible formats like CSV
became more common.
Reading fixed-width text files is implemented by the
FixedWidthTextFile
and FixedWidthTextStream
classes.
The FixedWidthTextOptions
class defines the options available when reading from fixed-width text files.
It inherits from TextOptions.
It has one constructor with one required argument:
an integer array containing the positions of the column breaks.
The remaining arguments are all optional and correspond to the properties of the
TextOptions class.
The array of column breaks can be retrieved using the
GetColumnBreaks
method, which returns a copy of the array.
The WithColumnBreaks(Int32[])
method returns a new options object with different column breaks and leaves the
other properties unchanged.
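To make the role of the column breaks concrete, here is a minimal sketch in plain C# that splits one record into its fields. It does not use the library. It assumes that each break is the zero-based offset at which a column starts and that the last column runs to the end of the line; the library's own interpretation of the array may differ. The names FixedWidthSketch and SplitFields are hypothetical.

```csharp
using System;

public static class FixedWidthSketch
{
    // Splits one record into fields. Each entry in columnBreaks is assumed to be
    // the zero-based offset where a column starts; the last column runs to the
    // end of the line. This is an assumption for illustration, not the
    // library's documented contract.
    public static string[] SplitFields(string line, int[] columnBreaks)
    {
        var fields = new string[columnBreaks.Length];
        for (int i = 0; i < columnBreaks.Length; i++)
        {
            // Clamp offsets so that short lines yield empty trailing fields.
            int start = Math.Min(columnBreaks[i], line.Length);
            int end = i + 1 < columnBreaks.Length
                ? Math.Min(columnBreaks[i + 1], line.Length)
                : line.Length;
            // Padding spaces are not part of the value, so trim them.
            fields[i] = line.Substring(start, end - start).Trim();
        }
        return fields;
    }

    public static void Main()
    {
        var breaks = new[] { 0, 10, 20 };
        string record = "2024-01-15" + "Widget".PadRight(10) + "12.50";
        string[] fields = SplitFields(record, breaks);
        Console.WriteLine(string.Join("|", fields)); // prints 2024-01-15|Widget|12.50
    }
}
```

Because every field sits at a known offset, no delimiter or quoting rules are needed. The cost is that the column breaks must be known in advance, which is why they are the one required constructor argument.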
Reading fixed-width text files
The FixedWidthTextFile class
contains static methods for reading data frames, vectors, and matrices
from a file in fixed-width text format.
The ReadDataFrame
method reads a data frame from a file. The method takes two arguments.
The first argument specifies the source of the data.
This may be a string containing the path to the file, or a Stream
that has been opened for reading. If a filename is given, it may be the path
to a local file, or the URI of a resource on the Internet.
The second argument specifies the options used to read the data in the file.
This may be a FixedWidthTextOptions
object, or an integer array containing the positions of the column breaks.
In the latter case, default values are used for the other options.
Data frames read in this way always have a column index of strings (the column names)
and a row index of row numbers (64-bit signed integers). Any row index
stored in the file is essentially lost. To keep the stored index information,
the types of the row and the column keys can be passed as generic type arguments
to the ReadDataFrame
method. This will convert the stored indexes to the requested types as needed.
The example below reads a data frame named df1 from a local data file.
Its row index is of type DateTime.
It then reads a second data frame named df2 from a fictitious URL:
// C#
var columnBreaks = new[] { 0, 10, 20, 30 };
var options = new FixedWidthTextOptions(columnBreaks);
var df1 = FixedWidthTextFile.ReadDataFrame<DateTime, string>(
    @"c:\data.txt", options);
var df2 = FixedWidthTextFile.ReadDataFrame(
    "http://www.example.com/sample.txt", columnBreaks);

' Visual Basic
Dim columnBreaks = {0, 10, 20, 30}
Dim options = New FixedWidthTextOptions(columnBreaks)
Dim df1 = FixedWidthTextFile.ReadDataFrame(Of DateTime, String)(
    "c:\data.txt", options)
Dim df2 = FixedWidthTextFile.ReadDataFrame(
    "http://www.example.com/sample.txt", columnBreaks)

// F#
let columnBreaks = [| 0; 10; 20; 30 |]
let options = FixedWidthTextOptions(columnBreaks)
let df1 = FixedWidthTextFile.ReadDataFrame<DateTime, string>
            (@"c:\data.txt", options)
let df2 = FixedWidthTextFile.ReadDataFrame
            ("http://www.example.com/sample.txt", columnBreaks)
Similar methods exist for reading vectors and matrices.
The ReadVector
method reads a vector from the file. It takes one type argument that is required:
the element type of the vector to read.
The first actual argument is once again the
path to the file or Internet resource, or a stream.
The second argument is either a FixedWidthTextOptions
object, or an integer array that contains the positions of the column breaks.
The ReadMatrix
method reads a matrix from the file. It has the same arguments and overloads
as ReadVector.
The element type must be supplied as a generic type argument.
The actual arguments are the path to the file or resource or the stream to read from,
and optionally whether the element type should match exactly.
// C#
var vector1 = FixedWidthTextFile.ReadVector<double>(
    @"c:\vector.txt", options);
var complexBreaks = new int[] { 0, 10, 20, 30, 40, 50 };
var matrix1 = FixedWidthTextFile.ReadComplexMatrix<double>(
    "http://www.example.com/matrix.txt", complexBreaks);

' Visual Basic
Dim vector1 = FixedWidthTextFile.ReadVector(Of Double)(
    "c:\vector.txt", options)
Dim complexBreaks = {0, 10, 20, 30, 40, 50}
Dim matrix1 = FixedWidthTextFile.ReadComplexMatrix(Of Double)(
    "http://www.example.com/matrix.txt", complexBreaks)

// F#
let vector1 = FixedWidthTextFile.ReadVector<float>
                (@"c:\vector.txt", options)
let complexBreaks = [| 0; 10; 20; 30; 40; 50 |]
let matrix1 = FixedWidthTextFile.ReadComplexMatrix<float>
                ("http://www.example.com/matrix.txt", complexBreaks)
The ReadComplexVector
and ReadComplexMatrix
methods read a complex vector and matrix from the file, respectively.
These methods are identical to their real counterparts, except that
the number of columns in the file must be twice the number of columns
in the final object. This is because the real and imaginary parts of the complex
values are stored in separate columns. So, a file storing a complex vector
should have two columns, while a file storing a complex matrix with 5 columns should
have 10 columns total.
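The column-doubling rule can be illustrated with a short sketch in plain C# that turns the fields of one record into complex values. It does not use the library. It assumes that real and imaginary parts sit in adjacent columns (re, im, re, im, ...); the text above only says that they are stored in separate columns, so the actual order may differ. The names ComplexColumnsSketch and PairColumns are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Numerics;

public static class ComplexColumnsSketch
{
    // Combines 2n real-valued fields into n complex values, assuming the layout
    // re0, im0, re1, im1, ... (an assumption for illustration; the library's
    // actual column order is not stated in the text).
    public static Complex[] PairColumns(string[] fields)
    {
        if (fields.Length % 2 != 0)
            throw new ArgumentException(
                "Expected an even number of fields.", nameof(fields));

        var result = new Complex[fields.Length / 2];
        for (int i = 0; i < result.Length; i++)
        {
            double re = double.Parse(fields[2 * i], CultureInfo.InvariantCulture);
            double im = double.Parse(fields[2 * i + 1], CultureInfo.InvariantCulture);
            result[i] = new Complex(re, im);
        }
        return result;
    }

    public static void Main()
    {
        // Ten fields produce one row of a complex matrix with 5 columns.
        string[] fields = { "1", "2", "3", "4", "5", "6", "7", "8", "9", "10" };
        Complex[] row = PairColumns(fields);
        Console.WriteLine(row.Length); // prints 5
    }
}
```

Under this assumed layout, a record with an odd number of fields cannot describe a complex row, which is why the sketch rejects it.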
Using Fixed-Width Data Streams
Fixed-width data streams are implemented by the
FixedWidthTextStream
class. This class has no constructors. Instead, use one of the methods of the
FixedWidthTextFile class.
Streams can be opened for reading only.
Opening files for reading
The Open(String, FixedWidthTextOptions)
method opens a file or stream for reading. This method has 4 overloads that all take
two arguments. The first is a string or a stream.
If it is a string, it is the path to the file that should be opened, or
the URI of a network or Internet resource. If it is a stream, then it specifies
the data stream that the objects should be read from.
The second argument specifies the options used to read the data in the file.
This may be a FixedWidthTextOptions
object, or an integer array containing the positions of the column breaks.
In the latter case, default values are used for the other options.
The methods for reading objects from streams are similar to those of the
FixedWidthTextFile class,
but with fewer arguments.
The ReadDataFrame
method reads a data frame from a file.
Data frames read in this way always have a column index of strings (the column names)
and a row index of row numbers (64-bit signed integers). Any row index
stored in the file is essentially lost. To keep the stored index information,
the types of the row and the column keys can be passed as generic type arguments
to the ReadDataFrame
method. This will convert the stored indexes to the requested types as needed.
The example below opens a fixed-width text file at a fictitious URL
and reads a data frame named df1 from it.
Its row index is of type DateTime:
// C#
var options = new FixedWidthTextOptions(new[] { 0, 10, 20, 30 });
using (var s1 = FixedWidthTextFile.Open("http://www.example.com/sample.txt", options))
{
    var df1 = s1.ReadDataFrame<DateTime, string>();
}

' Visual Basic
Dim options = New FixedWidthTextOptions({0, 10, 20, 30})
Using s1 = FixedWidthTextFile.Open("http://www.example.com/sample.txt", options)
    Dim df1 = s1.ReadDataFrame(Of DateTime, String)()
End Using

// F#
let options = new FixedWidthTextOptions([| 0; 10; 20; 30 |])
use s1 = FixedWidthTextFile.Open("http://www.example.com/sample.txt", options)
let df1 = s1.ReadDataFrame<DateTime, string>()
Similar methods exist for reading vectors and matrices.
The ReadVector<T>
method reads a vector from the file. It takes one type argument that is required:
the element type of the vector to read.
This method takes one optional argument: a boolean value that specifies
whether the element type of the stored vector should match the specified element type
exactly. The default is false, which means
that the read operation will succeed as long as the stored element type can be
cast to the requested element type.
The ReadMatrix<T>
method reads a matrix from the file. It has the same arguments and overloads
as ReadVector<T>.
The element type must be supplied as a generic type argument.
The single argument is optional. It specifies
whether the element type should match exactly.
// C#
var breaks = new int[] { 0, 10 };
using (var s2 = FixedWidthTextFile.Open(@"c:\vector.txt", breaks))
{
    var vector1 = s2.ReadComplexVector<double>();
}

' Visual Basic
Dim breaks = {0, 10}
Using s2 = FixedWidthTextFile.Open("c:\vector.txt", breaks)
    Dim vector1 = s2.ReadComplexVector(Of Double)()
End Using

// F#
let breaks = [| 0; 10 |]
use s2 = FixedWidthTextFile.Open(@"c:\vector.txt", breaks)
let vector1 = s2.ReadComplexVector<float>(false)