In a delimited text file, each line contains a record.
The columns are separated by a delimiter character.
In cases where a field may contain the delimiter character,
it may be quoted or escaped.
The two most common variants of delimited text formats are
CSV (Comma Separated Values) and TSV (Tab Separated Values).
In CSV files, columns are separated by a comma.
Fields that contain commas or newline characters are quoted
using double quote characters. When a quoted field itself
contains double quote characters, they are replaced with
two successive double quotes. In TSV files, the tab character
is used as the column delimiter. Tab characters are not allowed
inside fields in standard TSV format.
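The CSV quoting rule described above can be sketched in a few lines of C#. This is only an illustration of the rule, not the library's implementation; the helper name `QuoteCsvField` is hypothetical.

```csharp
using System;

// Hypothetical helper (not part of the library): applies the CSV quoting rule.
static string QuoteCsvField(string field)
{
    // Quote only when the field contains a comma, a double quote, or a line break.
    bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
    if (!needsQuotes)
        return field;
    // Embedded double quotes are doubled, then the whole field is wrapped in quotes.
    return "\"" + field.Replace("\"", "\"\"") + "\"";
}

Console.WriteLine(QuoteCsvField("plain"));       // plain
Console.WriteLine(QuoteCsvField("a,b"));         // "a,b"
Console.WriteLine(QuoteCsvField("say \"hi\""));  // "say ""hi"""
```

A field without special characters is written unchanged, which keeps simple numeric data compact.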
Reading and writing delimited text files is implemented by the
DelimitedTextFile and DelimitedTextStream classes.
The DelimitedTextOptions class defines the options available when
reading and writing delimited text files.
It inherits from TextOptions, and has several properties in addition
to those defined in the TextOptions class.
These are listed in the table below:
Options specific to delimited text files
Property | Description |
---|---|
ColumnDelimiter | The character to use to separate columns. The default is a comma (,). |
EndOfLine | The string to use to end a line. The default is a carriage return followed by a line feed (CRLF). |
Quote | A QuoteUsage value that specifies when fields should be quoted. The default is AsNeededForColumnType. The possible values are listed later in this section. |
QuoteCharacter | The character to use when quoting fields. The default is a double quote ("). |
QuoteEscapeMethod | A QuoteEscapeMethod value that specifies how to handle quote characters inside a quoted field. The default is to double the quote character. |
EscapeCharacter | The character to use when escaping a quote inside a quoted field. Any character following the escape character is assumed to be part of the field. The default is the backslash character (\). This value is ignored unless QuoteEscapeMethod is EscapeCharacter. |
The Quote property specifies when a field should be quoted.
The possible values are listed below:
Possible values for the Quote property
Value | Description |
---|---|
Never | Fields are never quoted. This is the default for tab separated values. |
AsNeeded | Every field of every row is scanned individually for the column delimiter, end of line characters, the quote character, and the escape character. If the string representation of the field value contains any of these characters, then the field is quoted. |
AsNeededForColumnType | A determination is made for each column type whether its values may need to be quoted. If so, every field in the column is quoted. This is the default for CSV. |
Always | All fields are quoted. |
The value is used when writing files, but also affects reading delimited text files.
Being aware of the conditions under which fields may be quoted
can speed up the processing of the text.
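The per-field check that the AsNeeded setting performs can be pictured as follows. This is a minimal sketch under the assumption that the check is a plain character scan; the function and parameter names are hypothetical and do not come from the library.

```csharp
using System;

// Hypothetical sketch of an AsNeeded-style check: a field needs quoting when its
// text contains the delimiter, the quote, the escape, or an end-of-line character.
static bool NeedsQuoting(string text, char delimiter, char quote, char escape, string endOfLine)
{
    return text.IndexOf(delimiter) >= 0
        || text.IndexOf(quote) >= 0
        || text.IndexOf(escape) >= 0
        || text.IndexOfAny(endOfLine.ToCharArray()) >= 0;
}

Console.WriteLine(NeedsQuoting("1.5", ',', '"', '\\', "\r\n"));          // False
Console.WriteLine(NeedsQuoting("Smith, John", ',', '"', '\\', "\r\n"));  // True
```

Scanning every field has a cost, which is one reason a per-column decision such as AsNeededForColumnType can be faster.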
Sometimes a quoted field will contain the quote character itself.
There are two ways to deal with this situation. Either the
quote character is doubled (the default), or an escape character is used.
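The difference between the two conventions is easiest to see when reading a quoted field back. The sketch below is illustrative only: `UnquoteField` is a hypothetical helper, it assumes well-formed input, and it is not how the library itself parses fields.

```csharp
using System;
using System.Text;

// Hypothetical helper: strips the surrounding quotes from a quoted field and
// resolves embedded quotes. Pass escape = null for the doubling convention.
static string UnquoteField(string quoted, char quote = '"', char? escape = null)
{
    var result = new StringBuilder();
    // Skip the opening and closing quote characters.
    for (int i = 1; i < quoted.Length - 1; i++)
    {
        char c = quoted[i];
        bool isDoubledQuote = escape == null && c == quote;
        bool isEscape = escape != null && c == escape.Value;
        if ((isDoubledQuote || isEscape) && i + 1 < quoted.Length - 1)
            i++; // Drop the first character of the pair and keep the one that follows.
        result.Append(quoted[i]);
    }
    return result.ToString();
}

Console.WriteLine(UnquoteField("\"say \"\"hi\"\"\""));             // say "hi"
Console.WriteLine(UnquoteField("\"say \\\"hi\\\"\"", '"', '\\'));  // say "hi"
```

Both calls produce the same field value; only the on-disk representation of the embedded quotes differs.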
Several predefined options objects have been defined.
The Csv field of the DelimitedTextOptions class defines the options for
standard CSV files, where fields are delimited by commas and non-numeric
columns are quoted with double quotes.
The CsvWithoutHeader field is similar but omits column headers.
Similarly, the Tsv and TsvWithoutHeader fields define the options for
standard tab-delimited (TSV) files, with or without headers.
The CsvForCulture(CultureInfo, Boolean) method returns an options object
tailored to a specific culture.
If a comma is used for the decimal point or as the thousands separator,
then a semicolon is used as the column delimiter. This method has two arguments.
The first is a CultureInfo object.
The second is optional: a boolean value that indicates whether the data
includes column headers. The default is .
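The delimiter rule stated above can be written down directly. The sketch below takes the sentence literally and is not the library's code, so the actual behavior of CsvForCulture may differ in detail; `ChooseColumnDelimiter` is a hypothetical name, and the two number formats are built by hand for illustration.

```csharp
using System;
using System.Globalization;

// Hypothetical helper that mirrors the rule as worded in the text.
static char ChooseColumnDelimiter(NumberFormatInfo format)
{
    bool commaInNumbers = format.NumberDecimalSeparator == ","
                       || format.NumberGroupSeparator == ",";
    return commaInNumbers ? ';' : ',';
}

// A German-style format: comma as decimal point, period as thousands separator.
var germanStyle = new NumberFormatInfo { NumberDecimalSeparator = ",", NumberGroupSeparator = "." };
// A format that does not use the comma at all.
var noComma = new NumberFormatInfo { NumberDecimalSeparator = ".", NumberGroupSeparator = " " };

Console.WriteLine(ChooseColumnDelimiter(germanStyle)); // ;
Console.WriteLine(ChooseColumnDelimiter(noComma));     // ,
```

Using a semicolon in the first case avoids any ambiguity between a column break and the decimal comma inside a number.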
Reading delimited text files
The DelimitedTextFile class
contains static methods for reading data frames, vectors, and matrices
from a file in delimited text format.
The ReadDataFrame method reads a data frame from a file.
The method takes two arguments.
The first argument specifies the source of the data.
This may be a string containing the path to the file, or a Stream
that has been opened for reading. If a string is given, it may be the path
to a local file, or the URI of a resource on the Internet.
The second argument is a DelimitedTextOptions object. It is optional.
If it is omitted or null, standard CSV format is assumed.
Data frames read in this way always have a column index of strings (the column names)
and a row index of row numbers (64 bit signed integers). Any row index
stored in the file is essentially lost. To keep the stored index information,
the types of the row and the column keys can be passed as generic type arguments
to the ReadDataFrame method.
This will convert the stored indexes to the requested types as needed.
The example below reads a data frame from a CSV file.
Its row index is of type DateTime.
It then reads a second data frame from a fictitious URL:
C#:

    var df1 = DelimitedTextFile.ReadDataFrame<DateTime, string>(@"c:\data.csv");
    var df2 = DelimitedTextFile.ReadDataFrame(
        "http://www.example.com/sample.tsv", DelimitedTextOptions.Tsv);

Visual Basic:

    Dim df1 = DelimitedTextFile.ReadDataFrame(Of DateTime, String)("c:\data.csv")
    Dim df2 = DelimitedTextFile.ReadDataFrame(
        "http://www.example.com/sample.tsv", DelimitedTextOptions.Tsv)

F#:

    let df1 = DelimitedTextFile.ReadDataFrame<DateTime, string>(@"c:\data.csv")
    let df2 =
        DelimitedTextFile.ReadDataFrame(
            "http://www.example.com/sample.tsv", DelimitedTextOptions.Tsv)

Similar methods exist for reading vectors and matrices.
The ReadVector
method reads a vector from the file. It takes one type argument that is required:
the element type of the vector to read.
The first actual argument is once again the
path to the file or Internet resource, or a stream.
The second argument is either a DelimitedTextOptions
object, or an integer array that contains the positions of the column breaks.
The ReadMatrix method reads a matrix from the file.
It has the same arguments and overloads as the ReadVector method.
The element type must be supplied as a generic type argument.
The arguments are the path to the file or resource, or the stream to read from,
and optionally whether the element type should match exactly.
C#:

    var vector1 = DelimitedTextFile.ReadVector<double>(@"c:\vector.csv");
    var culture = CultureInfo.GetCultureInfo("de-DE");
    var options = DelimitedTextOptions.CsvForCulture(culture);
    var matrix1 = DelimitedTextFile.ReadMatrix<double>(
        "http://www.example.com/german.csv", options);

Visual Basic:

    Dim vector1 = DelimitedTextFile.ReadVector(Of Double)("c:\vector.csv")
    Dim culture = CultureInfo.GetCultureInfo("de-DE")
    Dim options = DelimitedTextOptions.CsvForCulture(culture)
    Dim matrix1 = DelimitedTextFile.ReadMatrix(Of Double)(
        "http://www.example.com/german.csv", options)

F#:

    let vector1 = DelimitedTextFile.ReadVector<float>(@"c:\vector.csv")
    let culture = CultureInfo.GetCultureInfo("de-DE")
    let options = DelimitedTextOptions.CsvForCulture(culture)
    let matrix1 =
        DelimitedTextFile.ReadMatrix<float>(
            "http://www.example.com/german.csv", options)

The ReadComplexVector
and ReadComplexMatrix
methods read a complex vector and matrix from the file, respectively.
These methods are identical to their real counterparts, except that
the number of columns in the file must be twice the number of columns
in the final object. This is because the real and imaginary parts of the complex
values are stored in separate columns. So, a file storing a complex vector
should have two columns, while a file storing a complex matrix with 5 columns should
have 10 columns total.
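To make the column layout concrete, the following sketch converts one row of text into complex values. It assumes that each real part is immediately followed by its imaginary part, which the text above does not confirm; `ParseComplexRow` is a hypothetical helper and not a library method.

```csharp
using System;
using System.Globalization;
using System.Numerics;

// Hypothetical helper: turns one row of 2*n numeric fields into n complex values.
// Assumes (real, imaginary) pairs sit in adjacent columns.
static Complex[] ParseComplexRow(string line, char delimiter = ',')
{
    string[] fields = line.Split(delimiter);
    if (fields.Length % 2 != 0)
        throw new FormatException("Expected an even number of columns.");
    var values = new Complex[fields.Length / 2];
    for (int i = 0; i < values.Length; i++)
    {
        double re = double.Parse(fields[2 * i], CultureInfo.InvariantCulture);
        double im = double.Parse(fields[2 * i + 1], CultureInfo.InvariantCulture);
        values[i] = new Complex(re, im);
    }
    return values;
}

// Four columns of text yield a row with two complex entries.
Complex[] row = ParseComplexRow("1.0,2.0,3.5,-4.0");
Console.WriteLine(row.Length);                         // 2
Console.WriteLine(row[1] == new Complex(3.5, -4.0));   // True
```

A row with an odd number of fields is rejected, since it cannot be split into real and imaginary pairs.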
Writing delimited text files
The Write
method is used to write one or more data frames, vectors, or matrices to a file.
The method has many overloads.
The first argument always specifies the destination in one of two ways.
It can be a string that contains the path to the file. If the file exists,
it is overwritten. If it doesn't exist, then it is created.
Alternatively, the destination can be specified as a
Stream.
The second argument always specifies the object(s) to be written.
This can be a single data frame, matrix, or vector.
It can also be a sequence of data frames, matrices, or vectors,
or a dictionary that maps names to objects.
The third argument is a
DelimitedTextOptions
object that specifies how the data should be written.
This argument is optional. If omitted, standard CSV format is used.
In the example code below, we write a data frame to a file,
and then a matrix to a stream.
C#:

    DelimitedTextFile.Write(@"c:\data.csv", df1);
    using (var stream = File.OpenWrite(@"c:\output.csv"))
    {
        DelimitedTextFile.Write(stream, matrix1);
    }

Visual Basic:

    DelimitedTextFile.Write("c:\data.csv", df1)
    Using stream = File.OpenWrite("c:\output.csv")
        DelimitedTextFile.Write(stream, matrix1)
    End Using

F#:

    DelimitedTextFile.Write(@"c:\data.csv", df1)
    use stream = File.OpenWrite(@"c:\output.csv")
    DelimitedTextFile.Write(stream, matrix1)

Using delimited text data streams
Delimited data streams are implemented by the
DelimitedTextStream
class. This class has no constructors. Instead, use one of the methods of the
DelimitedTextFile class.
Streams can be opened for reading only.
Opening files for reading
The Open(String, DelimitedTextOptions)
method opens a file or stream for reading. This method has two overloads
that take two arguments. The first is a string or a stream.
If it is a string, it is the path to the file that should be opened, or
the URI of a network or Internet resource. If it is a stream, then it specifies
the data stream that the objects should be read from.
The second argument specifies the options used to read the data in the file,
and is of type DelimitedTextOptions.
This argument is optional. If it is omitted or null,
standard CSV format is assumed.
The methods for reading objects from streams are similar to those of the
DelimitedTextFile class,
but with fewer arguments.
The ReadDataFrame method reads a data frame from the stream.
Data frames read in this way always have a column index of strings (the column names)
and a row index of row numbers (64 bit signed integers). Any row index
stored in the file is essentially lost. To keep the stored index information,
the types of the row and the column keys can be passed as generic type arguments
to the ReadDataFrame method.
This will convert the stored indexes to the requested types as needed.
The example below reads a data frame from a delimited text file at a fictitious URL.
Its row index is of type DateTime.
C#:

    using (var s1 = DelimitedTextFile.Open("http://www.example.com/sample.csv"))
    {
        var df1 = s1.ReadDataFrame<DateTime, string>();
    }

Visual Basic:

    Using s1 = DelimitedTextFile.Open("http://www.example.com/sample.csv")
        Dim df1 = s1.ReadDataFrame(Of DateTime, String)()
    End Using

F#:

    use s1 = DelimitedTextFile.Open("http://www.example.com/sample.csv")
    let df1 = s1.ReadDataFrame<DateTime, string>()

Similar methods exist for reading vectors and matrices.
The ReadVector method reads a vector from the file.
It takes one type argument that is required:
the element type of the vector to read.
This method takes one argument which is optional: a boolean value that specifies
whether the element type of the stored vector should match the specified element type
exactly. The default is false, which means
that the read operation will succeed as long as the stored element type can be
cast to the requested element type.
The ReadMatrix method reads a matrix from the file.
It has the same arguments and overloads as the ReadVector method.
The element type must be supplied as a generic type argument.
The single argument is optional. It specifies
whether the element type should match exactly.
C#:

    using (var s2 = DelimitedTextFile.Open(@"c:\vector.csv"))
    {
        var vector1 = s2.ReadVector<double>();
    }

Visual Basic:

    Using s2 = DelimitedTextFile.Open("c:\vector.csv")
        Dim vector1 = s2.ReadVector(Of Double)()
    End Using

F#:

    use s2 = DelimitedTextFile.Open(@"c:\vector.csv")
    let vector1 = s2.ReadVector<float>()

Opening streams for writing
There are two methods that can be used to create a delimited text stream for writing.
The Create(String, Boolean, Boolean)
method opens a file for writing. The first argument is a string that
is the path to the file that should be opened. If the file exists, its contents
are overwritten. If the file does not exist, it is created.
The optional second argument is a boolean value that specifies whether
the data should be compressed. The default is .
The optional third argument is also a boolean value that specifies
whether the data should be written out in human-readable ASCII format.
The default is .
The Append(Stream, Boolean, Boolean)
method opens a stream using an existing writable stream.
The first argument is the stream to write the objects to.
The second and third arguments are optional. They are boolean values that
specify whether the data should be compressed, and whether the data should
be written in ASCII format.
The Write method is used to write a vector or matrix to the stream
in delimited text format.
The method has many overloads.
The first argument always specifies the object(s) to be written.
This can be a vector or a matrix.
Both real and complex vectors and matrices are supported.
The following code creates a new CSV file,
and writes a matrix to it:
C#:

    using (var stream = DelimitedTextFile.Create(@"c:\data.csv"))
    {
        stream.Write(matrix1);
    }

Visual Basic:

    Using stream = DelimitedTextFile.Create("c:\data.csv")
        stream.Write(matrix1)
    End Using

F#:

    use stream = DelimitedTextFile.Create(@"c:\data.csv")
    stream.Write(matrix1)