Indexes are at the heart of what makes data frames convenient and useful.
See the section on indexes
for more in-depth information.
Indexes are used to access rows and columns of a data frame.
Hierarchical indexes can be used to group rows,
and subsequently perform calculations on each group.
The row and column indexes of a data frame can be accessed through the
RowIndex and
ColumnIndex
properties. These properties are read-only.
You can create a data frame with a different row or column index using the
WithRowIndex
and
WithColumnIndex<C1>
methods.
Renaming columns is also possible. There are two methods:
RenameColumn
lets you rename a single column.
RenameColumns
has two overloads. The first overload takes two arguments. The first is a sequence
containing the keys to be replaced. The second is a sequence of the corresponding new keys.
The second overload also takes two arguments. The first is a predicate that
determines whether a key should be replaced. The second is a function that turns an old key
into a new key.
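The difference between the two RenameColumns overloads can be illustrated without the library. The following is a minimal sketch that applies both renaming styles to a plain list of keys. The helper names `RenameByList` and `RenameByRule` and the sample keys are hypothetical and are not part of the data frame API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for a data frame's column keys (hypothetical example data).
var keys = new List<string> { "state", "year", "pop", "debt" };

// Style 1 (sketch): parallel sequences of old keys and their replacements.
List<string> RenameByList(List<string> source, IEnumerable<string> oldKeys, IEnumerable<string> newKeys)
{
    var map = oldKeys.Zip(newKeys, (o, n) => (o, n)).ToDictionary(p => p.o, p => p.n);
    return source.Select(k => map.TryGetValue(k, out var renamed) ? renamed : k).ToList();
}

// Style 2 (sketch): a predicate selects the keys, a function computes each new key.
List<string> RenameByRule(List<string> source, Func<string, bool> shouldRename, Func<string, string> rename)
    => source.Select(k => shouldRename(k) ? rename(k) : k).ToList();

var byList = RenameByList(keys, new[] { "pop", "debt" }, new[] { "population", "totalDebt" });
var byRule = RenameByRule(keys, k => k.Length <= 4, k => k.ToUpperInvariant());

Console.WriteLine(string.Join(", ", byList)); // state, year, population, totalDebt
Console.WriteLine(string.Join(", ", byRule)); // state, YEAR, POP, DEBT
```

The list-based style suits a small, known set of renames. The rule-based style is more convenient when many keys follow a pattern.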
The WithRowIndex method
has a number of overloads that let you select one or more columns to use as the index.
The overloads take 1 to 3 arguments: the key(s) of the column(s) that are to make up
the index. If more than one column is selected, a hierarchical index will be created.
The element types of the columns must be passed as generic type arguments:
var df2a = df2.WithRowIndex<string,int>("state", "year");
Dim df2a = df2.WithRowIndex(Of String, Integer)("state", "year")
let df2a = df2.WithRowIndex<string,int>("state", "year")
The number of rows and the number of columns are available through the
RowCount and
ColumnCount properties.
The values in a data frame are stored in vectors that make up the columns of the data frame.
Columns can be accessed by key or by ordinal position using the data frame's indexer property.
Information about the data type of each column is not encoded
in the .NET type of the data frame, and so the object returned by the indexers is of type
IVector.
You can turn this into a vector of a specific type by calling this object's
As<U> method.
This method converts the untyped vector to a
Vector<T>.
The element type is determined by the generic type argument.
Columns can also be retrieved using the
GetColumn method,
which has a generic type argument and returns a typed vector.
This method also has a non-generic overload which returns
a vector of Double.
So, because the debt column has type Double,
the following three expressions are equivalent:
var totalDebt1 = df2["debt"].As<double>().Sum();
var totalDebt2 = df2.GetColumn<double>("debt").Sum();
var totalDebt3 = df2.GetColumn("debt").Sum();
Dim totalDebt1 = df2("debt").As(Of Double)().Sum()
Dim totalDebt2 = df2.GetColumn(Of Double)("debt").Sum()
Dim totalDebt3 = df2.GetColumn("debt").Sum()
let totalDebt1 = df2.["debt"].As<float>().Sum()
let totalDebt2 = df2.GetColumn<float>("debt").Sum()
let totalDebt3 = df2.GetColumn("debt").Sum()
The column vectors are immutable: their values cannot be changed in place.
They can still be used in calculations, and you can make writable copies.
To retrieve multiple columns into a new data frame, use the
GetColumns method.
This method takes as its only argument a sequence of column keys.
It returns a new data frame that contains only the selected columns.
Adding and Removing Columns
While the columns of a data frame are immutable, the collection of columns itself is not.
It is possible to add columns to an existing data frame or remove columns from it.
The AddColumn
method takes two arguments. The first is the key of the new column.
An exception is thrown if a column with the same key already exists.
The second argument is a vector or collection containing the values.
Exactly how the values are added to the data frame depends on the index
of the column being added.
If the column has an index of the same type as the data frame,
then the column is aligned with the data frame's index
and the values are ordered accordingly.
If the column does not have an index or if it is of a different type, then the values are
not reordered.
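The two cases above can be sketched with plain collections. This is only an illustration of the alignment idea, not the library's implementation. The sample data is hypothetical, and filling keys that are absent from the column with NaN is an assumption made for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Row index of a hypothetical data frame.
var rowIndex = new[] { "Ohio", "Nevada", "Texas" };

// A column that carries its own index of the same key type, in a different order.
var indexedColumn = new Dictionary<string, double>
{
    ["Texas"] = 3.0,
    ["Ohio"] = 1.0,
    ["Nevada"] = 2.0,
};

// Aligned case: look each row key up in the column's index.
// Keys missing from the column become NaN in this sketch (an assumption).
double[] aligned = rowIndex
    .Select(key => indexedColumn.TryGetValue(key, out var value) ? value : double.NaN)
    .ToArray();

// Unaligned case: a plain array has no index, so values are taken by position.
double[] plain = { 3.0, 1.0, 2.0 };
double[] positional = plain.ToArray();

Console.WriteLine(string.Join(", ", aligned));    // 1, 2, 3
Console.WriteLine(string.Join(", ", positional)); // 3, 1, 2
```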
Columns can be removed by key using the
RemoveColumn method,
or by ordinal index using the
RemoveColumnAt method.
In the example below, we create a vector of booleans that indicates whether
the state is an Eastern state. We then add this column to the data frame,
and then remove it:
var eastern = Vector.EqualTo(df2["state"].As<string>(), "Ohio");
Console.WriteLine("eastern =\n{0}", eastern);
df2.AddColumn("eastern", eastern);
Console.WriteLine("df2 =\n{0}", df2);
df2.RemoveColumn("eastern");
Console.WriteLine("df2 =\n{0}", df2);
Dim eastern = Vector.EqualTo(df2("state").As(Of String)(), "Ohio")
Console.WriteLine("eastern =\n{0}", eastern)
df2.AddColumn("eastern", eastern)
Console.WriteLine("df2 =\n{0}", df2)
df2.RemoveColumn("eastern")
Console.WriteLine("df2 =\n{0}", df2)
let eastern = Vector.EqualTo(df2.["state"].As<string>(), "Ohio")
Console.WriteLine("eastern =\n{0}", eastern)
df2.AddColumn("eastern", eastern) |> ignore
Console.WriteLine("df2 =\n{0}", df2)
df2.RemoveColumn("eastern") |> ignore
Console.WriteLine("df2 =\n{0}", df2)
Caution: Because a data frame is a column-oriented structure, accessing values by row
is much more expensive than accessing by column, and should be avoided whenever possible.
The DataFrame<R, C>
class has a Rows
property that returns a sequence of
DataFrameRow<R, C>
objects, each of which represents a row in the data frame.
DataFrameRow<R, C>
objects have a single indexer property that accepts either
the key of the column or its position.
A single row can be retrieved through the
GetRow method.
This method takes the row key as its argument and returns a
DataFrameRow<R, C> object.
There is also a
GetRowAs<T> method
which takes a generic type argument and converts the row into a vector of the specified type.
Multiple rows can be retrieved using the
GetRows method.
This method is overloaded and can take a sequence of keys, a sequence of ordinal indexes,
or a boolean vector as its argument. Each overload returns a new data frame that contains only
the selected rows.
Sometimes the key value is not exact. For example, you may want to get the row in a data frame
nearest to a certain date. The GetRowXxx methods have companion
methods that perform this task. They are called
GetNearestRow,
GetNearestRowAs<T>, and
GetNearestRows
and retrieve an individual row, an individual row as a vector, and a data frame containing
multiple rows, respectively. All these methods take a second argument of type
Direction that specifies
whether the nearest key should be equal to or less than (Backward)
or equal to or greater than (Forward) the specified key(s).
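The Backward and Forward semantics described above can be sketched as a binary search over sorted keys. This is a minimal illustration under stated assumptions, not the library's implementation: the helper `NearestPosition` is hypothetical, and a boolean flag stands in for the Direction argument.

```csharp
using System;

// Sorted row keys, standing in for a DateTime row index (hypothetical data).
var keys = new[]
{
    new DateTime(2015, 4, 1), new DateTime(2015, 4, 2), new DateTime(2015, 4, 3),
    new DateTime(2015, 4, 4), new DateTime(2015, 4, 5),
};

// Returns the position of the nearest key, or -1 if there is none in that direction.
// backward == true  : largest key that is equal to or less than the target
// backward == false : smallest key that is equal to or greater than the target
int NearestPosition(DateTime[] sortedKeys, DateTime target, bool backward)
{
    int i = Array.BinarySearch(sortedKeys, target);
    if (i >= 0) return i;              // exact match
    int insertion = ~i;                // position of the first key greater than the target
    int candidate = backward ? insertion - 1 : insertion;
    return candidate >= 0 && candidate < sortedKeys.Length ? candidate : -1;
}

var instant = new DateTime(2015, 4, 3, 17, 11, 3);
int back = NearestPosition(keys, instant, backward: true);  // position of April 3
int fwd = NearestPosition(keys, instant, backward: false);  // position of April 4

Console.WriteLine($"{back} {fwd}"); // 2 3
```

An exact match is returned in either direction, which is why both options are described as "equal to or" less than or greater than the key.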
The code below first creates a new data frame containing only the rows
where the year is greater than 2001. It then creates another data frame
with a DateTime
index, and finds a row in two ways: first using an exact lookup,
and then using a nearest match.
var df3 = df2.GetRows(Vector.GreaterThan(df2["year"].As<int>(), 2001));
Console.WriteLine("df2(year > 2001) =\n{0}", df3);
var df4 = DataFrame.FromColumns(new Dictionary<string, object>() {
        { "first", new double[] { 11, 14, 17, 93, 55 } },
        { "second", new double[] { 22, 33, 43, 51, 69 } } })
    .WithRowIndex(Index.CreateDateRange(new DateTime(2015, 4, 1), 5));
var instant = new DateTime(2015, 4, 3, 17, 11, 3);
var date = instant.Date;
var row1 = df4.GetRowAs<double>(date);
var row2 = df4.GetNearestRowAs<double>(instant, Direction.Backward);
Dim df3 = df2.GetRows(Vector.GreaterThan(df2("year").As(Of Integer)(), 2001))
Console.WriteLine("df2(year > 2001) =\n{0}", df3)
Dim df4 = DataFrame.FromColumns(New Dictionary(Of String, Object)() From {
        {"first", {11, 14, 17, 93, 55}},
        {"second", {22, 33, 43, 51, 69}}}) _
    .WithRowIndex(Index.CreateDateRange(New DateTime(2015, 4, 1), 5))
Dim instant = New DateTime(2015, 4, 3, 17, 11, 3)
Dim dateOfInstant = instant.Date
Dim row1 = df4.GetRowAs(Of Double)(dateOfInstant)
Dim row2 = df4.GetNearestRowAs(Of Double)(instant, Direction.Backward)
let df3 = df2.GetRows(Vector.GreaterThan(df2.["year"].As<int>(), 2001))
Console.WriteLine("df2(year > 2001) =\n{0}", df3)
let df4 =
    let rowIndex = Index.CreateDateRange(new DateTime(2015, 4, 1), 5)
    let df = DataFrame.FromColumns(
                struct("first" , box [| 11.0; 14.0; 17.0; 93.0; 55.0 |]),
                struct("second" , box [| 22.0; 33.0; 43.0; 51.0; 69.0 |]))
    df.WithRowIndex(rowIndex)
let instant = new DateTime(2015, 4, 3, 17, 11, 3)
let date = instant.Date
let row1 = df4.GetRowAs<float>(date)
let row2 = df4.GetNearestRowAs<float>(instant, Direction.Backward)
It is also possible to select rows by specifying the rows that should be removed.
The RemoveRows method
takes a sequence of row keys and returns a new data frame without the rows that have those keys.
The RemoveRowsWithMissingValues
method returns a new data frame with all rows that contain a missing value removed.
If no column keys are specified, all columns are checked for missing values. If one or more
column keys are specified, only the specified columns are checked.
If none of the rows contain missing values, the data frame is returned unmodified.
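The rule for which columns are checked can be sketched with plain arrays. This is an illustration only, not the library's implementation: the helper `RowsToKeep` is hypothetical, and NaN stands in for a missing value as an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Column-oriented storage, with NaN standing in for a missing value (an assumption).
var columns = new Dictionary<string, double[]>
{
    ["first"]  = new[] { 11.0, double.NaN, 17.0, 93.0 },
    ["second"] = new[] { 22.0, 33.0, double.NaN, 51.0 },
};

// Returns the positions of rows with no missing value in the checked columns.
// With no keys given, every column is checked.
int[] RowsToKeep(Dictionary<string, double[]> data, params string[] checkedKeys)
{
    var toCheck = checkedKeys.Length == 0 ? data.Keys.ToArray() : checkedKeys;
    int rowCount = data.Values.First().Length;
    return Enumerable.Range(0, rowCount)
        .Where(row => toCheck.All(key => !double.IsNaN(data[key][row])))
        .ToArray();
}

Console.WriteLine(string.Join(", ", RowsToKeep(columns)));          // 0, 3
Console.WriteLine(string.Join(", ", RowsToKeep(columns, "first"))); // 0, 2, 3
```

Restricting the check to specific columns keeps rows whose only gaps are in columns that do not matter for the analysis at hand.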