DataFrame<R, C> Class

Represents a data frame with row and column indexes of the specified type.

Definition

Namespace: Extreme.DataAnalysis
Assembly: Extreme.Numerics (in Extreme.Numerics.dll) Version: 8.1.23
C#
public class DataFrame<R, C> : IDataFrame, 
	ISummarizable, IList<DataFrameRow<R, C>>, ICollection<DataFrameRow<R, C>>, 
	IEnumerable<DataFrameRow<R, C>>, IEnumerable
Inheritance
Object  →  DataFrame<R, C>
Implements
IDataFrame, ISummarizable, ICollection<DataFrameRow<R, C>>, IEnumerable<DataFrameRow<R, C>>, IList<DataFrameRow<R, C>>, IEnumerable

Type Parameters

R
The element type of the row index.
C
The element type of the column index.

Remarks

Use the DataFrame<R, C> type to represent a two-dimensional collection of possibly heterogeneous data. The rows and columns are indexed by keys of type R and C, respectively. Rows and columns can be accessed by key or by ordinal position.
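
The sketch below shows key-based and position-based access. It assumes a simple ordered index that maps each key to a position. SketchIndex and SketchGrid are hypothetical types written for illustration. They are not the Extreme.Numerics Index<T> or DataFrame<R, C> classes, whose internals may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch only: these types are NOT Extreme.Numerics' Index<T> or DataFrame<R, C>.
// An ordered index maps each key to its ordinal position and back.
public sealed class SketchIndex<TKey> where TKey : notnull
{
    private readonly List<TKey> keys = new List<TKey>();
    private readonly Dictionary<TKey, int> positions = new Dictionary<TKey, int>();

    public SketchIndex(IEnumerable<TKey> keys)
    {
        foreach (var key in keys)
        {
            if (positions.ContainsKey(key))
                throw new ArgumentException($"Duplicate key: {key}");
            positions[key] = this.keys.Count;
            this.keys.Add(key);
        }
    }

    public int Count => keys.Count;

    // Key -> ordinal position; throws KeyNotFoundException for an unknown key.
    public int PositionOf(TKey key) => positions[key];

    // Ordinal position -> key.
    public TKey KeyAt(int position) => keys[position];
}

// A grid of doubles stored column by column: data[column][row].
public sealed class SketchGrid<R, C> where R : notnull where C : notnull
{
    private readonly SketchIndex<R> rows;
    private readonly SketchIndex<C> columns;
    private readonly double[][] data;

    public SketchGrid(SketchIndex<R> rows, SketchIndex<C> columns, double[][] data)
    {
        if (data.Length != columns.Count)
            throw new ArgumentException("Need one array per column.");
        foreach (var column in data)
            if (column.Length != rows.Count)
                throw new ArgumentException("Each column needs one value per row.");
        this.rows = rows;
        this.columns = columns;
        this.data = data;
    }

    // Access by keys: translate each key to its ordinal position first.
    public double this[R row, C column] => data[columns.PositionOf(column)][rows.PositionOf(row)];

    // Access by ordinal positions.
    public double At(int row, int column) => data[column][row];
}
```

Suppose the row keys are "a" and "b" and the column keys are 2020 and 2021. Then grid["b", 2021] and grid.At(1, 1) refer to the same value, because "b" is at row position 1 and 2021 is at column position 1.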

Any operation that changes the rows of a data frame always returns a new data frame. Columns can be added to and removed from a data frame without creating a new data frame. The values in a column are immutable, so direct modification of data values is not supported.
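
The hypothetical SketchFrame class below models this split. Row operations produce a new frame, while column operations modify the frame in place. The class only models the behavior described above. It is not how Extreme.Numerics implements DataFrame<R, C>.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

// Hypothetical sketch only, NOT the Extreme.Numerics implementation.
public sealed class SketchFrame
{
    private readonly Dictionary<string, ReadOnlyCollection<double>> columns =
        new Dictionary<string, ReadOnlyCollection<double>>();

    public SketchFrame(int rowCount) => RowCount = rowCount;

    public int RowCount { get; }
    public int ColumnCount => columns.Count;

    // Column operations modify this frame in place.
    public void AddColumn(string key, IEnumerable<double> values)
    {
        var copy = values.ToArray(); // private copy: later changes to the source do not leak in
        if (copy.Length != RowCount)
            throw new ArgumentException("A column needs exactly RowCount values.");
        columns.Add(key, Array.AsReadOnly(copy));
    }

    public bool RemoveColumn(string key) => columns.Remove(key);

    // Values are exposed read-only; there is no way to set an individual element.
    public IReadOnlyList<double> this[string key] => columns[key];

    // Row operations never modify this frame: they build and return a new one.
    public SketchFrame WhereRows(Func<int, bool> keepRow)
    {
        var kept = Enumerable.Range(0, RowCount).Where(keepRow).ToArray();
        var result = new SketchFrame(kept.Length);
        foreach (var pair in columns)
            result.AddColumn(pair.Key, kept.Select(i => pair.Value[i]));
        return result;
    }
}
```

Suppose WhereRows returns a filtered frame and you then remove a column from the original. The filtered frame keeps all of its columns, and neither frame offers a way to overwrite a single value.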

The columns of a data frame are stored as strongly typed vectors. However, the column types are not part of the type definition of the data frame. This means that the element type of a column must be specified when the column is accessed; otherwise, the column is returned as an untyped IVector object. Take care to specify the correct element type, or an exception may be thrown at run time.
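
The hypothetical sketch below shows why a typed column lookup can fail only at run time when column types are not part of the container's type. SketchColumns and its methods are illustrative assumptions, not the library's API. In particular, the exception that Extreme.Numerics throws for a wrong element type may differ from the InvalidCastException produced here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch only, NOT the Extreme.Numerics API.
public sealed class SketchColumns
{
    // Each column is a strongly typed array, but the container only sees "object":
    // the element types are not part of SketchColumns' own type.
    private readonly Dictionary<string, object> columns = new Dictionary<string, object>();

    public void Add<T>(string key, T[] values) => columns.Add(key, values);

    // Untyped access: the caller gets an object and must inspect it.
    public object GetUntyped(string key) => columns[key];

    // Typed access: the caller names the element type. A wrong guess is
    // caught only at run time, as an InvalidCastException.
    public IReadOnlyList<T> Get<T>(string key) => (IReadOnlyList<T>)columns[key];
}
```

If a column holds double values, Get<double> succeeds. Get<int> compiles just as well but fails when it executes.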

Because the data is stored by column, accessing it column-wise is the most efficient. When computing values based on multiple columns, it is better to write the operation in terms of the column vectors than to iterate through the rows of the data frame.
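
As a rough illustration of this advice, the sketch below computes the element-wise product of two columns in two ways. It assumes plain arrays as stand-ins for column vectors; these are not Extreme.Numerics vectors. Both methods give the same result. The row-wise version, however, allocates a dictionary per row and looks up each value by key, and the column-wise loop avoids that work.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: plain double[] arrays stand in for column vectors.
public static class ColumnWiseSketch
{
    // Column-wise: one tight loop over two contiguous arrays.
    public static double[] Multiply(double[] x, double[] y)
    {
        if (x.Length != y.Length)
            throw new ArgumentException("Columns must have the same length.");
        var result = new double[x.Length];
        for (int i = 0; i < x.Length; i++)
            result[i] = x[i] * y[i];
        return result;
    }

    // Row-wise: builds a key/value "row" for every row, then looks values up by key.
    public static double[] MultiplyByRows(
        IReadOnlyDictionary<string, double[]> columns, string a, string b, int rowCount)
    {
        var result = new double[rowCount];
        for (int i = 0; i < rowCount; i++)
        {
            var row = new Dictionary<string, double>();
            foreach (var pair in columns)
                row[pair.Key] = pair.Value[i];
            result[i] = row[a] * row[b];
        }
        return result;
    }
}
```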

The data frame class supports many data manipulation and transformation operations. These include adding and removing columns, selecting rows, structural operations such as stacking and creating pivot tables, processing missing values, and aggregation. LINQ queries are also supported when the Extreme.DataAnalysis.Linq namespace is imported.
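
Grouped aggregation, one of the operations listed above, can be sketched with standard LINQ over two parallel column arrays. MeanBy is a hypothetical helper written with System.Linq. It is not the Extreme.DataAnalysis.Linq provider or one of the AggregateBy methods, and it only computes a mean per group.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, NOT Extreme.Numerics' AggregateBy or Extreme.DataAnalysis.Linq.
public static class GroupedMeanSketch
{
    // Mean of valueColumn for each distinct key in groupColumn (rows are aligned by position).
    public static Dictionary<TKey, double> MeanBy<TKey>(TKey[] groupColumn, double[] valueColumn)
        where TKey : notnull
    {
        if (groupColumn.Length != valueColumn.Length)
            throw new ArgumentException("Columns must have the same length.");
        return groupColumn
            .Zip(valueColumn, (key, value) => (Key: key, Value: value))
            .GroupBy(pair => pair.Key)
            .ToDictionary(group => group.Key, group => group.Average(pair => pair.Value));
    }
}
```

For a grouping column { "east", "west", "east" } and values { 10, 20, 30 }, MeanBy returns east: 20 and west: 20.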

Hierarchical row and column indexes are supported. In order to return the proper type of the indexes, some structural operations are defined as extension methods in the static DataFrame class.
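
One reason an operation on hierarchical indexes may need to be an extension method is a C# typing constraint. An instance method of a generic class cannot require its type parameter to be a tuple. An extension method, by contrast, can target SketchRows<(R1, R2)> directly, so the compiler infers R1 and R2 and the result gets the precise type SketchRows<R1>. The sketch below shows this with hypothetical types; SketchRows<R> and DropInnerLevel are not part of Extreme.Numerics.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration, NOT the Extreme.Numerics API.
public sealed class SketchRows<R>
{
    public SketchRows(IReadOnlyList<R> keys) => Keys = keys;

    public IReadOnlyList<R> Keys { get; }

    // An instance method here only knows "R"; it cannot declare that R is a pair
    // and therefore cannot name the type of the outer level in its return type.
}

public static class SketchRowsExtensions
{
    // The extension method targets SketchRows<(R1, R2)> directly, so the compiler
    // infers R1 and R2 from the argument and the result is typed SketchRows<R1>.
    public static SketchRows<R1> DropInnerLevel<R1, R2>(this SketchRows<(R1, R2)> rows)
        => new SketchRows<R1>(rows.Keys.Select(key => key.Item1).ToList());
}
```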

Constructors

DataFrame<R, C> Constructs a new empty data frame.

Properties

ColumnCount Gets the number of columns in the data frame.
ColumnIndex Gets or sets the column index of the data frame.
Columns Gets an enumerator over the columns of the data frame.
Item[C] Gets, sets or adds a column with the specified key.
Item[Int32] Gets the column at the specified index.
NamedColumns Gets an enumerator over pairs of column key values and the corresponding columns.
NamedRows Gets an enumerator over pairs of row key values and the corresponding rows.
RowCount Gets the number of rows in the data frame.
RowIndex Gets the row index of the data frame.
Rows Gets an enumerator over the rows of the data frame.

Methods

AddColumn(C, IVector) Adds a column to the data frame.
AddColumn<T>(C, Vector<T>) Adds a column to the data frame.
AddColumn<T>(C, IList<T>) Adds a column to the data frame.
Aggregate<T>(AggregatorGroup<T>) Applies the specified aggregator to all the columns in the data frame.
Aggregate<T>(AggregatorGroup<T>[]) Applies the specified aggregators to all the columns in the data frame.
Aggregate<T, U>(Func<Vector<T>, U>) Applies the specified aggregation function to all the columns in the data frame.
Aggregate<T, U, V>(Func<Vector<T>, V>, Func<Vector<U>, V>) Applies the specified aggregation functions to all the columns in the data frame.
Aggregate<T, U, V, W>(Func<Vector<T>, W>, Func<Vector<U>, W>, Func<Vector<V>, W>) Applies the specified aggregation functions to all the columns in the data frame.
AggregateBy<R1>(C, AggregatorGroup[]) Returns a new data frame that aggregates the columns grouped by the specified column.
AggregateBy<R1>(C, ValueTuple<C, AggregatorGroup>[]) Returns a new data frame that aggregates the columns grouped by the specified column.
AggregateBy<R1>(Grouping<R1>, AggregatorGroup[]) Returns a new data frame that aggregates the columns according to the specified grouping.
AggregateBy<R1>(Grouping<R1>, IDictionary<C, AggregatorGroup>) Applies the aggregators from a dictionary to selected columns in the data frame.
AggregateBy<R1>(Grouping<R1>, ValueTuple<C, AggregatorGroup>[]) Returns a new data frame that aggregates the columns according to the specified grouping.
AggregateBy<R1>(IGrouping, AggregatorGroup[]) Returns a new data frame that aggregates the columns according to the specified grouping.
AggregateBy<R1>(IGrouping, ValueTuple<C, AggregatorGroup>[]) Returns a new data frame that aggregates the columns according to the specified grouping.
AggregateBy<R1>(IList<R1>, AggregatorGroup[]) Returns a new data frame that aggregates the columns grouped by the specified vector.
AggregateBy<R1>(IList<R1>, ValueTuple<C, AggregatorGroup>[]) Returns a new data frame that aggregates the columns grouped by the specified vector.
AggregateBy<R1, T>(C, AggregatorGroup<T>) Returns a new data frame that aggregates the columns grouped by the specified column.
AggregateBy<R1, T>(Grouping<R1>, AggregatorGroup<T>) Returns a new data frame that aggregates the columns according to the specified grouping.
AggregateBy<R1, T>(IGrouping, AggregatorGroup<T>) Returns a new data frame that aggregates the columns according to the specified grouping.
AggregateBy<R1, T>(IList<R1>, AggregatorGroup<T>) Returns a new data frame that aggregates the columns grouped by the specified vector.
AggregateBy<R1, C1>(Grouping<R1>, IEnumerable<ValueTuple<C, AggregatorGroup>>, Index<C1>) Applies the specified aggregators to all the columns in the data frame.
AggregateBy<R1, T, U>(C, Func<Vector<T>, U>) Applies the specified aggregation function to the values in each column grouped by the specified grouping column.
Append(DataFrame<R, C>, Index<C>, Boolean) Combines two data frames by appending the rows of the right data frame to the rows of this data frame.
Append(DataFrame<R, C>, JoinType, Boolean) Combines two data frames by appending the rows of the right data frame to the rows of this data frame.
ApplyWith<T> Applies a matrix function to this data frame and another and returns the result as a data frame.
Clone Makes a copy of the data frame.
CombineWith<T> Combines the data frame with another data frame, aligning the two data frames and using the specified function to generate the value for common values.
Describe Returns a data frame containing descriptive statistics for each column in the data frame.
Equals Determines whether the specified object is equal to the current object.
(Inherited from Object)
Finalize Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection.
(Inherited from Object)
GetColumn(C) Gets the specified column as a Double vector.
GetColumn<T>(C) Gets the specified column as a strongly typed vector.
GetColumns(C[]) Returns a new data frame that contains only the specified columns.
GetColumns(IEnumerable<C>) Returns a new data frame that contains only the specified columns.
GetColumnsAt(IEnumerable<Int32>) Returns a new data frame that contains only the specified columns.
GetColumnsAt(Int32[]) Returns a new data frame that contains only the specified columns.
GetEnumerator Gets an enumerator for the rows of the data frame.
GetHashCode Serves as the default hash function.
(Inherited from Object)
GetNearestRow Gets the row nearest to the specified row key.
GetNearestRowAs<T> Gets the row nearest to the specified row key as a vector of the specified type.
GetNearestRows Returns a new data frame that contains only the rows in the specified sequence.
GetRow Gets the row with the specified row key.
GetRowAs<T> Gets the row with the specified row key as a vector of the specified type.
GetRowAt Gets the row at the specified position.
GetRowAtAs<T> Gets the row at the specified position as a vector of the specified type.
GetRows(R[]) Returns a new data frame that contains only the rows with keys in the specified array.
GetRows(IEnumerable<R>) Returns a new data frame that contains only the rows in the specified sequence.
GetRows(Subset) Returns a new data frame that contains only the rows in the specified subset.
GetRows(Vector<Boolean>) Returns a new data frame that contains only the rows specified by a boolean mask.
GetRows(R, R) Returns a new data frame that contains only the rows specified by a range.
GetRowsAt Returns a new data frame that contains only the rows in the specified sequence.
GetType Gets the Type of the current instance.
(Inherited from Object)
GroupBy<R1>(C) Returns a hierarchical index consisting of the current index grouped by the specified column.
GroupBy<R1>(IList<R1>) Returns a hierarchical grouping consisting of the current index grouped by the specified grouping values.
GroupBy<R1, R2>(C, C) Returns a hierarchical grouping on two columns.
GroupedBy<R1> Returns a new data frame by grouping the index by the specified column.
Head Returns the first few rows of the data frame.
MakeCategorical(C) Marks the specified column as containing categorical data.
MakeCategorical(C[]) Marks the specified columns as containing categorical data.
MakeCategorical<T>(C, Index<T>) Marks the specified column as containing categorical data.
MakeCategoricalAt(Int32) Marks the column at the specified index as containing categorical data.
MakeCategoricalAt<T>(Int32, Index<T>) Marks the column at the specified index as containing categorical data.
Map(IList<C>, Func<IVector, IVector>, IList<C>) Applies the specified function to selected columns and returns the result in a new data frame.
Map(IList<C>, Func<IVector, IVector>, Func<C, C>) Applies the specified function to selected columns and returns the result in a new data frame.
Map<T>(Func<Vector<T>, IVector>) Applies the specified function to all columns with the specified element type and returns the result in a new data frame.
Map<T>(IList<C>, Func<Vector<T>, IVector>, IList<C>) Applies the specified function to selected columns and returns the result in a new data frame.
Map<T>(IList<C>, Func<Vector<T>, IVector>, Func<C, C>) Applies the specified function to selected columns and returns the result in a new data frame.
MapAndAppend(C, Func<IVector, IVector>, C) Applies the specified function to selected columns and appends the results to the end of the data frame.
MapAndAppend(IList<C>, Func<IVector, IVector>, IList<C>) Applies the specified function to selected columns and appends the results to the end of the data frame.
MapAndAppend(IList<C>, Func<IVector, IVector>, Func<C, C>) Applies the specified function to selected columns and appends the results to the end of the data frame.
MapAndAppend<T>(C, Func<Vector<T>, IVector>, C) Applies the specified function to selected columns and appends the results to the end of the data frame.
MapAndAppend<T>(IList<C>, Func<Vector<T>, IVector>, IList<C>) Applies the specified function to selected columns and appends the results to the end of the data frame.
MapAndAppend<T>(IList<C>, Func<Vector<T>, IVector>, Func<C, C>) Applies the specified function to selected columns and appends the results to the end of the data frame.
MapAndInsertAfter(C, Func<IVector, IVector>, C) Applies the specified function to selected columns and inserts the result after each mapped column in the data frame.
MapAndInsertAfter(IList<C>, Func<IVector, IVector>, IList<C>) Applies the specified function to selected columns and inserts the result after each mapped column in the data frame.
MapAndInsertAfter(IList<C>, Func<IVector, IVector>, Func<C, C>) Applies the specified function to selected columns and inserts the result after each mapped column in the data frame.
MapAndInsertAfter<T>(C, Func<Vector<T>, IVector>, C) Applies the specified function to selected columns and inserts the result after each mapped column in the data frame.
MapAndInsertAfter<T>(IList<C>, Func<Vector<T>, IVector>, IList<C>) Applies the specified function to selected columns and inserts the result after each mapped column in the data frame.
MapAndInsertAfter<T>(IList<C>, Func<Vector<T>, IVector>, Func<C, C>) Applies the specified function to selected columns and inserts the result after each mapped column in the data frame.
MapAndReplace(C, Func<IVector, IVector>, C) Applies the specified function to selected columns and replaces the columns with the mapped columns.
MapAndReplace(IList<C>, Func<IVector, IVector>, IList<C>) Applies the specified function to selected columns and replaces the columns with the mapped columns.
MapAndReplace(IList<C>, Func<IVector, IVector>, Func<C, C>) Applies the specified function to selected columns and replaces the columns with the mapped columns.
MapAndReplace<T>(C, Func<Vector<T>, IVector>, C) Applies the specified function to selected columns and replaces the columns with the mapped columns.
MapAndReplace<T>(IList<C>, Func<Vector<T>, IVector>, IList<C>) Applies the specified function to selected columns and replaces the columns with the mapped columns.
MapAndReplace<T>(IList<C>, Func<Vector<T>, IVector>, Func<C, C>) Applies the specified function to selected columns and replaces the columns with the mapped columns.
MemberwiseClone Creates a shallow copy of the current Object.
(Inherited from Object)
Pivot<R1, C1, T>(C, C, C) Constructs a new data frame using the specified columns as row and column indexes.
Pivot<R1, C1, T>(C, C, C, AggregatorGroup<T>) Constructs a new data frame using the specified columns as row and column indexes and aggregates the values corresponding to each row-column pair.
PivotBy<R1, C1> Returns a two-dimensional grouping on the specified columns.
RemoveColumn Removes the specified column from the data frame.
RemoveColumnAt Removes the column at the specified position from the data frame.
RemoveColumnsAt Removes the columns at the specified positions from the data frame.
RemoveColumnsWithMissingValues Returns a new data frame with any columns containing missing values removed.
RemoveRowIndex() Returns a new data frame that has a default numeric row index. The row index is discarded.
RemoveRowIndex(C) Returns a new data frame that has a default numeric row index. The row index is moved to a new column in the data frame.
RemoveRows(IEnumerable<R>) Returns a new data frame that has the specified rows removed.
RemoveRows(Vector<Boolean>) Returns a new data frame that has the specified rows removed.
RemoveRowsAt Returns a new data frame that has the specified rows removed.
RemoveRowsWithMissingValues() Returns a new data frame with any rows containing missing values removed.
RemoveRowsWithMissingValues(C[]) Returns a new data frame with any rows containing missing values removed.
RenameColumn(C, C) Renames the specified column.
RenameColumn(IEnumerable<C>, IEnumerable<C>) Renames the specified columns.
RenameColumns Renames the columns that satisfy a condition using the specified key generator.
ReplaceMissingValues(Direction, Int32) Returns a new data frame whose columns have their missing values replaced with the next or previous non-missing value.
ReplaceMissingValues<T>(T) Returns a new data frame whose columns of the specified type have their missing values replaced with the specified value.
ReplaceMissingValues<T>(Vector<T>) Returns a new data frame whose columns of the specified type have their missing values replaced with the corresponding value from a row vector.
SelectRows<R1>(Index<R1>, Subset) Returns a new data frame that contains the selected rows and uses the specified row index.
SelectRows<R1>(Index<R1>, Int32[]) Returns a new data frame that contains the selected rows and uses the specified row index.
SelectRows<R1>(Index<R1>, Int32, Int32, Int32) Returns a new data frame that contains the selected rows and uses the specified row index.
SortBy(C) Sorts the data frame by the specified column.
SortBy(Int32) Sorts the data frame by the specified column.
SortBy(C, SortOrder) Sorts the data frame by the specified column.
SortBy(Int32, SortOrder) Sorts the data frame by the specified column.
SortByIndex() Sorts the data frame by the row index in ascending order.
SortByIndex(SortOrder) Sorts the data frame by the row index.
Stack() Returns a data frame containing all values in the data frame as row-column-value pairs.
Stack<T>(IEnumerable<C>, IEnumerable<C>, C, C) Converts a data frame from wide to long format.
Summarize() Returns a summary of the contents of the data frame using the default summary options.
Summarize(SummaryOptions) Returns a summary of the contents of the data frame using the specified options.
Tail Returns the last few rows of the data frame.
ToMatrix<T> Converts the data frame to a matrix with elements of the specified type.
ToString Returns a string that represents the current object.
(Overrides Object.ToString())
TransformColumns(Func<IVector, IVector>) Applies the specified transformation to each column in the data frame.
TransformColumns<T, U>(Func<T, U>) Applies the specified transformation to each element of each column in the data frame.
TransformColumns<T, U>(Func<Vector<T>, Vector<U>>) Applies the specified transformation to each column in the data frame.
TransformColumns<T, U, R1>(Func<Vector<T>, Vector<U>>) Applies the specified transformation to each column in the data frame.
TryGetRow Attempts to get the row at the specified row index.
Unstack<T> Converts a data frame from long to wide format.
WithColumnIndex<C1> Returns a new data frame that relabels the columns using the specified index.
WithRowIndex<R1>(C) Returns a new data frame using the specified column as the index.
WithRowIndex<R1>(Index<R1>) Returns a new data frame that uses the specified row index.
WithRowIndex<R1, R2>(C, C) Returns a new data frame using the specified columns as a hierarchical index.
WithRowIndex<R1, R2, R3>(C, C, C) Returns a new data frame using the specified columns as a hierarchical index.

Operators

Dynamic(DataFrame<R, C>, C) Gets the specified column as a vector of Double.
DynamicAssignment(DataFrame<R, C>, C, IVector) Sets the specified column.

Extension Methods

Group<DataFrameRow<R, C>> Returns a grouping by the unique elements in a list.
(Defined by Grouping)
Group<DataFrameRow<R, C>> Returns a grouping by the unique elements in a sequence.
(Defined by Grouping)
Group<DataFrameRow<R, C>> Returns a grouping by the unique elements in a list using the specified comparer to determine equality.
(Defined by Grouping)
Group<DataFrameRow<R, C>> Returns a grouping by the unique elements in a sequence using the specified comparer to determine equality.
(Defined by Grouping)
Sum<DataFrameRow<R, C>> Computes the sum of the sequence of values.
(Defined by ArrayMath)
Sum<DataFrameRow<R, C>, U> Computes the sum of the sequence of values that are obtained by invoking a transform function on each element of the input sequence.
(Defined by ArrayMath)
ToDataTable Constructs a data table from a data frame.
(Defined by DataExtensions)
ToDataTable Constructs a data table from a data frame.
(Defined by DataExtensions)
ToDataTable<C> Constructs a data table from the specified columns of a data frame.
(Defined by DataExtensions)
ToDataTable<C> Constructs a data table from the specified columns of a data frame.
(Defined by DataExtensions)

See Also