Indexes

An index is a collection of keys that is used to label the rows and columns of a data frame or matrix, or the elements of a vector. Once an index has been assigned to a dimension, it will propagate through calculations. For example, applying a mathematical function to a vector with an index will return a vector with the same index.

One particular feature that makes indexes very useful is automatic alignment. In any calculation that involves two or more operands that have an index, the operands are aligned on their indexes. By default, an outer join is performed: the result contains every key that appears in at least one operand. If a key in one index does not appear in the other index, a missing value is returned for that key. For example, given two vectors:

Vector A:    Vector B:
a: 1         b: 20
b: 2         e: 50
c: 3         c: 30
d: 4         a: 10

their sum would be

A + B:
a: 11
b: 22
c: 33
d: -
e: -
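
The alignment rule can be illustrated without the library. The following is a minimal sketch that uses plain .NET dictionaries in place of indexed vectors. The class name AlignmentSketch and the use of double.NaN as the missing value are assumptions made for illustration; this is not how the library implements alignment.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AlignmentSketch
{
    // Adds two keyed vectors after aligning them on the union of their keys
    // (an outer join). A key that is absent from either operand produces NaN,
    // which stands in here for a missing value.
    public static Dictionary<string, double> Add(
        IReadOnlyDictionary<string, double> left,
        IReadOnlyDictionary<string, double> right)
    {
        var result = new Dictionary<string, double>();
        foreach (var key in left.Keys.Union(right.Keys))
        {
            result[key] =
                left.TryGetValue(key, out double x) && right.TryGetValue(key, out double y)
                    ? x + y
                    : double.NaN;
        }
        return result;
    }
}
```

Applied to the vectors A and B above, Add returns 11, 22 and 33 for the keys a, b and c, and NaN for d and e.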

Indexes are used to label the rows and columns of both data frames and matrices. There are a couple of important differences. A data frame must have a row index and a column index. Moreover, the types of the keys must be known statically at compile time. A matrix may have a row index or a column index or both, but they are optional. Furthermore, the type of the keys need not be known and may even change at runtime.

Indexes are also used to enumerate the categories in a categorical variable, and to label the bins in a histogram.

Two types are used to represent indexes. The Index<T> class represents an index where the generic type argument specifies the type of the keys. This type is used to specify the indexes of a data frame. The IIndex interface represents an index where the type of the keys is not specified. This type is used to specify the indexes of matrices and vectors. In addition, the Index class contains static methods for creating and manipulating indexes.

Creating indexes

Indexes are created by calling one of the methods of the static Index class. The Create method takes a sequence of key values as its only argument and constructs an index containing these values. The order of the values is preserved.

C#
var index = Index.Create(new[] { "a", "b", "c", "d" });
var a = Vector.Create(new double[] { 1.0, 2.0, 3.0, 4.0 });
a.Index = index;
Console.WriteLine(a);

The Default method creates an index of row numbers. This index can be used when no suitable data is available to serve as an index, or when the row index is not specified. This method has two overloads. The first overload takes one argument: the total length of the index. The keys will run from 0 to one less than the length. The second overload takes two arguments: the first argument is the first row number (key value), and the second argument is the (exclusive) last row number. The keys will run from the first row number to one less than the last row number.

C#
var numbers = Index.Default(10); // 0, 1, ..., 9
var numbers2 = Index.Default(10, 20); // 10, 11, ..., 19

The CreateDateRange method constructs an index of date values. It has five overloads. The simplest overload takes two arguments: the start date and the number of dates in the index:

C#
var dateIndex = Index.CreateDateRange(new DateTime(2015, 4, 25), 10);
// 2015/4/25, 2015/4/26, ..., 2015/5/4

There is one more special kind of index: an interval index. This is an index whose keys consist of contiguous intervals. For example, say we want to classify persons by age group. We can create an interval index by passing an array of bin boundaries to the CreateBins method. An optional argument, of type SpecialBins, allows us to specify how to handle values that are outside the supplied boundaries. In the example below, a special bin is created for all values over 65:

C#
int[] ages = { 0, 18, 35, 65 };
var ageGroups = Index.CreateBins(ages, SpecialBins.AboveMaximum);
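
To make the binning idea concrete, here is a minimal sketch of how a value could be mapped to a bin position given the boundaries above. It does not use the library. The BinSketch class is hypothetical, and the sketch assumes half-open bins of the form [lower, upper); the text does not say which end of each interval the library treats as inclusive.

```csharp
using System;

public static class BinSketch
{
    // Returns the position of the bin that contains value, or -1 if none does.
    // boundaries must be sorted in ascending order; bin i is assumed to cover
    // [boundaries[i], boundaries[i + 1]). When aboveMaximum is true, one extra
    // bin at the end collects every value >= the last boundary.
    public static int FindBin(int[] boundaries, int value, bool aboveMaximum)
    {
        int last = boundaries.Length - 1;
        if (value < boundaries[0]) return -1;
        if (value >= boundaries[last]) return aboveMaximum ? last : -1;

        int pos = Array.BinarySearch(boundaries, value);
        // A negative result is the bitwise complement of the insertion point,
        // so the containing bin starts one position before that point.
        return pos >= 0 ? pos : ~pos - 1;
    }
}
```

With the boundaries 0, 18, 35, 65 and the extra bin enabled, an age of 17 falls in bin 0, an age of 40 in bin 2, and an age of 70 in the special bin at position 3.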

Interval indexes are used for binning operations and for creating histograms. See the section on Histograms for more details.

Many other operations also create indexes. For example, when you import a data frame from a text file, you can specify which column should act as the row index and the index will be created automatically. Several structural transformation operations like grouping, pivoting, and stacking result in new indexes being created to reflect the new structure.

Indexes can also be created from an existing index, by taking a slice of it or by selecting multiple keys.

Properties of indexes

Indexes are read-only. The indexer property takes an integer and returns the key at the specified position. An optional second argument specifies the level of the key value to return. This is only meaningful for hierarchical indexes, where the standard indexer returns the key as a tuple. A third indexer takes a list of integers and returns an index that contains only the keys at those positions, preserving the order. The GetSlice(Int32, Int32, Int32) method returns a new index that contains the key values between the specified start and end positions at the specified interval.

Several properties give more information about the index. The number of keys in the index is given by Length. The IsSorted property indicates whether the keys are sorted. This property returns true if the keys are sorted in ascending or in descending order. The IsUnique property indicates whether every key appears only once in the index.

C#
var length = index.Length; // = 4
var i2 = index[2]; // = "c"
var indexes = new int[] { 2, 1 };
var subIndex = index[indexes];
var sorted = index.IsSorted; // = true
var index2 = Index.Create(new[] { "a", "c", "b", "d" });
var sorted2 = index2.IsSorted; // = false
var unique = index.IsUnique; // = true
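
The meaning of IsSorted and IsUnique can be spelled out in a few lines. The sketch below is a hypothetical stand-alone helper, not the library's code. It assumes that repeated adjacent keys do not break sortedness, which the text does not state.

```csharp
using System;
using System.Collections.Generic;

public static class IndexPropertySketch
{
    // True if the keys are in ascending order or in descending order.
    // Equal neighbours are allowed (an assumption of this sketch).
    public static bool IsSorted<T>(IReadOnlyList<T> keys) where T : IComparable<T>
    {
        bool ascending = true, descending = true;
        for (int i = 1; i < keys.Count; i++)
        {
            int c = keys[i - 1].CompareTo(keys[i]);
            if (c > 0) ascending = false;
            if (c < 0) descending = false;
        }
        return ascending || descending;
    }

    // True if no key appears more than once.
    public static bool IsUnique<T>(IEnumerable<T> keys)
    {
        var seen = new HashSet<T>();
        foreach (var key in keys)
        {
            // HashSet.Add returns false when the key is already present.
            if (!seen.Add(key)) return false;
        }
        return true;
    }
}
```

For the keys a, b, c, d both checks return true; for a, c, b, d the sorted check returns false while the uniqueness check still returns true.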

Looking up keys

One of the primary functions of an index is to map a key to its ordinal position. The Lookup method takes a key value and returns the position. If the key was not found, -1 is returned. This method has an overload that takes a sequence of keys and returns an array of positions. In some cases, the TryLookup method may be more convenient. This method returns whether the key was found in the index. It takes an out argument that, on return, contains the position. If the key was not found, the value of this out argument is undefined.

In the example below, we find the position of the key c in the index we created earlier. We then try to find the key e, which will fail:

C#
var position = index.Lookup("c"); // = 2
if (index.TryLookup("e", out position))
    Console.WriteLine("We shouldn't be here.");
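
Conceptually, this key-to-position mapping behaves like a dictionary from keys to integers. The following minimal sketch shows the two calling patterns side by side. The KeyLookupSketch class is hypothetical and assumes unique keys. It sets the out argument to -1 on failure, whereas the library leaves that value undefined.

```csharp
using System.Collections.Generic;

public sealed class KeyLookupSketch<T>
{
    private readonly Dictionary<T, int> positions = new Dictionary<T, int>();

    // Records the ordinal position of each key. Keys are assumed to be unique;
    // Dictionary.Add throws if a key is repeated.
    public KeyLookupSketch(IEnumerable<T> keys)
    {
        int position = 0;
        foreach (var key in keys)
            positions.Add(key, position++);
    }

    // Returns the position of the key, or -1 if the key is not present.
    public int Lookup(T key) =>
        positions.TryGetValue(key, out int position) ? position : -1;

    // Returns whether the key is present. On failure this sketch sets
    // position to -1; callers should not rely on that value.
    public bool TryLookup(T key, out int position)
    {
        if (positions.TryGetValue(key, out position)) return true;
        position = -1;
        return false;
    }
}
```

The Try pattern avoids comparing against a sentinel value, which is the usual reason to prefer it when a missing key is an expected outcome.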

When the index is sorted, it is possible to look up the key nearest to the provided value. This is useful, for example, when you want the value nearest a specific time in a time series. The LookupNearest method performs this operation. The second argument is a Direction value that specifies where to look for the nearest key. When the provided key is found in the index, its position is returned. If the key was not found, the position of the next key in the specified direction is returned, if it exists. If a sequence of keys is provided, an integer array containing the positions of the nearest keys is returned. There is also a TryLookupNearest method, which follows the same pattern as TryLookup.

To illustrate how looking up the nearest key works, we first create an index of 10 dates starting 5 days before today. We then try to look up the current time, DateTime.Now. This will fail because the dates in the index are all at midnight. We then use the LookupNearest method to find the nearest key. First, we go backward, and we find today's date. Then we go forward and find the next date:

C#
var dates = Index.CreateDateRange(DateTime.Today.AddDays(-5), 10);
var now = DateTime.Now;
if (!dates.TryLookup(now, out position))
    Console.WriteLine("Exact lookup failed.");
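position = dates.LookupNearest(now, Direction.Backward); // = 5
// The forward lookup described above; the value name Direction.Forward is assumed.
position = dates.LookupNearest(now, Direction.Forward); // = 6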
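
The nearest-key rule described above maps naturally onto a binary search. The sketch below is a stand-alone illustration, not the library's implementation. It assumes keys sorted in ascending order, replaces the Direction argument with a boolean, and returns -1 when there is no key in the requested direction; the text does not say what the library returns in that case.

```csharp
using System;

public static class NearestLookupSketch
{
    // Returns the position of key if it is present. Otherwise returns the
    // position of the closest key after it (forward == true) or before it
    // (forward == false), or -1 if no such key exists.
    public static int LookupNearest<T>(T[] sortedKeys, T key, bool forward)
        where T : IComparable<T>
    {
        int pos = Array.BinarySearch(sortedKeys, key);
        if (pos >= 0) return pos;

        // For a missing key, BinarySearch returns the bitwise complement of
        // the position of the first element greater than the key.
        int next = ~pos;
        if (forward) return next < sortedKeys.Length ? next : -1;
        return next - 1; // evaluates to -1 when every key is greater than key
    }
}
```

With the keys 10, 20, 30, looking up 25 returns position 1 going backward and position 2 going forward, while looking up 20 returns position 1 in either direction.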

Operations on indexes

It is possible to add keys to and remove keys from an index. Because indexes are read-only, these operations always return a new index and leave the original unchanged. The Append(T) method appends a single key at the end of the index and returns the new index. It has an overload that takes another index as its first argument and appends the entire index at the end. The second parameter of this overload is a boolean value that specifies whether to verify that all the keys in the result are unique. The Remove and RemoveAt methods remove a key by value and by position, respectively. The code below illustrates these methods.

C#
var index3 = Index.Create(new[] { "a", "b", "c", "d" });
var index4 = index3.Append("e"); // abcde
var index5 = Index.Create(new[] { "f", "g" });
var index6 = index3.Append(index5, true); // abcdfg
var index7 = index3.Remove("b"); // acd
var index8 = index3.RemoveAt(1); // acd

Indexes can also be created from other indexes. The Permute method applies the specified permutation to the index and returns the result. The Union<T> method returns an index that contains all keys that appear in at least one of the two indexes. The Intersect<T> method returns an index that contains only the keys that appear in both indexes.

C#
var permutation = new Permutation(new[] { 1, 2, 3, 0 });
var index9 = index3.Permute(permutation); // bcda
var index10 = Index.Create(new string[] { "a", "c", "d" });
var index11 = Index.Create(new string[] { "d", "a", "b", "e" });
var index12 = Index.Intersect(index10, index11); // ad
var index13 = Index.Union(index10, index11); // acdbe
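
The results in the comments above can be reproduced on plain arrays with LINQ, which may help when reasoning about key order. This is a sketch only. The KeySetSketch class is hypothetical, and the permutation convention (element i of the result is the key at position permutation[i]) is inferred from the bcda result rather than stated in the text.

```csharp
using System.Linq;

public static class KeySetSketch
{
    // Element i of the result is the key found at position permutation[i].
    public static T[] Permute<T>(T[] keys, int[] permutation) =>
        permutation.Select(i => keys[i]).ToArray();

    // Keys of first in their original order, followed by the keys of second
    // that have not been seen yet.
    public static T[] UnionKeys<T>(T[] first, T[] second) =>
        Enumerable.Union(first, second).ToArray();

    // Keys of first, in their original order, that also occur in second.
    public static T[] IntersectKeys<T>(T[] first, T[] second) =>
        Enumerable.Intersect(first, second).ToArray();
}
```

For the keys used in the example, Permute yields b, c, d, a; IntersectKeys yields a, d; and UnionKeys yields a, c, d, b, e.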

Hierarchical indexes

Hierarchical indexes are a convenient way to represent higher-dimensional data. The keys in a hierarchical index are tuples. For example, a two-level index will have keys of type Tuple<T1, T2>. Storage of the keys is optimized to enable fast lookup and join operations.

Creating hierarchical indexes

Two-level hierarchical indexes are created using the Create method. This method takes two arguments: lists containing the keys for the first and second level, respectively. A second method, CreateGrouped, creates an index that is grouped by the first level. All entries with the same value for the first level will be contiguous. This method takes an additional out argument: a permutation from the original order of the entries to the grouped order.

C#
var hIndex = Index.Create(
    new string[] { "One", "Two", "One", "Two" },
    new string[] { "a", "b", "a", "b" });
Console.WriteLine("hIndex[1,1] = {0}", hIndex[1, 1]);
a.Index = hIndex;
Console.WriteLine(a);

Three-level hierarchical indexes are similarly created using the Create method. This method takes three arguments: lists containing the keys for the first, second, and third level, respectively. A second method, CreateGrouped, creates an index that is grouped by the first two levels. This method takes an additional out argument: a permutation from the original order of the entries to the grouped order.

Operations on hierarchical indexes

Several other operations are available to aid in creating hierarchical indexes. The Nest<U> method returns a new index with one additional level. It takes one argument that is also an index. This index is repeated for every key in the original index. Each key consists of the key of the original index combined with a key from the argument.
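
The key layout produced by nesting can be sketched with plain sequences. The NestSketch helper below is hypothetical and only shows the order and shape of the resulting keys; the library's actual storage of hierarchical keys is optimized and is not reproduced here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NestSketch
{
    // Repeats the inner keys for every outer key and pairs them up, so the
    // result has outer.Count * inner.Count two-level keys.
    public static List<Tuple<T, U>> Nest<T, U>(IEnumerable<T> outer, IEnumerable<U> inner)
    {
        var innerKeys = inner.ToList();
        return outer
            .SelectMany(o => innerKeys.Select(i => Tuple.Create(o, i)))
            .ToList();
    }
}
```

Nesting the keys a, b under the keys One, Two gives four two-level keys in the order (One, a), (One, b), (Two, a), (Two, b).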