Extreme Optimization User's Guide

Observations, Sorting And Filtering

In the previous section, the VariableCollection class was introduced as the representation of a data set. This representation favors the column-oriented view: the members of the collection are the variables, usually the columns in a table. In this section, we will look at the row-oriented view. We'll also discuss sorting and filtering.

Observations

The VariableCollection class has an Observations property that returns an ObservationCollection object. This is a collection of Observation objects, each of which represents a single observation made up of the values of several individual variables. ObservationCollection implements the generic IList<Observation> interface.

Observation objects have a single indexer property that can use the name of the variable or the numerical index of the variable as an index.

A note on performance

Accessing the data in a VariableCollection is most efficient through its member variables. The Observations property is supplied for the sake of convenience, but should be avoided in most instances. In particular, it is very inefficient to iterate through the observations to perform a calculation on some variables in each observation. Nearly all such operations can be performed much more efficiently by operating directly on the variables.

For example, let's say we have a VariableCollection with three numerical variables: Var1, Var2 and Var3. We want to calculate a fourth variable that contains, for each observation, the average of the three variables. A naive implementation would be as follows:

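// Naive approach: loop over the observations and average row by row.
double[] avg = new double[col.Observations.Count];
int i = 0;

foreach (Observation o in col.Observations)
{
   // The indexer values are cast to double before averaging.
   avg[i] = ((double)o[0] + (double)o[1] + (double)o[2]) / 3.0;
   i++;
}
NumericalVariable Avg = new NumericalVariable(avg);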

However, the most compact and efficient expression is simply:

NumericalVariable Avg = (Var1 + Var2 + Var3) / 3;

Sorting

The data in a VariableCollection can be sorted by the values of one or more variables. For each variable, it can be specified whether the result should be in ascending or descending order.

Sort operations on a data set are always performed on the VariableCollection, through its overloaded Sort method. A variable can be sorted individually only if it is not a member of a VariableCollection. Attempting to sort a variable that belongs to a collection throws an exception.

The first overload takes as its only argument the name of the variable by which the data is to be sorted. This variable must be a member of the collection, and must be sortable. The data is always sorted in ascending order.

The second overload also takes the name of a variable as its first argument. Its second argument is a SortOrder value that lets you specify whether the data should be sorted in ascending or descending order.
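Sorting is a collection-level operation because every variable has to be reordered by the same row permutation. The following is a minimal sketch of that idea using only plain arrays and LINQ. It does not use the library's Sort method, and the column names and data are made up for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical data: two parallel columns with one entry per observation.
string[] name = { "a", "b", "c", "d" };
double[] score = { 2.5, 1.0, 4.0, 3.0 };

// Compute the row order that sorts 'score' in descending order.
int[] order = Enumerable.Range(0, score.Length)
    .OrderByDescending(i => score[i])
    .ToArray();

// Apply the same row order to every column so observations stay aligned.
string[] sortedName = order.Select(i => name[i]).ToArray();
double[] sortedScore = order.Select(i => score[i]).ToArray();

Console.WriteLine(string.Join(", ", sortedName));   // prints "c, d, a, b"
```

Sorting one column on its own would break the alignment between name and score, which is consistent with the restriction described above.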

Advanced sorting

The third overload of the Sort method takes a single parameter: a CollectionSortOrder object that allows you to specify the sort order in detail.

CollectionSortOrder objects can be built in one operation or incrementally. The class has four constructors, which come in two pairs. The first constructor takes an array of variables and an array of SortOrder values. When the CollectionSortOrder is used, the data will be sorted by each variable in turn, in the direction given by the corresponding SortOrder value. All variables must be members of the same VariableCollection.

A second constructor takes three arguments. The first argument is a VariableCollection that specifies the collection that is to be sorted. The second argument is a String array containing the names of the variables to sort on. The third parameter is once again an array of SortOrder values. These two constructors allow you to specify the sort order in one step.

The third constructor has two arguments. The first is a Variable that specifies the primary variable to sort on. The second parameter is a SortOrder value. The fourth constructor has three arguments. The first is a VariableCollection, the second the name of the variable to sort on, and the third the SortOrder.

Once a CollectionSortOrder with at least one variable has been created, the Then method can be used to define further sort fields. The method has two overloads, both with two arguments. One takes a Variable as its first argument and the SortOrder as the second argument. The second overload takes the name of the variable as its first argument. Because the variables must all be members of the same collection, and one variable is already defined, the collection is already known and need not be specified.

Calling the Sort method on a VariableCollection with a CollectionSortOrder object will sort the observations according to the fields defined earlier.
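A primary sort field followed by further fields has a close analogue in LINQ's OrderBy and ThenBy chain. The sketch below illustrates multi-field sorting on plain arrays under that analogy. It is not the CollectionSortOrder class, and the columns region and amount are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical columns: sort by 'region' ascending, then 'amount' descending.
string[] region = { "B", "A", "B", "A" };
int[] amount = { 1, 5, 3, 7 };

int[] order = Enumerable.Range(0, region.Length)
    .OrderBy(i => region[i], StringComparer.Ordinal)  // primary sort field
    .ThenByDescending(i => amount[i])                 // secondary sort field
    .ToArray();

// Prints the rows as: A 7, A 5, B 3, B 1 (one per line).
foreach (int i in order)
    Console.WriteLine($"{region[i]} {amount[i]}");
```

The secondary field only decides the order among rows that tie on the primary field, which is the role further sort fields play in a multi-field sort.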

Filtering

It is often necessary to perform calculations on a subset of data, based on certain criteria. Filtering is done through Filter objects. A Filter is an ordered set of indexes of observations. When a filter is applied to a VariableCollection, only the observations whose index appears in the filter are used. A filter can only be applied to a variable or variable collection with the same total number of observations.

The Filter class has two constructors. Each takes as its first argument the total length of the filter. The first constructor takes an array of integers as its second argument. This array specifies indexes of the observations that are included in the filter. The second constructor constructs a filter that spans a range of indexes. It takes two additional integer arguments: the first index and the (exclusive) last index to be included in the filter.
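To make the two ways of building a filter concrete, here is a small model that uses a tuple of a total length and an index array in place of the Filter class. The tuple shape and the Apply helper are assumptions made for this sketch only; they are not part of the library.

```csharp
using System;
using System.Linq;

// Hypothetical column with six observations.
double[] x = { 10, 20, 30, 40, 50, 60 };

// Filter built from an explicit list of observation indexes.
var byIndexes = (Length: 6, Indexes: new[] { 0, 2, 4 });

// Filter spanning a range: first index 1, exclusive last index 4.
var byRange = (Length: 6, Indexes: Enumerable.Range(1, 4 - 1).ToArray());

// Applying a filter keeps only the listed observations. The total length
// must match the number of observations in the data.
double[] Apply(double[] data, (int Length, int[] Indexes) filter)
{
    if (filter.Length != data.Length)
        throw new ArgumentException("Filter length does not match the data.");
    return filter.Indexes.Select(i => data[i]).ToArray();
}

double[] picked = Apply(x, byIndexes);
double[] ranged = Apply(x, byRange);

Console.WriteLine(string.Join(", ", picked));   // prints "10, 30, 50"
Console.WriteLine(string.Join(", ", ranged));   // prints "20, 30, 40"
```

Because the last index is exclusive, the range from 1 to 4 selects the three observations at indexes 1, 2 and 3.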

There are two more ways to create filters. First, filters can be combined using set operations. The order of the observations in the resulting filter is undefined. The Filter class has static (Shared in Visual Basic) methods for this purpose: Union, Intersection and Complement. For convenience, corresponding operators (|, & and ~) have been defined for languages that support them.

Finally, the most common way to create a filter is to create one from a variable. Each variable has a Filters property whose methods create Filter objects. For example, the IsEqualTo method returns a filter for those observations that equal a specific value or the corresponding value in another variable.
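The three set operations can be pictured on plain index arrays. This sketch uses LINQ's Union, Intersect and Except rather than the Filter class, and the two filters a and b are made-up examples. The results are sorted only to make the output predictable; no ordering should be assumed for the library's own set operations.

```csharp
using System;
using System.Linq;

int length = 6;            // total number of observations
int[] a = { 0, 1, 2 };     // hypothetical filter A
int[] b = { 2, 3 };        // hypothetical filter B

// Union: observations in A or in B.
int[] union = a.Union(b).OrderBy(i => i).ToArray();

// Intersection: observations in both A and B.
int[] intersection = a.Intersect(b).OrderBy(i => i).ToArray();

// Complement: observations not in A, relative to the total length.
int[] complement = Enumerable.Range(0, length).Except(a).ToArray();

Console.WriteLine(string.Join(" ", union));         // prints "0 1 2 3"
Console.WriteLine(string.Join(" ", intersection));  // prints "2"
Console.WriteLine(string.Join(" ", complement));    // prints "3 4 5"
```

The complement needs the total length, which is one reason a filter carries its total length in addition to its indexes.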

Once a Filter object has been created, assigning it to a VariableCollection's Filter property applies the filter to the observations. Any operations on variables in the collection will use only observations that match the filter. Setting the Filter property to null removes the filter.
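The effect of a value-based filter on a calculation can be sketched with plain arrays. This example selects the observations where one column equals a given value, then averages another column over that subset. It stands in for the idea only; the columns city and temperature are hypothetical and no library types are used.

```csharp
using System;
using System.Linq;

// Hypothetical columns with one entry per observation.
string[] city = { "Oslo", "Rome", "Oslo", "Lima" };
double[] temperature = { 4.0, 18.0, 6.0, 22.0 };

// Build a filter from a variable: indexes where 'city' equals "Oslo".
int[] osloOnly = Enumerable.Range(0, city.Length)
    .Where(i => city[i] == "Oslo")
    .ToArray();

// With the filter applied, only the matching observations are used.
double filteredMean = osloOnly.Average(i => temperature[i]);   // 5

// Without a filter, all observations are used.
double fullMean = temperature.Average();                       // 12.5

Console.WriteLine(osloOnly.Length);   // prints "2"
```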

Combining Sorting and Filtering

In general, sorting and filtering are orthogonal or independent operations. This means that, when a filter is applied to sorted data, the sort order will be maintained. Likewise, applying a sort order to filtered data will leave the data filtered.
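The independence of the two operations can be checked on a small example: sorting and then filtering selects the same rows in the same order as filtering and then sorting. The sketch below uses plain arrays and LINQ with made-up data, and is an illustration of the principle rather than of the library's behavior in every case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

double[] v = { 3.0, 1.0, 5.0, 1.5, 4.0 };
var keep = new HashSet<int> { 0, 2, 4 };   // hypothetical filter

// Sort first, then filter: keep the sorted order, drop excluded rows.
int[] sortThenFilter = Enumerable.Range(0, v.Length)
    .OrderBy(i => v[i])
    .Where(i => keep.Contains(i))
    .ToArray();

// Filter first, then sort the remaining rows.
int[] filterThenSort = keep.OrderBy(i => v[i]).ToArray();

Console.WriteLine(sortThenFilter.SequenceEqual(filterThenSort));  // prints "True"
```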
