Extreme Optimization User's Guide
Observations, Sorting And Filtering
In the previous section, the VariableCollection
class was introduced as the representation of a data set. This representation
favors the column-oriented view: the members of the collection are the
variables, usually the columns in a table. In this
section, we will look at the row-oriented view. We'll also discuss sorting
and filtering.
Observations
The VariableCollection
class has an Observations property
that returns an ObservationCollection
object, a collection of Observation objects
that each represent a single observation made up of several individual
variables. ObservationCollection implements the
generic IList<Observation> interface.
Observation objects have a single indexer property that can
use the name of the variable or the numerical index of the variable as an index.
A note on performance
Accessing the data in a VariableCollection object is most
efficient through its member variables. The Observations property
is supplied for the sake of convenience, but should be avoided in most
instances. In particular, it is very inefficient to iterate through the
observations to perform a calculation on some variables in each
observation. Nearly all such operations can be performed much
more efficiently by operating directly on the variables.
For example, let's say we have a VariableCollection with three
numerical variables: Var1, Var2 and Var3.
We want to calculate a fourth variable that contains the average of the
three variables for each observation. A naive implementation would be as
follows:
double[] avg = new double[col.Observations.Count];
int i = 0;
foreach (Observation o in col.Observations)
{
    // Average the three values of this observation.
    avg[i] = ((double)o[0] + (double)o[1] + (double)o[2]) / 3.0;
    i++;
}
NumericalVariable Avg = new NumericalVariable(avg);
However, the most compact and efficient expression is simply:
NumericalVariable Avg = (Var1 + Var2 + Var3) / 3;
Sorting
The data in a VariableCollection can be sorted by the values of
one or more variables. For each variable, it can be specified whether the result
should be in ascending or descending order.
Sort operations are always performed on the VariableCollection,
through its Sort method,
which is overloaded. A variable can be sorted individually only if it is not a
member of a VariableCollection. Attempting to sort a variable that belongs to a
collection throws an exception.
The first overload takes as its only argument the name of the variable by
which the data is to be sorted. This variable must be a member of the
collection, and must be sortable. The data is always sorted in ascending
order.
The second overload also takes the name of a variable as its first argument.
As its second argument it takes a SortOrder value
that lets you specify whether the data should be sorted in ascending or
descending order.
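Conceptually, sorting a collection by one variable means computing a permutation of the observation indexes from that variable and applying it to every variable, so that the rows stay aligned. The following self-contained sketch illustrates the idea with plain arrays. It does not use the library and makes no claim about how Sort is implemented; the names SortSketch, SortedOrder and Reorder are hypothetical.

```csharp
using System;
using System.Linq;

public static class SortSketch
{
    // Returns the observation indexes ordered by the key column.
    // LINQ ordering is stable, so ties keep their original relative order.
    public static int[] SortedOrder(double[] key, bool descending = false)
    {
        var indexes = Enumerable.Range(0, key.Length);
        return (descending
            ? indexes.OrderByDescending(i => key[i])
            : indexes.OrderBy(i => key[i])).ToArray();
    }

    // Applies the same permutation to any column so that rows stay aligned.
    public static T[] Reorder<T>(T[] column, int[] order)
    {
        return order.Select(i => column[i]).ToArray();
    }

    public static void Main()
    {
        double[] age = { 31, 25, 47 };
        string[] name = { "Ann", "Bob", "Cid" };

        // Sort both columns by age, in descending order.
        int[] order = SortedOrder(age, descending: true);
        Console.WriteLine(string.Join(", ", Reorder(name, order)));  // Cid, Ann, Bob
    }
}
```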
Advanced sorting
The third overload of the Sort method
takes a single parameter: a CollectionSortOrder
object that allows you to specify the sort order in detail.
CollectionSortOrder objects can be built in
one operation or incrementally. The class has four constructors, which come
in two pairs. The first constructor takes an array of variables and an array of
SortOrder values. When the CollectionSortOrder is
used, the data will be sorted by each variable in turn, in the direction given
by the corresponding SortOrder value. All variables must be members
of the same VariableCollection.
The second constructor takes three arguments. The first argument is a
VariableCollection that specifies the collection that is to be
sorted. The second argument is a String array containing the names
of the variables to sort on. The third argument is once again an array of
SortOrder values. These two constructors allow you to specify
the sort order in one step.
The third constructor has two arguments. The first is a
Variable that specifies the primary variable to sort on. The
second is a SortOrder value. The fourth constructor
has three arguments. The first is a VariableCollection, the
second is the name of the variable to sort on, and the third is the
SortOrder.
Once a CollectionSortOrder with at least one variable has been
created, the Then
method can be used to define further sort fields. The method has two overloads,
both with two arguments. One takes a Variable as its first argument
and the SortOrder as the second argument. The second overload takes
the name of the variable as its first argument. Because the variables must all
be members of the same collection, and one variable is already defined, the
collection is already known and need not be specified.
Calling the Sort method
on a VariableCollection with
a CollectionSortOrder object will sort the observations
according to the fields defined earlier.
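The incremental pattern described above, a primary sort field followed by further fields, has the same shape as the OrderBy and ThenBy operators in LINQ. The following sketch uses those standard operators on plain arrays to show what a two-field sort produces. It is only an analogy and does not use CollectionSortOrder; the names MultiKeySortSketch and Order are hypothetical.

```csharp
using System;
using System.Linq;

public static class MultiKeySortSketch
{
    // Orders observation indexes by group (ascending),
    // then by score (descending) within each group.
    public static int[] Order(string[] group, double[] score)
    {
        return Enumerable.Range(0, group.Length)
            .OrderBy(i => group[i], StringComparer.Ordinal)
            .ThenByDescending(i => score[i])
            .ToArray();
    }

    public static void Main()
    {
        string[] group = { "B", "A", "B", "A" };
        double[] score = { 1.5, 2.0, 3.5, 4.0 };

        // Group A comes first (scores 4.0, 2.0), then group B (3.5, 1.5).
        Console.WriteLine(string.Join(", ", Order(group, score)));  // 3, 1, 2, 0
    }
}
```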
Filtering
It is often necessary to perform calculations on a subset of data, based on
certain criteria. Filtering is done through Filter
objects. A Filter is an ordered set of indexes of
observations. When a filter is applied to a VariableCollection,
only the observations whose index appears in the filter are used. A filter can
only be applied to a variable or variable collection whose total number of
observations equals the total length of the filter.
The Filter class has two constructors. Each takes as its first
argument the total length of the filter. The first constructor takes an array of
integers as its second argument. This array specifies indexes of the
observations that are included in the filter. The second constructor constructs
a filter that spans a range of indexes. It takes two additional integer
arguments: the first index and the (exclusive) last index to be included in the
filter.
There are two more ways to create filters. First, filters can be combined
using set operations. The order of the observations is undefined in this case.
The Filter class has static (Shared in Visual Basic) methods for
this purpose: Union,
Intersection
and Complement.
For convenience, corresponding operators (|, & and ~) have been defined for
languages that support them.
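Because a filter is a set of observation indexes, the three set operations can be pictured with ordinary index arrays. The following sketch uses the standard LINQ set operators. It does not use the Filter class; the names FilterSetSketch and its methods are hypothetical stand-ins chosen to mirror the text. Note that the complement needs the total number of observations, which is why a filter carries its total length.

```csharp
using System;
using System.Linq;

public static class FilterSetSketch
{
    // Indexes that appear in either filter.
    public static int[] Union(int[] a, int[] b) => a.Union(b).ToArray();

    // Indexes that appear in both filters.
    public static int[] Intersection(int[] a, int[] b) => a.Intersect(b).ToArray();

    // Indexes in [0, totalLength) that do not appear in the filter.
    public static int[] Complement(int totalLength, int[] a) =>
        Enumerable.Range(0, totalLength).Except(a).ToArray();

    public static void Main()
    {
        int total = 6;            // total number of observations
        int[] a = { 0, 1, 2 };
        int[] b = { 2, 3 };

        Console.WriteLine(string.Join(", ", Union(a, b)));           // 0, 1, 2, 3
        Console.WriteLine(string.Join(", ", Intersection(a, b)));    // 2
        Console.WriteLine(string.Join(", ", Complement(total, a)));  // 3, 4, 5
    }
}
```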
Finally, the most common way to create a filter is to create one from a
variable. Each variable has a Filters
property that provides methods for creating Filter
objects. For example, the IsEqualTo
method returns a filter for those observations whose value equals a specific
value or the corresponding value in another variable.
Once a Filter object has been created, assigning it to a
VariableCollection's Filter
property applies the filter to the observations. Any operations on variables in
the collection will use only observations that match the filter. Setting
the Filter property to null removes the filter.
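To make the effect of a filter concrete, the following sketch builds a list of indexes from one variable and then restricts a calculation on another variable to those observations. It works on plain arrays and does not use the library; the names ValueFilterSketch, IndexesEqualTo and Mean are hypothetical. Passing null plays the role of removing the filter.

```csharp
using System;
using System.Linq;

public static class ValueFilterSketch
{
    // Indexes of the observations whose value equals the target.
    public static int[] IndexesEqualTo(string[] variable, string value) =>
        Enumerable.Range(0, variable.Length)
            .Where(i => variable[i] == value)
            .ToArray();

    // Mean over the filtered observations; a null filter means "use all".
    // Average throws InvalidOperationException if nothing is selected.
    public static double Mean(double[] variable, int[] filter) =>
        filter == null
            ? variable.Average()
            : filter.Select(i => variable[i]).Average();

    public static void Main()
    {
        string[] region = { "N", "S", "N", "S" };
        double[] sales = { 10, 20, 30, 40 };

        int[] north = IndexesEqualTo(region, "N");  // observations 0 and 2
        Console.WriteLine(Mean(sales, north));      // 20
        Console.WriteLine(Mean(sales, null));       // 25
    }
}
```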
Combining Sorting and Filtering
In general, sorting and filtering are orthogonal or independent operations.
This means that, when a filter is applied to sorted data, the sort order will be
maintained. Likewise, applying a sort order to filtered data will leave the data
filtered.
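The independence of the two operations can be illustrated with index arrays: a sort order is a permutation of the observation indexes, and a filter selects a subset of them, so applying the filter to the sorted indexes keeps the survivors in sorted order. The sketch below is a conceptual illustration on plain arrays, not the library's implementation; the names SortFilterSketch and Apply are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortFilterSketch
{
    // Keeps only the filtered observations while preserving the sort order.
    public static int[] Apply(int[] sortedOrder, int[] filter)
    {
        var keep = new HashSet<int>(filter);
        return sortedOrder.Where(i => keep.Contains(i)).ToArray();
    }

    public static void Main()
    {
        double[] value = { 5.0, 1.0, 4.0, 2.0, 3.0 };

        // Ascending sort order of the observation indexes: 1, 3, 4, 2, 0.
        int[] sortedOrder = Enumerable.Range(0, value.Length)
            .OrderBy(i => value[i])
            .ToArray();

        // Filter that selects observations 0, 2 and 3.
        int[] filter = { 0, 2, 3 };

        // The visible observations are still in ascending order of value.
        int[] visible = Apply(sortedOrder, filter);
        Console.WriteLine(string.Join(", ", visible));  // 3, 2, 0
    }
}
```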
Copyright 2004-2008,
Extreme Optimization. All rights reserved.
Extreme Optimization, Complexity made simple, M#, and M
Sharp are trademarks of ExoAnalytics Inc.
Microsoft, Visual C#, Visual Basic, Visual Studio, Visual
Studio.NET, and the Visual Studio Logo are registered trademarks of Microsoft Corporation