Variables whose observations can take on only one of a finite set of values are called categorical variables. In
the Extreme Optimization Numerical Libraries for .NET, categorical variables are implemented by the
CategoricalVariable class.
Associated with each categorical variable is a scale that translates the object values into numerical values as
outlined in the previous section.
Constructing categorical variables
Categorical variables can be constructed in a variety of ways. The CategoricalVariable class has six
constructors that come in three groups.
The first group uses an ICollection as the source of the data. The first variant has two parameters.
The first is a string that specifies the name of the variable. The second parameter is an object that implements the
ICollection interface. This includes arrays, array lists, hash tables, and more. The second variant takes
only one parameter: an object that implements the ICollection interface containing the data values.
Both variants automatically create a CategoricalScale object. The levels are the distinct
objects that appear in the collection.
| C# | Copy |
|---|
string[] dataArray = new string[]
{"red", "green", "green", "red", "blue", "red",
"blue", "blue", "green", "red", "blue", "green"};
CategoricalVariable variable1 = new CategoricalVariable(dataArray);
CategoricalVariable variable2 = new CategoricalVariable("Data", dataArray);
|
| Visual Basic | Copy |
|---|
Dim dataArray As String() = New String() _
{"red", "green", "green", "red", "blue", "red", _
"blue", "blue", "green", "red", "blue", "green"}
Dim variable1 As CategoricalVariable = New CategoricalVariable(dataArray)
Dim variable2 As CategoricalVariable = New CategoricalVariable("Data", dataArray)
|
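To illustrate what deriving a scale from a collection involves, here is a minimal sketch in plain C# that does not use the library. The `LevelEncoding` class and its `Encode` method are hypothetical names introduced for this illustration, and ordering the levels by first appearance is an assumption; the library may order its levels differently.

```csharp
using System;
using System.Collections.Generic;

public static class LevelEncoding
{
    // Collects the distinct values (the levels) in order of first appearance
    // and maps each observation to the zero-based index of its level.
    // The ordering of the levels is an assumption of this sketch.
    public static int[] Encode(string[] data, out List<string> levels)
    {
        levels = new List<string>();
        var lookup = new Dictionary<string, int>();
        var indexes = new int[data.Length];
        for (int i = 0; i < data.Length; i++)
        {
            int index;
            if (!lookup.TryGetValue(data[i], out index))
            {
                // First time this value is seen: register it as a new level.
                index = levels.Count;
                lookup.Add(data[i], index);
                levels.Add(data[i]);
            }
            indexes[i] = index;
        }
        return indexes;
    }

    public static void Main()
    {
        string[] dataArray =
            {"red", "green", "green", "red", "blue", "red",
             "blue", "blue", "green", "red", "blue", "green"};
        List<string> levels;
        int[] indexes = Encode(dataArray, out levels);
        Console.WriteLine(string.Join(", ", levels));   // red, green, blue
        Console.WriteLine(string.Join(", ", indexes));  // 0, 1, 1, 0, 2, 0, 2, 2, 1, 0, 2, 1
    }
}
```

Under this assumption, the sample data yields three levels and one level index per observation, which is the same kind of information the second group of constructors accepts directly.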
The second group uses an integer array of level indices and a CategoricalScale object as the source
of the data. The scale object represents the possible values of the categorical variable. The integer array contains
the zero-based indices of the values for each observation. The first variant has three parameters. The first is a
string that specifies the name of the variable. The second and third parameters are a CategoricalScale
object and an integer array. The second variant only takes two parameters: a CategoricalScale object and
an integer array.
| C# | Copy |
|---|
string[] levels = new string[] {"red", "green", "blue"};
CategoricalScale scale = new CategoricalScale(levels);
int[] indexes = new int[] {0, 1, 1, 0, 2, 0, 2, 2, 1, 0, 2, 1};
CategoricalVariable variable3 = new CategoricalVariable("Data", scale, indexes);
CategoricalVariable variable4 = new CategoricalVariable(scale, indexes);
|
| Visual Basic | Copy |
|---|
Dim levels As String() = New String() {"red", "green", "blue"}
Dim scale As CategoricalScale = New CategoricalScale(levels)
Dim indexes As Integer() = New Integer() {0, 1, 1, 0, 2, 0, 2, 2, 1, 0, 2, 1}
Dim variable3 As CategoricalVariable = New CategoricalVariable("Data", scale, indexes)
Dim variable4 As CategoricalVariable = New CategoricalVariable(scale, indexes)
|
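The relationship between the levels and the index array can be sketched in plain C# without the library. The `LevelDecoding` class and its `Decode` method are hypothetical helpers for this illustration only; they simply look up each zero-based index in the array of levels.

```csharp
using System;

public static class LevelDecoding
{
    // Translates zero-based level indices back into the values they stand for.
    public static string[] Decode(string[] levels, int[] indexes)
    {
        var values = new string[indexes.Length];
        for (int i = 0; i < indexes.Length; i++)
        {
            int index = indexes[i];
            if (index < 0 || index >= levels.Length)
                throw new ArgumentOutOfRangeException(
                    "indexes", "Every index must refer to an existing level.");
            values[i] = levels[index];
        }
        return values;
    }

    public static void Main()
    {
        string[] levels = {"red", "green", "blue"};
        int[] indexes = {0, 1, 1, 0, 2, 0, 2, 2, 1, 0, 2, 1};
        // Prints the observations in their original order, starting with
        // red, green, green, red, blue, ...
        Console.WriteLine(string.Join(", ", Decode(levels, indexes)));
    }
}
```

With the levels and indices from the example above, the decoded observations are the same twelve colors used as the data array for the first group of constructors.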
The third group uses a DataColumn as the source of the data. The first variant once
again has two parameters. The first is a string that specifies the name of the variable. The second parameter is a
DataColumn. The second variant takes only one parameter: a DataColumn. In this case, the name of the
variable is set to the Caption property of the data column.
| C# | Copy |
|---|
// Assumes column refers to an existing DataColumn, such as one taken from a DataTable.
DataColumn column;
CategoricalVariable variable5 = new CategoricalVariable(column);
CategoricalVariable variable6 = new CategoricalVariable("Data", column);
|
| Visual Basic | Copy |
|---|
' Assumes column refers to an existing DataColumn, such as one taken from a DataTable.
Dim column As DataColumn
Dim variable5 As CategoricalVariable = New CategoricalVariable(column)
Dim variable6 As CategoricalVariable = New CategoricalVariable("Data", column)
|
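The snippets above leave the data column unassigned. As a rough sketch of where such a column might come from, the following uses only the standard System.Data types to build a small table and set the column's Caption. The `ColumnSource` class, the table name, and the column name are hypothetical choices for this illustration, and no library constructor is called here.

```csharp
using System;
using System.Data;

public static class ColumnSource
{
    // Builds a one-column DataTable holding the observations and returns the column.
    public static DataColumn BuildColorColumn(string[] values)
    {
        var table = new DataTable("Observations");
        DataColumn column = table.Columns.Add("color", typeof(string));
        // The Caption is what the one-parameter constructor is described as using for the name.
        column.Caption = "Color";
        foreach (string value in values)
            table.Rows.Add(value);
        return column;
    }

    // Reads the observations back from the column, row by row.
    public static string[] ReadValues(DataColumn column)
    {
        DataTable table = column.Table;
        var values = new string[table.Rows.Count];
        for (int i = 0; i < values.Length; i++)
            values[i] = (string)table.Rows[i][column];
        return values;
    }

    public static void Main()
    {
        DataColumn column = BuildColorColumn(new[] {"red", "green", "blue", "red"});
        Console.WriteLine(column.Caption);
        Console.WriteLine(string.Join(", ", ReadValues(column)));
    }
}
```

A column built this way could then be passed to either of the DataColumn constructors described above.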
In addition, variables can be created by VariableCollection objects, by converting numerical and
date/time variables (see a later section on conversions), and several other means.
Properties and Methods
The Length property returns the number of observations for
the variable. The Scale property returns the
CategoricalScale object that is associated with the categorical variable.
The GetValueUnfiltered(Int32) method returns the value
(observation) with the specified index. The GetLevelIndex method returns the level index of the specified
observation. The GetLevelIndexes() method
returns an integer array containing the level indices for all observations. A new array instance is returned on every
call.
The GetEnumerator() method returns an
IEnumerator object that can be used to iterate through the level indices of the observations.
Descriptive Statistics
Only a small number of descriptive statistics are available for categorical variables, because the observations
have no numerical value to perform computations with. Moreover, most of these statistics are unavailable as well
if the scale is not ordered.
Descriptive statistics properties for categorical variables
| Property | Description |
|---|---|
| Mode | Returns the level that occurs most often. If several levels occur equally often, the first level is returned. |
| Median | Returns the median. |
| Minimum | Returns the smallest value. |
| Maximum | Returns the largest value. |
The median is the middle value of a sorted list of observations. If a variable has an even number of observations,
then the median is the smaller of the two middle values. The example below shows how to use some of these
properties:
| C# | Copy |
|---|
Console.WriteLine("Median: {0:F1}", variable1.Median);
Console.WriteLine("Mode: {0:F1}", variable1.Mode);
Console.WriteLine("Maximum: {0:F1}", variable1.Maximum);
|
| Visual Basic | Copy |
|---|
Console.WriteLine("Median: {0:F1}", variable1.Median)
Console.WriteLine("Mode: {0:F1}", variable1.Mode)
Console.WriteLine("Maximum: {0:F1}", variable1.Maximum)
|
The mode is the only statistic available for categorical data with unordered levels.
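The rules for the mode and the median described above can be sketched in plain C# on an array of level indices, without the library. The `OrdinalStatistics` class and its methods are hypothetical names for this illustration. The sketch assumes that the levels are ordered by their index and that "the first level" in a tie means the level with the lowest index.

```csharp
using System;

public static class OrdinalStatistics
{
    // Returns the index of the level that occurs most often.
    // On a tie, the level with the lowest index wins (an assumption of this sketch).
    public static int ModeIndex(int[] indexes, int levelCount)
    {
        var counts = new int[levelCount];
        foreach (int index in indexes)
            counts[index]++;
        int best = 0;
        for (int level = 1; level < levelCount; level++)
            if (counts[level] > counts[best])
                best = level;
        return best;
    }

    // Returns the median level index. For an even number of observations,
    // this is the smaller of the two middle values.
    public static int MedianIndex(int[] indexes)
    {
        if (indexes.Length == 0)
            throw new ArgumentException("At least one observation is required.");
        var sorted = (int[])indexes.Clone();
        Array.Sort(sorted);
        return sorted[(sorted.Length - 1) / 2];
    }

    public static void Main()
    {
        string[] levels = {"low", "medium", "high"};   // ordered levels, by assumption
        int[] indexes = {2, 0, 1, 2, 0, 2};
        Console.WriteLine("Mode: " + levels[ModeIndex(indexes, levels.Length)]);  // Mode: high
        Console.WriteLine("Median: " + levels[MedianIndex(indexes)]);             // Median: medium
    }
}
```

In this example the sorted indices are 0, 0, 1, 2, 2, 2, so the two middle values are 1 and 2 and the smaller one, "medium", is reported as the median.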