Extreme Optimization™: Complexity made simple.

Numerical Components
for .NET

Categorical Variables

Variables whose observations can take on only one of a finite set of values are called categorical variables. In the Extreme Optimization Numerical Libraries for .NET, categorical variables are implemented by the CategoricalVariable class.

Associated with each categorical variable is a scale that translates the object values into numerical values as outlined in the previous section.

Constructing categorical variables

Categorical variables can be constructed in a variety of ways. The CategoricalVariable class has six constructors that come in three groups.

The first group uses an ICollection as the source of the data. The first variant has two parameters: a string that specifies the name of the variable, and an object that implements the ICollection interface, such as an array, an array list, or a hash table. The second variant takes only one parameter: an object that implements the ICollection interface containing the data values.

These constructors automatically create a CategoricalScale object. The levels are the distinct objects that appear in the collection.

C#
string[] dataArray = new string[]
    {"red", "green", "green", "red", "blue", "red",
     "blue", "blue", "green", "red", "blue", "green"};
CategoricalVariable variable1 = new CategoricalVariable(dataArray);
CategoricalVariable variable2 = new CategoricalVariable("Data", dataArray);
Visual Basic
Dim dataArray As String() = New String() _
    {"red", "green", "green", "red", "blue", "red", _
     "blue", "blue", "green", "red", "blue", "green"}
Dim variable1 As CategoricalVariable = New CategoricalVariable(dataArray)
Dim variable2 As CategoricalVariable = New CategoricalVariable("Data", dataArray)
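
To make the relation between a collection and its levels concrete, the following is a minimal sketch in plain C# that does not use the library. The class LevelSketch and its Encode method are hypothetical names introduced for illustration, and the order of the levels (order of first appearance) is an assumption; the library may order them differently.

```csharp
using System;
using System.Collections.Generic;

// Illustration only: hypothetical helper, not part of the library.
public static class LevelSketch
{
    // Collects the distinct values of 'data' as levels, in order of first
    // appearance (an assumed ordering), and maps every observation to the
    // zero-based index of its level.
    public static int[] Encode(string[] data, out string[] levels)
    {
        List<string> levelList = new List<string>();
        Dictionary<string, int> indexOf = new Dictionary<string, int>();
        int[] indexes = new int[data.Length];
        for (int i = 0; i < data.Length; i++)
        {
            int index;
            if (!indexOf.TryGetValue(data[i], out index))
            {
                // First time this value is seen: it becomes a new level.
                index = levelList.Count;
                indexOf.Add(data[i], index);
                levelList.Add(data[i]);
            }
            indexes[i] = index;
        }
        levels = levelList.ToArray();
        return indexes;
    }
}
```

Applied to the twelve colors in dataArray, this sketch yields three levels (red, green, blue) and one level index per observation.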

The second group uses an integer array of level indices and a CategoricalScale object as the source of the data. The scale object represents the possible values of the categorical variable. The integer array contains the zero-based indices of the values for each observation. The first variant has three parameters. The first is a string that specifies the name of the variable. The second and third parameters are a CategoricalScale object and an integer array. The second variant only takes two parameters: a CategoricalScale object and an integer array.

C#
string[] levels = new string[] {"red", "green", "blue"};
CategoricalScale scale = new CategoricalScale(levels);
int[] indexes = new int[] {0, 1, 1, 0, 2, 0, 2, 2, 1, 0, 2, 1};
CategoricalVariable variable3 = new CategoricalVariable("Data", scale, indexes);
CategoricalVariable variable4 = new CategoricalVariable(scale, indexes);
Visual Basic
Dim levels As String() = New String() {"red", "green", "blue"}
Dim scale As CategoricalScale = New CategoricalScale(levels)
Dim indexes As Integer() = New Integer() {0, 1, 1, 0, 2, 0, 2, 2, 1, 0, 2, 1}
Dim variable3 As CategoricalVariable = New CategoricalVariable("Data", scale, indexes)
Dim variable4 As CategoricalVariable = New CategoricalVariable(scale, indexes)
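
The index-based form carries the same information as a collection of values. As a rough sketch in plain C# (the class DecodeSketch and its Decode method are hypothetical and not part of the library), each zero-based level index can be translated back into the value it stands for:

```csharp
using System;

// Illustration only: hypothetical helper, not part of the library.
public static class DecodeSketch
{
    // Translates zero-based level indexes back into the level values
    // they refer to. An index outside the scale is rejected.
    public static string[] Decode(string[] levels, int[] indexes)
    {
        string[] values = new string[indexes.Length];
        for (int i = 0; i < indexes.Length; i++)
        {
            int index = indexes[i];
            if (index < 0 || index >= levels.Length)
                throw new ArgumentOutOfRangeException("indexes",
                    "Level index " + index + " is not on the scale.");
            values[i] = levels[index];
        }
        return values;
    }
}
```

With the levels and indexes arrays from the example above, Decode returns the same twelve colors, in the same order, as the string array used with the first group of constructors.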

The third group uses a DataColumn as the source of the data. The first variant once again has two parameters: a string that specifies the name of the variable, and a DataColumn. The second variant takes only one parameter: the DataColumn. In this case, the name of the variable is set to the Caption property of the data column.

C#
DataColumn column;
// Connect to a data source and retrieve the column from a DataTable
CategoricalVariable variable5 = new CategoricalVariable(column);
CategoricalVariable variable6 = new CategoricalVariable("Data", column);
Visual Basic
Dim column As DataColumn
' Connect to a data source and retrieve the column from a DataTable
Dim variable5 As CategoricalVariable = New CategoricalVariable(column)
Dim variable6 As CategoricalVariable = New CategoricalVariable("Data", column)

In addition, categorical variables can be created by VariableCollection objects, by converting numerical and date/time variables (see a later section on conversions), and by several other means.

Properties and Methods

The Length property returns the number of observations for the variable. The Scale property returns the CategoricalScale object that is associated with the categorical variable.

The GetValueUnfiltered(Int32) method returns the value (observation) with the specified index. The GetLevelIndex method returns the level index of the specified observation. The GetLevelIndexes() method returns an integer array containing the level indices for all observations. A new array instance is returned on every call.

The GetEnumerator() method returns an IEnumerator object that can be used to iterate through the level indices of the observations.

Descriptive Statistics

Only a small range of descriptive statistics is available for categorical variables, because there are no numerical values to perform computations with. Several of these statistics are also unavailable if the scale is not ordered.

Descriptive statistics properties for categorical variables

Property    Description
Mode        Returns the level that occurs most often. If several levels occur equally often, the first level is returned.
Median      Returns the median.
Minimum     Returns the smallest value.
Maximum     Returns the largest value.

The median is the middle value of a sorted list of observations. If a variable has an even number of observations, then the median is the smaller of the two middle values. The example below shows how to use some of these properties:

C#
Console.WriteLine("Median:  {0:F1}", variable1.Median);
Console.WriteLine("Mode:    {0:F1}", variable1.Mode);
Console.WriteLine("Maximum: {0:F1}", variable1.Maximum);
Visual Basic
Console.WriteLine("Median:  {0:F1}", variable1.Median)
Console.WriteLine("Mode:    {0:F1}", variable1.Mode)
Console.WriteLine("Maximum: {0:F1}", variable1.Maximum)

The mode is the only statistic available for categorical data with unordered levels.
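
The definitions of the mode and the median given above can be sketched in plain C# on an array of zero-based level indices. This is an illustration under stated assumptions, not the library's implementation: the class OrdinalStatsSketch and its methods are hypothetical, and the median assumes that the levels are ordered by their index.

```csharp
using System;

// Illustration only: hypothetical helper, not part of the library.
public static class OrdinalStatsSketch
{
    // Returns the index of the level that occurs most often.
    // Ties are resolved in favor of the first (lowest-index) level.
    public static int Mode(int[] indexes, int levelCount)
    {
        if (levelCount <= 0)
            throw new ArgumentException("The scale has no levels.", "levelCount");
        int[] counts = new int[levelCount];
        foreach (int index in indexes)
            counts[index]++;
        int mode = 0;
        for (int level = 1; level < levelCount; level++)
        {
            // Strict comparison keeps the earlier level on a tie.
            if (counts[level] > counts[mode])
                mode = level;
        }
        return mode;
    }

    // Returns the middle level index of the sorted observations.
    // For an even number of observations, the smaller of the two
    // middle values is returned. Assumes levels are ordered by index.
    public static int LowerMedian(int[] indexes)
    {
        if (indexes.Length == 0)
            throw new ArgumentException("No observations.", "indexes");
        int[] sorted = (int[])indexes.Clone();
        Array.Sort(sorted);
        return sorted[(sorted.Length - 1) / 2];
    }
}
```

For the index array 0, 1, 1, 0, 2, 0, 2, 2, 1, 0, 2, 1 used earlier, every level occurs four times, so Mode returns 0 (the first level) and LowerMedian returns 1.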

Send comments on this topic to support@extremeoptimization.com

Copyright © 2003-2010, Extreme Optimization. All rights reserved.
Extreme Optimization, Complexity made simple, M#, and M Sharp are trademarks of ExoAnalytics Inc.
Microsoft, Visual C#, Visual Basic, Visual Studio, Visual Studio.NET, and the Optimized for Visual Studio logo
are registered trademarks of Microsoft Corporation.