Categorical Vectors

Extreme Optimization Numerical Libraries for .NET Professional

Categorical vectors are vectors whose elements are taken from a limited set of values or levels. The set of possible values is called the category index. The elements are stored as integer indexes (level indexes) into the set of possible values. This makes it possible to have missing values where the element type does not have a representation of a missing value.
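
The storage scheme described above can be modeled in a few lines of plain C#. This is only an illustrative sketch, not the library's implementation: the CategoricalModel type and its members are hypothetical names. It follows the library's convention, stated later in this topic, of using a level index of -1 for a missing value.

```csharp
using System;

// Hypothetical model of categorical storage; not the library's own class.
public sealed class CategoricalModel<T> where T : struct
{
    private readonly T[] categories; // the category index: the possible values
    private readonly int[] levels;   // one level index per element; -1 means missing

    public CategoricalModel(T[] categories, int[] levels)
    {
        this.categories = categories;
        this.levels = levels;
    }

    public int Length => levels.Length;

    // An element is missing when its level index is negative.
    public bool IsMissing(int position) => levels[position] < 0;

    // Returns the element's value, or null when the element is missing.
    public T? ValueAt(int position) =>
        IsMissing(position) ? (T?)null : categories[levels[position]];
}
```

With categories { 10, 20, 30 } and level indexes { 0, 2, -1, 1 }, the model holds the elements 10, 30, missing, 20. The element type int has no value of its own that means "missing", but the level index -1 can still express it.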

Categorical vectors are implemented by the CategoricalVector<T> class. The ICategoricalVector interface defines the essential functionality of categorical vectors when the element type is not known.

Categorical vectors also implement the IGrouping interface, which means they can be used directly as grouping objects in aggregation operations.

Constructing Categorical Vectors

The CategoricalVector<T> class does not have any constructors. Instead, use the CreateCategorical method of the Vector class. This method has four overloads. It takes a generic type argument, which can usually be inferred from the actual arguments.

The first overload takes one argument: the length of the vector. This creates a categorical vector where all values are missing. The element type must be specified as the generic type argument.

C#
var c1 = Vector.CreateCategorical<int>(5);
var thisIsTrue = c1.IsMissing(2);

VB
Dim c1 = Vector.CreateCategorical(Of Integer)(5)
Dim thisIsTrue = c1.IsMissing(2)

F#
let c1 = Vector.CreateCategorical<int>(5)
let thisIsTrue = c1.IsMissing(2)

The second overload takes a list of values. The category index and level indexes are inferred from the values in the list. An optional second argument specifies the mutability of the new vector.

The third overload takes two or three arguments. The first is once again a list of values. The second argument is the category index. The optional third argument specifies the mutability. When a value is not found in the supplied category index, the corresponding element in the result is marked as missing. Two categorical vectors created from the same values therefore have different level indexes when their category indexes are different.
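
The effect of the category index on the level indexes can be shown with a small plain-C# model. This is a sketch of the lookup rule described above, not the library's code: LevelIndexSketch and Encode are hypothetical names, and -1 marks a value that is not found in the category index.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that models the lookup rule; not part of the library.
public static class LevelIndexSketch
{
    // Maps each value to its position in the category index.
    // A value that does not occur in the category index maps to -1 (missing).
    public static int[] Encode(IReadOnlyList<string> values, IReadOnlyList<string> categories)
    {
        var positions = new Dictionary<string, int>();
        for (int i = 0; i < categories.Count; i++)
            positions[categories[i]] = i;

        var levels = new int[values.Count];
        for (int i = 0; i < values.Count; i++)
            levels[i] = positions.TryGetValue(values[i], out int level) ? level : -1;
        return levels;
    }
}
```

For the values "a", "b", "a", "b", "d", the category index { "a", "b", "d" } gives the level indexes 0, 1, 0, 1, 2, while { "a", "b", "c", "d" } gives 0, 1, 0, 1, 3. With the category index { "a", "b" }, the last element is not found and its level index is -1.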

The fourth overload also takes two or three arguments. The first argument is the category index. The second argument is a list of level indexes into that category index. The optional third argument specifies the mutability. This overload can recreate the same vector from its category index and level indexes.
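
Going from a category index and level indexes back to element values is a simple lookup. The following plain-C# sketch illustrates the idea. DecodeSketch and Decode are hypothetical names, and representing a missing element as null is an assumption of this sketch, not a statement about the library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that models the reverse lookup; not part of the library.
public static class DecodeSketch
{
    // Looks up each level index in the category index.
    // A negative level index produces null, which stands for a missing element.
    public static string[] Decode(IReadOnlyList<string> categories, IReadOnlyList<int> levels)
    {
        var values = new string[levels.Count];
        for (int i = 0; i < levels.Count; i++)
        {
            int level = levels[i];
            values[i] = level < 0 ? null : categories[level];
        }
        return values;
    }
}
```

With the category index { "a", "b", "d" } and the level indexes 0, 1, 0, 1, 2, the result is "a", "b", "a", "b", "d".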

In addition, any vector can be converted to a categorical vector by calling its AsCategorical method. If the vector is already categorical, then the same vector is returned. Optionally, an Index<T> can be passed to this method, so the same vector can be constructed in two versions: one without and one with an explicit index.

Properties and Methods

The CategoricalVector<T> class supports all standard properties and methods of vectors. Some properties and methods are unique to the class.

The CategoryIndex property returns an Index<T> that contains the possible values of the elements of the vector. The LevelIndexes property returns a vector containing the position of each element in the category index. A missing value corresponds to a value of -1. The GetLevelIndex(Int32) method returns the level index of the element at the specified position.

The GetIndexes method returns a sequence of indexes of the elements that have a specific value. You can supply the actual value to look up, or the level index. The code below illustrates all these properties and methods:

C#
// c2 holds the values "a", "b", "a", "b", "d".
var categories = c2.CategoryIndex; // { "a", "b", "d" }
var levels = c2.LevelIndexes; // [ 0, 1, 0, 1, 2 ]

var at3 = c2.GetLevelIndex(3); // 1
var indexesB = c2.GetIndexes("b").ToArray(); // [ 1, 3 ]
var indexesAt1 = c2.GetIndexes(1).ToArray(); // [ 1, 3 ]

VB
' c2 holds the values "a", "b", "a", "b", "d".
Dim categories = c2.CategoryIndex ' { "a", "b", "d" }
Dim levels = c2.LevelIndexes ' ( 0, 1, 0, 1, 2 )

Dim at3 = c2.GetLevelIndex(3) ' 1
Dim indexesB = c2.GetIndexes("b").ToArray() ' ( 1, 3 )
Dim indexesAt1 = c2.GetIndexes(1).ToArray() ' ( 1, 3 )

F#
// c2 holds the values "a", "b", "a", "b", "d".
let categories = c2.CategoryIndex // { "a", "b", "d" }
let levels = c2.LevelIndexes // [ 0, 1, 0, 1, 2 ]

let at3 = c2.GetLevelIndex(3) // 1
let indexesB = c2.GetIndexes("b").ToArray() // [ 1, 3 ]
let indexesAt1 = c2.GetIndexes(1).ToArray() // [ 1, 3 ]

A categorical vector is essentially a mapping from integer indexes to values contained in the category index. The WithCategories<U> method creates a new categorical vector that maps the level indexes to a different set of values. The only argument of this method is the new category index. The element type of the new index need not be the same. In the following example, we change the index of the vector we created earlier from lower-case strings to upper-case characters:

C#
var newIndex = Index.Create(new[] { 'A', 'B', 'D' });
var c2a = c2.WithCategories(newIndex); // [ 'A', 'B', 'A', 'B', 'D' ]
var counts = c2.GetCounts();

VB
Dim newIndex = Index.Create({"A"c, "B"c, "D"c})
Dim c2a = c2.WithCategories(newIndex) ' ( 'A', 'B', 'A', 'B', 'D' )
Dim counts = c2a.GetCounts()

F#
let newIndex = Index.Create([|'A'; 'B'; 'D'|])
let c2a = c2.WithCategories(newIndex) // [ 'A', 'B', 'A', 'B', 'D' ]
let counts = c2.GetCounts()

Copyright (c) 2004-2021 ExoAnalytics Inc.
