
## Binning and Discretization

It is often necessary to group numerical data into categories. The range of the data
is divided into a number of intervals, where each interval becomes a category
in a numerical scale. This type of numerical scale is implemented
by the IntervalIndex<T> class.

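The IntervalIndex<T> class has four constructors. The examples below pair each constructor call with the corresponding Index.CreateBins call.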

The first constructor takes one argument: a Double array that contains the boundaries of the intervals. The values in this array must be in ascending order, or an ArgumentException will be thrown.

    double[] bounds = new double[] { 50, 62, 74, 88, 100 };
    var scale1 = new IntervalIndex<double>(bounds);
    var scale1a = Index.CreateBins(bounds);

The second constructor takes one additional argument: a SpecialBins value that specifies which special intervals to include in the scale, if any. The possible values are as follows:

Name | Description |
---|---|
None | No special intervals are included. |
BelowMinimum | There is a special interval for values below the minimum value. |
AboveMaximum | There is a special interval for values above the maximum value. |
OutOfRange | There is a special interval for values that are outside the scale's range. |
Missing | There is a special interval for missing values. |

If BelowMinimum is included, an interval whose lower bound is the smallest possible value for the element type is inserted before all other intervals. If AboveMaximum is included, an interval whose upper bound is the largest possible value is added at the end. The following creates an interval index with the same boundaries as above, but with an extra interval to hold values less than 50:

    var scale2 = new IntervalIndex<double>(bounds, SpecialBins.BelowMinimum);
    var scale2a = Index.CreateBins(bounds, SpecialBins.BelowMinimum);
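To make the effect of the special intervals concrete, here is a minimal sketch that does not use the library. `AddOuterBounds` is a hypothetical helper, and it assumes that the "smallest possible" and "largest possible" values for `double` are `double.MinValue` and `double.MaxValue`; the library may use different sentinels.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the library): extends a sorted boundary
// array with an outer boundary on either side, mimicking the effect described
// for BelowMinimum and AboveMaximum.
// Assumption: double.MinValue / double.MaxValue stand in for the "smallest"
// and "largest possible value" of the element type.
static double[] AddOuterBounds(double[] bounds, bool belowMinimum, bool aboveMaximum)
{
    var result = new List<double>();
    if (belowMinimum) result.Add(double.MinValue); // new first interval: [MinValue, bounds[0])
    result.AddRange(bounds);
    if (aboveMaximum) result.Add(double.MaxValue); // new last interval: [bounds[^1], MaxValue]
    return result.ToArray();
}

double[] extended = AddOuterBounds(
    new double[] { 50, 62, 74, 88, 100 }, belowMinimum: true, aboveMaximum: false);

// n boundaries delimit n - 1 intervals: one special interval plus the original four.
Console.WriteLine(extended.Length - 1); // 5
```

In this sketch the special interval is placed first, so the four regular intervals each move up one position.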

The third constructor takes three arguments. The first two are the lower bound of the first interval, and the upper bound of the last interval. The third argument is the total number of intervals. This creates a scale with the specified number of intervals that are all equal in width. The fourth constructor has one additional argument: a SpecialBins value that indicates which special values should be tabulated in addition to those within the specified interval. The code below creates a scale with five intervals for values between 50 and 100:

    var scale3 = new IntervalIndex<double>(50.0, 100.0, 5);
    var scale3a = Index.CreateBins(50.0, 100.0, 5);
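The boundaries implied by an equal-width scale follow from simple arithmetic. The sketch below is library-independent; `EqualWidthBounds` is a hypothetical helper that illustrates the computation and is not necessarily how the library derives its boundaries.

```csharp
using System;

// Hypothetical helper (not part of the library): returns the count + 1
// boundaries of `count` equal-width intervals spanning [lower, upper].
static double[] EqualWidthBounds(double lower, double upper, int count)
{
    if (count < 1) throw new ArgumentOutOfRangeException(nameof(count));
    if (!(lower < upper)) throw new ArgumentException("lower must be less than upper.");

    var bounds = new double[count + 1];
    double width = (upper - lower) / count;
    for (int i = 0; i < count; i++)
        bounds[i] = lower + i * width;
    bounds[count] = upper; // assigned directly so rounding cannot shift the last boundary
    return bounds;
}

double[] equalBounds = EqualWidthBounds(50.0, 100.0, 5);
Console.WriteLine(string.Join(", ", equalBounds)); // 50, 60, 70, 80, 90, 100
```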

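The Lookup method has a couple of overloads beyond those defined
for standard Index<T> objects. In the example below, one overload takes a single value
and returns the position of the interval that contains it, and another takes a collection
of values and returns the position for each. A value that falls outside every interval maps to -1.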

    Console.WriteLine(scale3.Lookup(63.5)); // 1
    double[] values = { 71.3, 39.5, 66.7, 90.4, 62.1 };
    Console.WriteLine(scale3.Lookup(values)); // { 2, -1, 1, 4, 1 }
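The positions in the comments above can be reproduced with a binary search over the boundaries. The following is a library-independent sketch, not the library's implementation. `FindInterval` is a hypothetical helper, and it assumes that each interval includes its lower bound and excludes its upper bound, except the last interval, which includes both.

```csharp
using System;

// Hypothetical helper (not the library's implementation): returns the position
// of the interval containing `value`, or -1 if the value is out of range.
// Assumes `bounds` is sorted ascending and has at least two elements.
static int FindInterval(double[] bounds, double value)
{
    int last = bounds.Length - 1;
    if (double.IsNaN(value) || value < bounds[0] || value > bounds[last])
        return -1;
    if (value == bounds[last])
        return last - 1; // assumption: the last interval is closed on the right

    int pos = Array.BinarySearch(bounds, value);
    // An exact match on a boundary starts the interval at that position.
    // Otherwise the result is the bitwise complement of the index of the
    // first boundary greater than value, so the interval begins one earlier.
    return pos >= 0 ? pos : ~pos - 1;
}

double[] edges = { 50, 60, 70, 80, 90, 100 };
double[] values = { 71.3, 39.5, 66.7, 90.4, 62.1 };
int[] positions = Array.ConvertAll(values, v => FindInterval(edges, v));
Console.WriteLine(string.Join(", ", positions)); // 2, -1, 1, 4, 1
```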

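Once an interval index has been defined, it can be used to map
a vector of values to a vector of categories. The Bin method, called on a
Vector<T> in the example below, accepts an interval index, a set of boundaries
with optional special bins, or a number of bins.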

    var v = Vector.CreateRandom(100);
    var bins = Index.CreateBins(0.0, 1.0, 10);
    var vBinned1 = v.Bin(bins);
    var bounds = Vector.Create(9, i => (i+1) / 10.0);
    var vBinned2 = v.Bin(bounds, SpecialBins.BelowMinimum | SpecialBins.AboveMaximum);
    var vBinned3 = v.Bin(10);
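For equal-width bins, the mapping from a value to its bin reduces to one division. The sketch below is library-independent and uses plain arrays in place of vectors. `BinEqualWidth` is a hypothetical helper, and the choice of -1 for out-of-range values is an assumption carried over from the Lookup example.

```csharp
using System;

// Hypothetical helper (not part of the library): maps each value to the
// position of its equal-width bin over [lower, upper], or -1 if out of range.
static int[] BinEqualWidth(double[] values, double lower, double upper, int count)
{
    var bins = new int[values.Length];
    double width = (upper - lower) / count;
    for (int i = 0; i < values.Length; i++)
    {
        double x = values[i];
        if (double.IsNaN(x) || x < lower || x > upper)
        {
            bins[i] = -1;
            continue;
        }
        // Clamp so that x == upper falls in the last bin rather than one past it.
        bins[i] = Math.Min((int)((x - lower) / width), count - 1);
    }
    return bins;
}

double[] sample = { 0.05, 0.15, 0.17, 0.95, 0.52 };
int[] binned = BinEqualWidth(sample, 0.0, 1.0, 10);
Console.WriteLine(string.Join(", ", binned)); // 0, 1, 1, 9, 5
```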

Copyright © 2004-2023, Extreme Optimization. All rights reserved.
