Histograms | Extreme Optimization Numerical Libraries for .NET Professional |

A histogram is a table used to tally the frequency of data.
Each data value is mapped to a bin.
The histogram itself is just a vector of real numbers labeled
by the categories, the bin index.
The Histogram class provides the methods used to create histograms.

There are two basic ways to create a histogram: you can create an empty histogram ready to receive data to tally, or you can create a histogram from a data source that has the data tallied.

Empty histograms are created using one of the overloads of the
Histogram.CreateEmpty method.

The first overload takes one or two arguments. The first is a list of boundaries for the bins. The optional second argument is a SpecialBins value that specifies whether to create bins for values smaller than the lowest bound or larger than the highest bound. The following table lists the possible values:

Name | Description |
---|---|
None | No special bins are included. |
BelowMinimum | There is a special bin for values below the scale's minimum value. |
AboveMaximum | There is a special bin for values above the scale's maximum value. |

If the BelowMinimum bin is included, this bin is the first bin in the collection. If the AboveMaximum bin is included, it is the last bin in the collection. The following creates two empty histograms. The second has a bin for values smaller than 50:

    double[] bounds = new double[] { 50, 62, 74, 88, 100 };
    var histogram1 = Histogram.CreateEmpty(bounds);
    var histogram2 = Histogram.CreateEmpty(bounds, SpecialBins.BelowMinimum);
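To illustrate how special bins shift the bin positions, the following is a minimal, library-independent sketch. The `ExtraBins` enum and the `BinLayout` class are hypothetical names introduced here and are not part of the Extreme Optimization API. The sketch also assumes that each regular bin includes its lower bound and that the last regular bin includes the highest bound; the library's actual boundary convention is not stated in this section.

```csharp
using System;

// Hypothetical stand-in for the SpecialBins options described above.
[Flags]
public enum ExtraBins { None = 0, BelowMinimum = 1, AboveMaximum = 2 }

// Illustrative helper, not part of the library.
public static class BinLayout
{
    // Total number of bins: one per interval between bounds, plus any special bins.
    public static int Count(double[] bounds, ExtraBins extra)
    {
        if (bounds == null || bounds.Length < 2)
            throw new ArgumentException("At least two boundaries are required.");
        int count = bounds.Length - 1;
        if (extra.HasFlag(ExtraBins.BelowMinimum)) count++;
        if (extra.HasFlag(ExtraBins.AboveMaximum)) count++;
        return count;
    }

    // Returns the position of the bin that receives 'value', or -1 if no bin does.
    // 'bounds' must be sorted in ascending order.
    public static int FindBin(double[] bounds, ExtraBins extra, double value)
    {
        if (bounds == null || bounds.Length < 2)
            throw new ArgumentException("At least two boundaries are required.");
        if (double.IsNaN(value)) return -1;

        int regular = bounds.Length - 1;
        // A BelowMinimum bin, if present, comes first and shifts all regular bins by one.
        int offset = extra.HasFlag(ExtraBins.BelowMinimum) ? 1 : 0;

        if (value < bounds[0])
            return offset == 1 ? 0 : -1;
        if (value > bounds[regular])
            return extra.HasFlag(ExtraBins.AboveMaximum) ? offset + regular : -1;

        // Find the largest i such that bounds[i] <= value.
        int pos = Array.BinarySearch(bounds, value);
        int i = pos >= 0 ? pos : ~pos - 1;
        // Assumption: the highest bound belongs to the last regular bin.
        return offset + Math.Min(i, regular - 1);
    }
}
```

With the bounds `{ 50, 62, 74, 88, 100 }` and a below-minimum bin, this sketch places a value such as 40 at position 0 and moves every regular bin up by one position.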

The second overload takes three arguments. The first two are the lower bound of the lowest bin, and the upper bound of the highest bin. The third argument is the total number of bins. This creates an empty histogram with the specified number of bins that are all equal in width. An optional fourth argument is once again a SpecialBins value that indicates which special values should be tabulated in addition to those within the specified interval. The code below creates a histogram with five bins for values between 50 and 100:

    var histogram3 = Histogram.CreateEmpty(50.0, 100.0, 5);
    var bins = new IntervalIndex<double>(50.0, 100.0, 5);
    var histogram4 = Histogram.CreateEmpty(bins);
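To make the equal-width layout concrete, here is a small, library-independent sketch that computes the boundaries of such bins and maps a value to its bin. The `EqualWidthBins` class is a hypothetical helper, not a library type, and the rule that the last bin includes the upper bound is an assumption made for this illustration.

```csharp
using System;

// Illustrative helper, not part of the Extreme Optimization API.
public static class EqualWidthBins
{
    // Returns the count + 1 boundaries of 'count' equal-width bins on [lower, upper].
    public static double[] Boundaries(double lower, double upper, int count)
    {
        if (count < 1) throw new ArgumentOutOfRangeException(nameof(count));
        if (!(upper > lower)) throw new ArgumentException("upper must exceed lower.");

        double width = (upper - lower) / count;
        var bounds = new double[count + 1];
        for (int i = 0; i < count; i++)
            bounds[i] = lower + i * width;
        bounds[count] = upper; // use the exact upper bound to avoid rounding drift
        return bounds;
    }

    // Maps a value to a zero-based bin index, or -1 if it lies outside [lower, upper].
    // Assumption: bins include their lower bound, and the last bin also includes 'upper'.
    public static int FindBin(double lower, double upper, int count, double value)
    {
        if (count < 1) throw new ArgumentOutOfRangeException(nameof(count));
        if (double.IsNaN(value) || value < lower || value > upper) return -1;

        int index = (int)Math.Floor((value - lower) / (upper - lower) * count);
        return Math.Min(index, count - 1);
    }
}
```

For five bins between 50 and 100, each bin is 10 wide, so a value of 83 falls in the fourth bin (index 3) under these assumptions.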

The third overload takes one argument: the
Index that defines the bins, for example a set of categories:

    var index = Index.Create(new[] { "High", "Medium", "Low" });
    var histogram5 = Histogram.CreateEmpty(index);

The Histogram class defines an extension method, CreateHistogram,
that transforms the data in a vector into a histogram.
This method has several overloads, which to some degree mirror the overloads of the
CreateEmpty method.

For categorical data, the overload takes as its only argument a categorical vector. The method returns a count of each value in the vector's category index:

    var data = Vector.CreateCategorical(
        new[] { "High", "Low", "High", "High", "Medium", "Low" });
    var histogram4 = data.CreateHistogram();
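Conceptually, the categorical histogram is a count per category. The sketch below shows that idea in plain C# with a dictionary. It is not how the library implements CreateHistogram, and `CategoryCounter` is a hypothetical name used only for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Illustrative helper, not part of the Extreme Optimization API.
public static class CategoryCounter
{
    // Counts how often each category occurs in 'data'.
    // Categories that never occur keep a total of 0.
    public static Dictionary<string, double> Count(
        IEnumerable<string> categories, IEnumerable<string> data)
    {
        var totals = new Dictionary<string, double>();
        foreach (var category in categories)
            totals[category] = 0.0;

        foreach (var item in data)
        {
            if (!totals.ContainsKey(item))
                throw new ArgumentException("Unknown category: " + item);
            totals[item] += 1.0;
        }
        return totals;
    }
}
```

Applied to the six values in the example above, this gives a total of 3 for "High", 1 for "Medium", and 2 for "Low".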

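Three more overloads work on continuous data. Each of these takes
a list of values as its first argument. This may be a vector, an
array, or any other type that implements
IList. The first takes the lower bound, the upper bound, and the number of bins, like the second CreateEmpty overload: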

    var values = new double[] {
        62.0, 77.0, 61.0, 94.0, 75.0, 82.0, 86.0, 83.0, 64.0, 84.0,
        68.0, 82.0, 72.0, 71.0, 85.0, 66.0, 61.0, 79.0, 81.0, 73.0 };
    var histogram1 = values.CreateHistogram(50.0, 100.0, 5);

Another overload takes one additional argument: an
IntervalIndex that specifies the bins:

    var bins = Index.CreateBins(50.0, 100.0, 5);
    var histogram2 = values.CreateHistogram(bins);

The last overload is like the previous one, but takes an additional vector argument that specifies weights for the values. The bin for each value will be incremented by the corresponding weight instead of the value 1:

    var weights = Vector.CreateRandom(20);
    var histogram3 = values.CreateHistogram(bins, weights);
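The effect of the weights can be sketched without the library: each value adds its weight, rather than 1, to the total of its bin. The `WeightedTally` class below is a hypothetical illustration. It assumes equal-width bins, assumes that the last bin includes the upper bound, and ignores values outside the range, none of which is confirmed for the library itself.

```csharp
using System;
using System.Collections.Generic;

// Illustrative helper, not part of the Extreme Optimization API.
public static class WeightedTally
{
    // Tallies 'values' into 'binCount' equal-width bins on [lower, upper].
    // If 'weights' is null, every value counts as 1.
    public static double[] Tabulate(
        IList<double> values, IList<double> weights,
        double lower, double upper, int binCount)
    {
        if (binCount < 1) throw new ArgumentOutOfRangeException(nameof(binCount));
        if (weights != null && weights.Count != values.Count)
            throw new ArgumentException("values and weights must have the same length.");

        var totals = new double[binCount];
        for (int i = 0; i < values.Count; i++)
        {
            double v = values[i];
            // Assumption: values outside the range (and NaN) are skipped.
            if (double.IsNaN(v) || v < lower || v > upper) continue;

            int bin = (int)Math.Floor((v - lower) / (upper - lower) * binCount);
            bin = Math.Min(bin, binCount - 1); // upper bound goes into the last bin
            totals[bin] += weights == null ? 1.0 : weights[i];
        }
        return totals;
    }
}
```

For example, tallying the values 62, 77, and 94 with weights 1, 2.5, and 0.5 into five bins between 50 and 100 adds 1 to the second bin, 2.5 to the third, and 0.5 to the fifth.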

There are three ways to set the totals for the bins in a histogram.

The first way is to use the Increment method. This method takes one or two arguments. The first argument is the number to tabulate. The second argument is an optional weight. If no weight is specified, it is assumed to be 1. This method increments the total of the bin that contains the first argument by 1 or the weight from the second argument.

    histogram1.Increment(83.0);
    histogram1.Increment(78.0, 2.5);
    histogram5.Increment("High");
    histogram5.Increment("Medium", 4.4);

The second way is to use the Tabulate method. This method tabulates the data specified in its first argument. This can be any list of values. An optional second argument specifies the weight for each data value. This argument is of the same type as the first argument.

    var data = new double[] {
        62, 77, 61, 94, 75, 82, 86, 83, 64, 84,
        68, 82, 72, 71, 85, 66, 61, 79, 81, 73 };
    histogram2.Tabulate(data);

Finally, you can set the totals of all bins directly using the SetTotals method. This method takes a vector of real numbers as its only argument. The length of this vector must be equal to the number of bins. The method sets the total of each bin to the corresponding value in the vector.

The AddTotals method is similar, but adds the totals specified by the argument to the bin totals.

    var totals = Vector.Create(2.0, 7.0, 9.0, 8.0, 1.0);
    // histogram1.SetTotals(totals);
    totals.CopyTo(histogram1);
    // histogram2.AddTotals(totals);
    histogram2.AddInPlace(totals);

To set all totals to zero, use the Clear method.

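Individual bins are represented by Interval values, which expose the bounds of the bin through their LowerBound and UpperBound properties.

You can use a foreach loop to iterate through a histogram's bins: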

    foreach (var pair in histogram1.BinsAndValues)
        Console.WriteLine("{0}-{1}: total = {2}",
            pair.Key.LowerBound, pair.Key.UpperBound, pair.Value);

You can find the bin corresponding to a specific value through the
FindBin method.

The TotalValue property returns the sum of all totals in all bins. The GetTotals method returns a Double array containing the totals for each bin.

The GoodnessOfFitTest method returns a ChiSquareGoodnessOfFitTest object that can be used to test the hypothesis that the data in the histogram follows a certain distribution. The method takes two arguments. The first is a ContinuousDistribution object that specifies the distribution to be tested against. The second is an integer that specifies the number of parameters of the distribution that were estimated from the data. Each estimated parameter reduces the degrees of freedom by one.
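For reference, the standard Pearson chi-square goodness-of-fit statistic is shown below. This is the textbook form; whether the library computes exactly this expression is an assumption, as this section does not say. The symbols are introduced here for illustration: $O_i$ is the total in bin $i$, $N$ is the sum of all totals, $F$ is the cumulative distribution function of the tested distribution, $a_i$ and $b_i$ are the lower and upper bounds of bin $i$, $k$ is the number of bins, and $p$ is the number of estimated parameters.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad
E_i = N \,\bigl[ F(b_i) - F(a_i) \bigr],
\qquad
\nu = k - 1 - p
```

Here $\nu$ is the number of degrees of freedom. With five bins and two estimated parameters, for instance, $\nu = 5 - 1 - 2 = 2$.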

Copyright © 2004-20116, Extreme Optimization. All rights reserved.

*Extreme Optimization,* *Complexity made simple*, *M#*, and *M Sharp* are trademarks of ExoAnalytics Inc.

*Microsoft*, *Visual C#, Visual Basic, Visual Studio*, *Visual Studio.NET*, and the *Optimized for Visual Studio* logo are registered trademarks of Microsoft Corporation.