Extreme Optimization™: Complexity made simple.

Math and Statistics
Libraries for .NET

Histograms

Extreme Optimization Numerical Libraries for .NET Professional

A histogram is a table used to tally the frequency of data. Each data value is mapped to a bin, and the histogram itself is a vector of real numbers (the bin totals) whose index is the set of bins. The Histogram<T> class represents a histogram; its generic type argument T defines the type of the bins. For categorical data, there is one bin for every category, and the bins have the same type as the data. For continuous variables (real or date/time), the bins are defined by intervals: they are of type Interval<T>, and the bin index is of type IntervalIndex<T>.

Constructing histograms

There are two basic ways to create a histogram. You can create an empty histogram that is ready to receive data, or you can create a histogram directly from data, in which case the data is tallied immediately.

Constructing empty histograms

Empty histograms are created using one of the overloads of the Histogram.CreateEmpty method. There are three overloads: the first two can be used for continuous data, and the third can be used for both continuous and categorical data.

The first overload takes one or two arguments. The first is a list of boundaries for the bins; n + 1 boundaries define n bins. The optional second argument is a SpecialBins value that specifies whether to create bins for values smaller than the lowest bound or larger than the highest bound. The following table lists the possible values:

Values of the SpecialBins enumeration

Name            Description
None            No special bins are included.
BelowMinimum    There is a special bin for values below the scale's minimum value.
AboveMaximum    There is a special bin for values above the scale's maximum value.

If the BelowMinimum bin is included, it is the first bin in the collection. If the AboveMaximum bin is included, it is the last. The following code creates two empty histograms with the four bins defined by the five boundaries 50, 62, 74, 88, and 100. The second also has a bin for values smaller than 50:

C#
double[] bounds = new double[] { 50, 62, 74, 88, 100 };
var histogram1 = Histogram.CreateEmpty(bounds);
var histogram2 = Histogram.CreateEmpty(bounds, SpecialBins.BelowMinimum);

VB
Dim bounds = New Double() {50, 62, 74, 88, 100}
Dim histogram1 = Histogram.CreateEmpty(bounds)
Dim histogram2 = Histogram.CreateEmpty(bounds, SpecialBins.BelowMinimum)

F#
let bounds = [| 50.; 62.; 74.; 88.; 100. |]
let histogram1 = Histogram.CreateEmpty(bounds)
let histogram2 = Histogram.CreateEmpty(bounds, SpecialBins.BelowMinimum)
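To make the role of the boundaries and the special bins concrete, here is a minimal sketch of how a value could be mapped to a bin index. It does not use the library: FindBinIndex is a hypothetical helper, and it assumes that each bin includes its lower bound but not its upper bound, except that the highest bound belongs to the last regular bin. The library's own boundary convention may differ.

```csharp
using System;

// Hypothetical helper, not part of the library. Maps a value to a bin index
// for n + 1 sorted boundaries, optionally with special bins.
// Assumption: bins are [lower, upper), and the highest bound falls in the
// last regular bin. Returns -1 when the value is not tallied at all.
int FindBinIndex(double[] bounds, double value, bool belowMinimum, bool aboveMaximum)
{
    int offset = belowMinimum ? 1 : 0;   // the BelowMinimum bin, if present, comes first
    int regularBins = bounds.Length - 1;
    if (value < bounds[0])
        return belowMinimum ? 0 : -1;
    if (value > bounds[regularBins])
        return aboveMaximum ? offset + regularBins : -1;  // the AboveMaximum bin is last
    if (value == bounds[regularBins])
        return offset + regularBins - 1;
    // BinarySearch returns the index of an exact match, or the bitwise
    // complement of the index of the first larger element.
    int i = Array.BinarySearch(bounds, value);
    return offset + (i >= 0 ? i : ~i - 1);
}

double[] boundaries = { 50, 62, 74, 88, 100 };   // five boundaries, four bins
Console.WriteLine(FindBinIndex(boundaries, 70, false, false));   // 1: [62, 74)
Console.WriteLine(FindBinIndex(boundaries, 45, false, false));   // -1: not tallied
Console.WriteLine(FindBinIndex(boundaries, 45, true, false));    // 0: BelowMinimum bin
Console.WriteLine(FindBinIndex(boundaries, 100, true, false));   // 4: [88, 100]
```

A binary search keeps the lookup at O(log n) even when the bins have unequal widths, as they do here.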

The second overload takes three arguments: the lower bound of the lowest bin, the upper bound of the highest bin, and the total number of bins. It creates an empty histogram with the specified number of bins, all of equal width. An optional fourth argument is again a SpecialBins value that indicates which special bins should be included in addition to the bins within the specified interval. The code below creates a histogram with five bins of width 10 for values between 50 and 100. It also shows how the same bins can be created as an IntervalIndex<double> and passed to the third overload:

C#
// Five equal-width bins between 50 and 100:
var histogram3 = Histogram.CreateEmpty(50.0, 100.0, 5);
// The same bins as an explicit interval index:
var bins = new IntervalIndex<double>(50.0, 100.0, 5);
var histogram4 = Histogram.CreateEmpty(bins);

VB
' Five equal-width bins between 50 and 100:
Dim histogram3 = Histogram.CreateEmpty(50.0, 100.0, 5)
' The same bins as an explicit interval index:
Dim bins = New IntervalIndex(Of Double)(50.0, 100.0, 5)
Dim histogram4 = Histogram.CreateEmpty(bins)

F#
// Five equal-width bins between 50 and 100:
let histogram3 = Histogram.CreateEmpty(50.0, 100.0, 5)
// The same bins as an explicit interval index:
let bins = new IntervalIndex<double>(50.0, 100.0, 5)
let histogram4 = Histogram.CreateEmpty(bins)
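With equal-width bins, the boundaries follow from the three arguments alone. The sketch below uses a hypothetical helper, EqualWidthBounds, that is not part of the library, to compute the boundaries that five equal bins between 50 and 100 should have, assuming each bin has width (upper - lower) / count.

```csharp
using System;

// Hypothetical helper, not part of the library: the count + 1 boundaries of
// `count` equal-width bins covering [lower, upper].
double[] EqualWidthBounds(double lower, double upper, int count)
{
    if (count <= 0) throw new ArgumentOutOfRangeException(nameof(count));
    var bounds = new double[count + 1];
    for (int i = 0; i < count; i++)
    {
        // Compute each boundary from `lower` rather than by repeated addition,
        // so rounding errors do not accumulate from bin to bin.
        bounds[i] = lower + i * (upper - lower) / count;
    }
    bounds[count] = upper;  // pin the top boundary exactly
    return bounds;
}

double[] fiveBins = EqualWidthBounds(50.0, 100.0, 5);
Console.WriteLine(string.Join(", ", fiveBins));  // 50, 60, 70, 80, 90, 100
```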

The third overload takes one argument: the Index<T> that contains the labels for the bins. This overload is suitable for categorical data, and also for continuous data when the supplied index is a previously created IntervalIndex<T>:

C#
var index = Index.Create(new[] { "High", "Medium", "Low" });
var histogram5 = Histogram.CreateEmpty(index);

VB
Dim scale = Index.Create({"High", "Medium", "Low"})
Dim histogram5 = Histogram.CreateEmpty(scale)

F#
let index = Index.Create([| "High"; "Medium"; "Low" |])
let histogram5 = Histogram.CreateEmpty(index)

Constructing histograms from data

The Histogram class defines an extension method, CreateHistogram, that tallies the data in a vector into a histogram. This method has many overloads, which to some degree mirror the overloads of the CreateEmpty method.

For categorical data, CreateHistogram is called on a categorical vector and takes no other arguments. It returns a histogram containing the count of each value in the vector's category index:

C#
var data = Vector.CreateCategorical(
    new[] { "High", "Low", "High", "High", "Medium", "Low" });
var histogram4 = data.CreateHistogram();

VB
Dim data = Vector.CreateCategorical(
    {"High", "Low", "High", "High", "Medium", "Low"})
Dim histogram4 = data.CreateHistogram()

F#
let data =
    Vector.CreateCategorical(
        [| "High"; "Low"; "High"; "High"; "Medium"; "Low" |])
let histogram4 = data.CreateHistogram()
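Conceptually, tallying categorical data is a count per category. This sketch mirrors that with a plain dictionary lookup. CountCategories is a hypothetical helper, not part of the library, and the category order is passed in explicitly because the example makes no assumption about how the library orders the categories in a vector's category index.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the library: one bin per category.
// The order of the bins is the order of `categories`.
double[] CountCategories(string[] categories, IEnumerable<string> data)
{
    var position = new Dictionary<string, int>();
    for (int i = 0; i < categories.Length; i++)
        position[categories[i]] = i;
    var result = new double[categories.Length];
    foreach (string value in data)
        result[position[value]] += 1.0;  // throws KeyNotFoundException for an unknown category
    return result;
}

string[] levels = { "High", "Medium", "Low" };
string[] observations = { "High", "Low", "High", "High", "Medium", "Low" };
double[] counts = CountCategories(levels, observations);
for (int k = 0; k < levels.Length; k++)
    Console.WriteLine($"{levels[k]}: {counts[k]}");
// High: 3
// Medium: 1
// Low: 2
```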

Three more overloads work on continuous data. Each of these takes a list of values as its first argument; this may be a vector, an array, or any other type that implements IList<T>. The first of these overloads takes three additional arguments: the lower bound, the upper bound, and the number of bins. An optional fourth argument is a SpecialBins value that determines which special bins to include in the histogram.

C#
var values = new double[]
{
    62.0, 77.0, 61.0, 94.0, 75.0, 82.0, 86.0, 83.0, 64.0, 84.0,
    68.0, 82.0, 72.0, 71.0, 85.0, 66.0, 61.0, 79.0, 81.0, 73.0
};
var histogram1 = values.CreateHistogram(50.0, 100.0, 5);

VB
Dim values = {
    62.0, 77.0, 61.0, 94.0, 75.0,
    82.0, 86.0, 83.0, 64.0, 84.0,
    68.0, 82.0, 72.0, 71.0, 85.0,
    66.0, 61.0, 79.0, 81.0, 73.0}
Dim histogram1 = values.CreateHistogram(50.0, 100.0, 5)

F#
let values =
    [| 62.0; 77.0; 61.0; 94.0; 75.0;
       82.0; 86.0; 83.0; 64.0; 84.0;
       68.0; 82.0; 72.0; 71.0; 85.0;
       66.0; 61.0; 79.0; 81.0; 73.0 |]
let histogram1 = values.CreateHistogram(50.0, 100.0, 5)
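For equal-width bins, the bin of a value can be computed arithmetically instead of searched for. The following sketch, built around a hypothetical TallyEqualWidth helper that is not part of the library, tallies the same 20 values into five bins between 50 and 100. It assumes that each bin includes its lower bound but not its upper bound (except the last, which also includes 100) and that values outside the range are dropped. None of these 20 values lies on a boundary, so the counts do not depend on that convention.

```csharp
using System;

// Hypothetical helper, not part of the library. Assumptions: equal-width
// bins [lower, upper) with the top bound in the last bin, and values outside
// [lower, upper] are dropped because no special bins are requested.
double[] TallyEqualWidth(double[] data, double lower, double upper, int count)
{
    var result = new double[count];
    double width = (upper - lower) / count;
    foreach (double x in data)
    {
        if (x < lower || x > upper) continue;
        // Equal widths allow a direct O(1) computation instead of a search.
        int bin = Math.Min((int)Math.Floor((x - lower) / width), count - 1);
        result[bin] += 1.0;
    }
    return result;
}

double[] values =
{
    62.0, 77.0, 61.0, 94.0, 75.0, 82.0, 86.0, 83.0, 64.0, 84.0,
    68.0, 82.0, 72.0, 71.0, 85.0, 66.0, 61.0, 79.0, 81.0, 73.0
};
double[] totals = TallyEqualWidth(values, 50.0, 100.0, 5);
Console.WriteLine(string.Join(", ", totals));  // 0, 6, 6, 7, 1
```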

Another overload takes one additional argument: an IntervalIndex<T> that specifies the bins of the histogram.

C#
var bins = Index.CreateBins(50.0, 100.0, 5);
var histogram2 = values.CreateHistogram(bins);

VB
Dim bins = Index.CreateBins(50.0, 100.0, 5)
Dim histogram2 = values.CreateHistogram(bins)

F#
let bins = Index.CreateBins(50.0, 100.0, 5)
let histogram2 = values.CreateHistogram(bins)

The last overload is like the previous one, but takes an additional vector argument that specifies a weight for each value. The total of the bin containing a value is incremented by the corresponding weight instead of by 1:

C#
var weights = Vector.CreateRandom(20);
var histogram3 = values.CreateHistogram(bins, weights);

VB
Dim weights = Vector.CreateRandom(20)
Dim histogram3 = values.CreateHistogram(bins, weights)

F#
let weights = Vector.CreateRandom(20)
let histogram3 = values.CreateHistogram(bins, weights)

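The weighting rule can be sketched in a few lines. TallyWeighted is a hypothetical helper, not part of the library; it assumes equal-width bins that include their lower bound but not their upper bound (the last bin also includes the top bound), and it drops values outside the range. Fixed weights are used instead of random ones so the result is reproducible.

```csharp
using System;

// Hypothetical helper, not part of the library. Each value adds its weight,
// rather than 1, to its bin. Assumptions: equal-width bins [lower, upper)
// with the top bound in the last bin; values outside [lower, upper] are dropped.
double[] TallyWeighted(double[] data, double[] weights, double lower, double upper, int count)
{
    if (data.Length != weights.Length)
        throw new ArgumentException("Each value needs exactly one weight.");
    var result = new double[count];
    double width = (upper - lower) / count;
    for (int i = 0; i < data.Length; i++)
    {
        double x = data[i];
        if (x < lower || x > upper) continue;
        int bin = Math.Min((int)Math.Floor((x - lower) / width), count - 1);
        result[bin] += weights[i];  // the weight replaces the usual increment of 1
    }
    return result;
}

double[] sample = { 55.0, 65.0, 65.0, 95.0 };
double[] sampleWeights = { 2.0, 1.0, 3.0, 4.0 };
double[] weightedTotals = TallyWeighted(sample, sampleWeights, 50.0, 100.0, 5);
Console.WriteLine(string.Join(", ", weightedTotals));  // 2, 4, 0, 0, 4
```

With all weights equal to 1, this reduces to an ordinary count.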
Tabulating Data

There are three ways to set the totals for the bins in a histogram.

The first way is to use the Increment method. This method takes one or two arguments. The first argument is the value to tabulate: a number for continuous data, or a category for categorical data. The optional second argument is a weight, which defaults to 1. The method increments the total of the bin that contains the first argument by 1, or by the weight if one is given.

C#
histogram1.Increment(83.0);
histogram1.Increment(78.0, 2.5);
histogram5.Increment("High");
histogram5.Increment("Medium", 4.4);

VB
histogram1.Increment(83.0)
histogram1.Increment(78.0, 2.5)
histogram5.Increment("High")
histogram5.Increment("Medium", 4.4)

F#
histogram1.Increment(83.0)
histogram1.Increment(78.0, 2.5)
histogram5.Increment("High")
histogram5.Increment("Medium", 4.4)

The second way is to use the Tabulate method, which tabulates all the values in its first argument. This can be any list of values. An optional second argument, of the same type as the first, specifies the weight of each data value.

C#
var data = new double[]
{
    62, 77, 61, 94, 75, 82, 86, 83, 64, 84,
    68, 82, 72, 71, 85, 66, 61, 79, 81, 73
};
histogram2.Tabulate(data);

VB
Dim data = New Double() {
    62, 77, 61, 94, 75, 82, 86, 83, 64, 84,
    68, 82, 72, 71, 85, 66, 61, 79, 81, 73}
histogram2.Tabulate(data)

F#
let data =
    [| 62.; 77.; 61.; 94.; 75.; 82.; 86.; 83.; 64.; 84.;
       68.; 82.; 72.; 71.; 85.; 66.; 61.; 79.; 81.; 73. |]
histogram2.Tabulate(data)

Finally, you can set the totals of all bins directly using the SetTotals method. This method takes a vector of real numbers as its only argument. The length of this vector must equal the number of bins, and the total of each bin is set to the corresponding element of the vector.

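The AddTotals method is similar, but adds the elements of its argument to the existing bin totals. Because a histogram is itself a vector of totals, the same results can also be obtained with the vector methods CopyTo and AddInPlace, as in the following example: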

C#
var totals = Vector.Create(2.0, 7.0, 9.0, 8.0, 1.0);
// Same effect as histogram1.SetTotals(totals):
totals.CopyTo(histogram1);
// Same effect as histogram2.AddTotals(totals):
histogram2.AddInPlace(totals);

VB
Dim totals = Vector.Create(2.0, 7.0, 9.0, 8.0, 1.0)
' Same effect as histogram1.SetTotals(totals):
totals.CopyTo(histogram1)
' Same effect as histogram2.AddTotals(totals):
histogram2.AddInPlace(totals)

F#
let totals = Vector.Create(2.0, 7.0, 9.0, 8.0, 1.0)
// Same effect as histogram1.SetTotals(totals):
totals.CopyTo(histogram1) |> ignore
// Same effect as histogram2.AddTotals(totals):
histogram2.AddInPlace(totals) |> ignore

To set all totals to zero, use the Clear method.
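The difference between setting, adding, and clearing totals can be sketched on a plain array of bin totals. SetBinTotals and AddBinTotals below are hypothetical stand-ins for SetTotals and AddTotals, written only to illustrate the semantics described above; Array.Clear plays the role of Clear.

```csharp
using System;

// Hypothetical helpers, not part of the library, that illustrate the
// semantics of SetTotals (overwrite) and AddTotals (accumulate).
void SetBinTotals(double[] binTotals, double[] newTotals)
{
    if (newTotals.Length != binTotals.Length)
        throw new ArgumentException("The length must equal the number of bins.");
    Array.Copy(newTotals, binTotals, binTotals.Length);
}

void AddBinTotals(double[] binTotals, double[] extra)
{
    if (extra.Length != binTotals.Length)
        throw new ArgumentException("The length must equal the number of bins.");
    for (int i = 0; i < binTotals.Length; i++)
        binTotals[i] += extra[i];
}

double[] histogramTotals = new double[5];
double[] totals = { 2.0, 7.0, 9.0, 8.0, 1.0 };
SetBinTotals(histogramTotals, totals);
Console.WriteLine(string.Join(", ", histogramTotals));   // 2, 7, 9, 8, 1
AddBinTotals(histogramTotals, totals);
Console.WriteLine(string.Join(", ", histogramTotals));   // 4, 14, 18, 16, 2
Array.Clear(histogramTotals, 0, histogramTotals.Length); // counterpart of Clear
Console.WriteLine(string.Join(", ", histogramTotals));   // 0, 0, 0, 0, 0
```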

Histogram Bins

Individual bins are represented by Interval<T> objects, which have a LowerBound and an UpperBound property. Together, these define the interval covered by the bin. The Width property returns the width of the bin; note that this may be infinite, for instance for a special bin below the minimum or above the maximum. All these properties are read-only.

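You can use for-each on the BinsAndValues property to iterate through a histogram's bins together with their totals. Each element is a pair whose Key is the bin and whose Value is the total: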

C#
foreach (var pair in histogram1.BinsAndValues)
    Console.WriteLine("{0}-{1}: total = {2}",
        pair.Key.LowerBound, pair.Key.UpperBound, pair.Value);

VB
For Each pair In histogram1.BinsAndValues
    Console.WriteLine("{0}-{1}: total = {2}",
        pair.Key.LowerBound, pair.Key.UpperBound, pair.Value)
Next

F#
for pair in histogram1.BinsAndValues do
    printfn "%f-%f: total = %f"
        pair.Key.LowerBound pair.Key.UpperBound pair.Value

You can find the bin that contains a specific value using the FindBin<T> method, which returns the Interval<T> corresponding to its argument.

Other Properties and Methods

The TotalValue property returns the sum of all totals in all bins. The GetTotals method returns a Double array containing the totals for each bin.

The GoodnessOfFitTest method returns a ChiSquareGoodnessOfFitTest object that can be used to test the hypothesis that the data in the histogram follows a given distribution. The method takes two arguments. The first is a ContinuousDistribution object that specifies the distribution to test against. The second is an integer that specifies how many parameters of the distribution were estimated from the data. Each estimated parameter reduces the degrees of freedom by one.
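The statistic behind such a test can be sketched directly. The following computes Pearson's chi-square statistic for binned data, where the expected count in a bin is N times the probability the hypothesized distribution assigns to that bin, together with the degrees of freedom (number of bins minus one, minus one per estimated parameter). ChiSquare and UniformCdf are hypothetical helpers, not the library's ChiSquareGoodnessOfFitTest, and the sketch stops at the statistic rather than computing a p-value. The example tests the five-bin tally of the 20 scores from the earlier examples (0, 6, 6, 7, and 1 values per bin) against a uniform distribution on [50, 100].

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not the library's ChiSquareGoodnessOfFitTest.
// Pearson statistic: sum over bins of (observed - expected)^2 / expected,
// where expected = N * (F(upper) - F(lower)) for the hypothesized CDF F.
(double Statistic, int DegreesOfFreedom) ChiSquare(
    double[] observed, double[] bounds, Func<double, double> cdf, int estimatedParameters)
{
    double n = 0;
    foreach (double o in observed) n += o;
    double statistic = 0;
    for (int i = 0; i < observed.Length; i++)
    {
        double expected = n * (cdf(bounds[i + 1]) - cdf(bounds[i]));
        double diff = observed[i] - expected;
        statistic += diff * diff / expected;
    }
    // One degree of freedom is lost to the fixed total N, and one more
    // for each distribution parameter estimated from the data.
    return (statistic, observed.Length - 1 - estimatedParameters);
}

// CDF of a uniform distribution on [50, 100].
double UniformCdf(double x) => Math.Clamp((x - 50.0) / 50.0, 0.0, 1.0);

double[] scoreTotals = { 0, 6, 6, 7, 1 };
double[] binBounds = { 50, 60, 70, 80, 90, 100 };
var (chi2, degrees) = ChiSquare(scoreTotals, binBounds, UniformCdf, 0);
string chi2Text = chi2.ToString("F2", CultureInfo.InvariantCulture);
Console.WriteLine($"chi-square = {chi2Text}, df = {degrees}");
// chi-square = 10.50, df = 4
```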

Copyright (c) 2004-2021 ExoAnalytics Inc.

Send comments on this topic to support@extremeoptimization.com

Extreme Optimization, Complexity made simple, M#, and M Sharp are trademarks of ExoAnalytics Inc.
Microsoft, Visual C#, Visual Basic, Visual Studio, Visual Studio.NET, and the Optimized for Visual Studio logo
are registered trademarks of Microsoft Corporation.