A histogram is a table used to tally the frequency of data.
Each data value is mapped to a bin.
The histogram itself is just a vector of real numbers whose elements
are labeled by the bins; these labels form the bin index.
The Histogram<T>
class represents a histogram, where the generic type argument defines
the type of the bins.
For categorical data, there is one bin for every category.
The type of the bins is the same as the type of the data.
For continuous variables (real or date/time), the bins are defined
by intervals. The bins are of type Interval<T>,
and the bin index is of type IntervalIndex<T>.
There are two basic ways to create a histogram:
you can create an empty histogram ready to receive
data to tally, or you can create a histogram from a
data source that has the data tallied.
Constructing empty histograms
Empty histograms are created using one of the overloads of the
Histogram.CreateEmpty
method. This method has three overloads. The first two can be used
for continuous data. The third overload can be used for both continuous
and categorical data.
The first overload takes one or two arguments. The first is a
list of boundaries for the bins. The optional second argument
is a SpecialBins
value that specifies whether to create bins for values smaller than
the lowest bound or larger than the highest bound.
The following table lists the possible values:
Values of the SpecialBins enumeration

| Name | Description |
|---|---|
| None | No special bins are included. |
| BelowMinimum | There is a special bin for values below the scale's minimum value. |
| AboveMaximum | There is a special bin for values above the scale's maximum value. |

If the BelowMinimum bin is included,
this bin is the first bin in the collection. If the
AboveMaximum bin is included,
it is the last bin in the collection. The following code
creates two empty histograms. The second has a bin for values
smaller than 50:
double[] bounds = new double[] { 50, 62, 74, 88, 100 };
var histogram1 = Histogram.CreateEmpty(bounds);
var histogram2 = Histogram.CreateEmpty(bounds, SpecialBins.BelowMinimum);
Dim bounds = New Double() {50, 62, 74, 88, 100}
Dim histogram1 = Histogram.CreateEmpty(bounds)
Dim histogram2 = Histogram.CreateEmpty(bounds, SpecialBins.BelowMinimum)
let bounds = [| 50.; 62.; 74.; 88.; 100. |]
let histogram1 = Histogram.CreateEmpty(bounds)
let histogram2 = Histogram.CreateEmpty(bounds, SpecialBins.BelowMinimum)
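To make the effect of the special bins concrete, here is a minimal, library-independent sketch in Python. It is not the library's implementation: the helper names `make_bins` and `bin_position` are hypothetical, and the half-open convention [lower, upper) for each bin is an assumption that the text above does not confirm.

```python
from bisect import bisect_right

def make_bins(bounds, below_minimum=False, above_maximum=False):
    """Return the bins as (lower, upper) pairs, special bins included."""
    bins = list(zip(bounds[:-1], bounds[1:]))
    if below_minimum:
        # The below-minimum bin comes first in the collection.
        bins.insert(0, (float("-inf"), bounds[0]))
    if above_maximum:
        # The above-maximum bin comes last in the collection.
        bins.append((bounds[-1], float("inf")))
    return bins

def bin_position(bounds, value, below_minimum=False, above_maximum=False):
    """Return the position of the bin that holds value, or None if no bin does.

    Assumption: regular bins are half-open, [lower, upper).
    """
    k = bisect_right(bounds, value) - 1  # -1 when value is below the scale
    if k < 0:
        return 0 if below_minimum else None
    if k >= len(bounds) - 1:             # at or above the highest bound
        if not above_maximum:
            return None
        k = len(bounds) - 1
    # A below-minimum bin shifts every other bin one position to the right.
    return k + 1 if below_minimum else k

bounds = [50, 62, 74, 88, 100]
print(len(make_bins(bounds)))                            # 4 regular bins
print(make_bins(bounds, below_minimum=True)[0])          # (-inf, 50)
print(bin_position(bounds, 40))                          # None
print(bin_position(bounds, 40, below_minimum=True))      # 0
```

Without a special bin, a value outside the scale has nowhere to go; with `below_minimum` set, it lands in the first bin and every regular bin moves up one position.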
The second overload takes three arguments. The first two are
the lower bound of the lowest bin, and the upper bound of the highest bin.
The third argument is the total number of bins.
This creates an empty histogram with the specified
number of bins that are all equal in width. An optional fourth
argument is once again a
SpecialBins
value that indicates which special values should be tabulated in
addition to those within the specified interval.
The code below creates a histogram with five bins for values
between 50 and 100, first directly and then from an explicitly
constructed IntervalIndex<T>:
var histogram3 = Histogram.CreateEmpty(50.0, 100.0, 5);
var bins = new IntervalIndex<double>(50.0, 100.0, 5);
var histogram4 = Histogram.CreateEmpty(bins);
Dim histogram3 = Histogram.CreateEmpty(50.0, 100.0, 5)
Dim bins = New IntervalIndex(Of Double)(50.0, 100.0, 5)
Dim histogram4 = Histogram.CreateEmpty(bins)
let histogram3 = Histogram.CreateEmpty(50.0, 100.0, 5)
let bins = new IntervalIndex<double>(50.0, 100.0, 5)
let histogram4 = Histogram.CreateEmpty(bins)
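The equal-width layout can be reproduced by hand. The following Python sketch is an illustration only, not the library's code; `create_empty` is a hypothetical helper that returns the bin intervals together with a vector of zero totals.

```python
def create_empty(lower, upper, count):
    """Build `count` equal-width bins on [lower, upper] with zero totals."""
    width = (upper - lower) / count
    bounds = [lower + i * width for i in range(count)] + [upper]
    intervals = list(zip(bounds[:-1], bounds[1:]))
    totals = [0.0] * count
    return intervals, totals

intervals, totals = create_empty(50.0, 100.0, 5)
print(intervals)
# [(50.0, 60.0), (60.0, 70.0), (70.0, 80.0), (80.0, 90.0), (90.0, 100.0)]
print(totals)
# [0.0, 0.0, 0.0, 0.0, 0.0]
```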
The third overload takes one argument: the
Index<T>
that contains the labels for the bins. This overload
is suitable for categorical data, and also for continuous data
when the supplied index is a previously created
IntervalIndex<T>.
var index = Index.Create(new[] { "High", "Medium", "Low" });
var histogram5 = Histogram.CreateEmpty(index);
Dim scale = Index.Create({"High", "Medium", "Low"})
Dim histogram5 = Histogram.CreateEmpty(scale)
let index = Index.Create([| "High"; "Medium"; "Low" |])
let histogram5 = Histogram.CreateEmpty(index)
Constructing histograms from data
The Histogram
class defines an extension method,
CreateHistogram,
that transforms the data in a vector into a histogram.
This method has many overloads, which mirror to some degree the overloads of the
CreateEmpty<T>(Index<T>)
method.
For categorical data, the overload takes as its only argument
a categorical vector. The method returns a count of each value
in the vector's category index:
var data = Vector.CreateCategorical(
new[] { "High", "Low", "High", "High", "Medium", "Low" });
var histogram4 = data.CreateHistogram();
Dim Data = Vector.CreateCategorical(
{"High", "Low", "High", "High", "Medium", "Low"})
Dim histogram4 = Data.CreateHistogram()
let data = Vector.CreateCategorical(
            [| "High"; "Low"; "High"; "High"; "Medium"; "Low" |])
let histogram4 = data.CreateHistogram()
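Conceptually, the categorical overload counts how often each category occurs. The Python sketch below shows the idea without using the library. The order of the categories is an assumption made for the illustration; the order of the library's category index may differ.

```python
from collections import Counter

# Hypothetical category order; the library's category index may order these differently.
categories = ["High", "Medium", "Low"]
data = ["High", "Low", "High", "High", "Medium", "Low"]

counts = Counter(data)
# One bin per category, including categories that never occur in the data.
histogram = {category: float(counts.get(category, 0)) for category in categories}
print(histogram)
# {'High': 3.0, 'Medium': 1.0, 'Low': 2.0}
```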
Three more overloads work on continuous data. Each of these takes
a list of values as its first argument. This may be a vector, an
array, or any other type that implements
IList<T>.
The first overload takes three or four additional arguments: the lower bound,
the upper bound, and the number of bins. Optionally, a
SpecialBins
value may be supplied that determines which special bins to include
in the histogram.
var values = new double[]
{62.0, 77.0, 61.0, 94.0, 75.0, 82.0, 86.0, 83.0, 64.0, 84.0,
68.0, 82.0, 72.0, 71.0, 85.0, 66.0, 61.0, 79.0, 81.0, 73.0};
var histogram1 = values.CreateHistogram(50.0, 100.0, 5);
Dim values = {
62.0, 77.0, 61.0, 94.0, 75.0,
82.0, 86.0, 83.0, 64.0, 84.0,
68.0, 82.0, 72.0, 71.0, 85.0,
66.0, 61.0, 79.0, 81.0, 73.0}
Dim histogram1 = values.CreateHistogram(50.0, 100.0, 5)
let values =
    [|
        62.0; 77.0; 61.0; 94.0; 75.0;
        82.0; 86.0; 83.0; 64.0; 84.0;
        68.0; 82.0; 72.0; 71.0; 85.0;
        66.0; 61.0; 79.0; 81.0; 73.0
    |]
let histogram1 = values.CreateHistogram(50.0, 100.0, 5)
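As a cross-check on what such a histogram should contain, the same twenty values can be tallied by hand. This Python sketch is independent of the library; `tally` is a hypothetical helper, and the half-open bins [lower, upper) are an assumption. None of the sample values falls on a bin boundary, so the convention does not affect the result here.

```python
def tally(values, lower, upper, count):
    """Count the values that fall into each of `count` equal-width bins."""
    width = (upper - lower) / count
    totals = [0.0] * count
    for v in values:
        if lower <= v < upper:  # values outside the scale are ignored
            k = min(int((v - lower) / width), count - 1)
            totals[k] += 1.0
    return totals

values = [62.0, 77.0, 61.0, 94.0, 75.0, 82.0, 86.0, 83.0, 64.0, 84.0,
          68.0, 82.0, 72.0, 71.0, 85.0, 66.0, 61.0, 79.0, 81.0, 73.0]
totals = tally(values, 50.0, 100.0, 5)
print(totals)
# [0.0, 6.0, 6.0, 7.0, 1.0]
```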
Another overload takes one additional argument: an
IntervalIndex<T>
that specifies the bin index for the histogram.
var bins = Index.CreateBins(50.0, 100.0, 5);
var histogram2 = values.CreateHistogram(bins);
Dim bins = Index.CreateBins(50.0, 100.0, 5)
Dim histogram2 = values.CreateHistogram(bins)
let bins = Index.CreateBins(50.0, 100.0, 5)
let histogram2 = values.CreateHistogram(bins)
The last overload is like the previous one, but takes an additional
vector argument that specifies weights for the values. The bin
for each value will be incremented by the corresponding weight
instead of the value 1:
var weights = Vector.CreateRandom(20);
var histogram3 = values.CreateHistogram(bins, weights);
Dim weights = Vector.CreateRandom(20)
Dim histogram3 = values.CreateHistogram(bins, weights)
let weights = Vector.CreateRandom(20)
let histogram3 = values.CreateHistogram(bins, weights)
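The effect of the weights is easier to see with fixed numbers than with random ones. The Python sketch below is a library-independent illustration; `tally_weighted` is a hypothetical helper, the weights are made up for the example, and half-open bins [lower, upper) are assumed.

```python
def tally_weighted(values, weights, lower, upper, count):
    """Add each value's weight to the total of the bin that contains it."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    width = (upper - lower) / count
    totals = [0.0] * count
    for v, w in zip(values, weights):
        if lower <= v < upper:
            k = min(int((v - lower) / width), count - 1)
            totals[k] += w  # increment by the weight instead of by 1
    return totals

values = [62.0, 77.0, 61.0, 94.0]
weights = [0.5, 2.0, 1.5, 1.0]
totals = tally_weighted(values, weights, 50.0, 100.0, 5)
print(totals)
# [0.0, 2.0, 2.0, 0.0, 1.0]
```

The two values in the 60 to 70 bin contribute 0.5 + 1.5 = 2.0 to its total, where an unweighted tally would have counted 2.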
There are three ways to set the totals for the bins in a histogram.
The first way is to use the
Increment method.
This method takes one or two arguments. The first argument is the value to tabulate. The second argument is an
optional weight. If no weight is specified, it is assumed to be 1. The method adds the weight to the total of the bin that
contains the first argument.
histogram1.Increment(83.0);
histogram1.Increment(78.0, 2.5);
histogram5.Increment("High");
histogram5.Increment("Medium", 4.4);
histogram1.Increment(83.0)
histogram1.Increment(78.0, 2.5)
histogram5.Increment("High")
histogram5.Increment("Medium", 4.4)
histogram1.Increment(83.0)
histogram1.Increment(78.0, 2.5)
histogram5.Increment("High")
histogram5.Increment("Medium", 4.4)
The second way is to use the
Tabulate
method. This method tabulates the data specified in its first argument. This can be any list of values.
An optional second argument specifies the weight for each data value. This argument is of the same type as the first
argument.
var data = new double[]
{62, 77, 61, 94, 75, 82, 86, 83, 64, 84,
68, 82, 72, 71, 85, 66, 61, 79, 81, 73};
histogram2.Tabulate(data);
Dim data = New Double() {
62, 77, 61, 94, 75, 82, 86, 83, 64, 84,
68, 82, 72, 71, 85, 66, 61, 79, 81, 73}
histogram2.Tabulate(data)
let data = [| 62.; 77.; 61.; 94.; 75.; 82.; 86.; 83.; 64.; 84.;
              68.; 82.; 72.; 71.; 85.; 66.; 61.; 79.; 81.; 73. |]
histogram2.Tabulate(data)
Finally, you can set the value of all bins directly using the SetTotals
method. This method takes a vector of real numbers as its only argument.
The length of this vector must be equal to the number of bins.
It sets the total of each bin to the corresponding value in the vector.
The AddTotals method is similar, but adds the totals
specified by the argument to the bin totals.
var totals = Vector.Create(2.0, 7.0, 9.0, 8.0, 1.0);
totals.CopyTo(histogram1);
histogram2.AddInPlace(totals);
Dim totals = Vector.Create(2.0, 7.0, 9.0, 8.0, 1.0)
totals.CopyTo(histogram1)
histogram2.AddInPlace(totals)
let totals = Vector.Create(2.0, 7.0, 9.0, 8.0, 1.0)
totals.CopyTo(histogram1) |> ignore
histogram2.AddInPlace(totals) |> ignore
To set all totals to zero, use the Clear method.
Individual bins are represented by Interval<T> objects,
which have a LowerBound
and an UpperBound property.
Together, these define the interval that is covered by the bin.
The Width
property returns the total width of the bin. Note that this may be infinite.
All these properties are read-only.
You can use for-each to iterate through a histogram's bins:
foreach (var pair in histogram1.BinsAndValues)
    Console.WriteLine("{0}-{1}: total = {2}",
        pair.Key.LowerBound, pair.Key.UpperBound, pair.Value);
For Each pair In histogram1.BinsAndValues
    Console.WriteLine("{0}-{1}: total = {2}",
        pair.Key.LowerBound, pair.Key.UpperBound, pair.Value)
Next
for pair in histogram1.BinsAndValues do
    printfn "%f-%f: total = %f"
        pair.Key.LowerBound pair.Key.UpperBound pair.Value
You can find the bin corresponding to a specific value through the
FindBin
method. This returns the Interval<T>
that contains its argument.
Other Properties and Methods
The TotalValue property returns the sum of the totals
of all bins. The GetTotals method returns a Double array containing the totals for each bin.
The GoodnessOfFitTest method returns a
ChiSquareGoodnessOfFitTest object that can be
used to test the hypothesis that the data in the histogram follows a certain distribution. The method takes two
arguments. The first is a ContinuousDistribution object that specifies the
distribution to be tested against. The second is an integer that specifies the number of parameters of the
distribution that were estimated from the data. Each estimated parameter reduces the degrees of freedom by one.
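The role of the second argument can be illustrated with the textbook Pearson chi-square statistic. The Python sketch below is not the library's implementation, and whether the library computes the test exactly this way is an assumption. The helper `pearson_chi_square` and the bin probabilities are hypothetical; the observed totals reuse the vector from the totals example above.

```python
def pearson_chi_square(observed, expected_probabilities, estimated_parameters=0):
    """Return the Pearson chi-square statistic and its degrees of freedom.

    observed: the bin totals of the histogram.
    expected_probabilities: probability of each bin under the tested distribution.
    estimated_parameters: number of distribution parameters estimated from the data.
    """
    total = sum(observed)
    statistic = sum((o - total * p) ** 2 / (total * p)
                    for o, p in zip(observed, expected_probabilities))
    # One degree of freedom is lost because the totals must add up,
    # and one more for every estimated parameter.
    degrees_of_freedom = len(observed) - 1 - estimated_parameters
    return statistic, degrees_of_freedom

observed = [2.0, 7.0, 9.0, 8.0, 1.0]
probabilities = [0.1, 0.2, 0.4, 0.2, 0.1]  # hypothetical bin probabilities
statistic, dof = pearson_chi_square(observed, probabilities, estimated_parameters=2)
print(dof)  # 2
```

With five bins and two estimated parameters (for example, a mean and a standard deviation), only 5 - 1 - 2 = 2 degrees of freedom remain.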