Numerical Variables

Variables whose observations are numeric in nature are called numerical variables. In Extreme Numerics.NET, numerical variables are implemented by the Vector<T> class.

Descriptive Statistics

Numerical variables have the widest range of descriptive statistics available. The values are calculated as needed and, if the vector is read-only, they may be cached. Descriptive statistics are implemented as extension methods defined in the Stats class. The following tables list the descriptive statistics that are available for numerical variables:

Measures of Location

The purpose of a measure of location is to provide a typical or central value to describe the data.

Methods for measures of location

Property	Description
Mean	Returns the mean or average of all observations.
Median	Returns the median. The median is the middle value of a sorted list of observations. If a variable has an even number of observations, then the median is the average of the two middle values.
GeometricMean	Returns the geometric mean of all observations.
HarmonicMean	Returns the harmonic mean of all observations.
MidMean	Returns the mean of the middle 50% of observations.
TrimmedMean	Returns the mean of the observations after eliminating the specified percentage of extreme values.
WinsorizedMean	Returns the mean of the observations after replacing the specified percentage of extreme values with the lowest or highest remaining value.

The mid-mean is a special case of the trimmed mean: it is the 50% trimmed mean, which discards the lowest 25% and the highest 25% of the observations. Providing a value of 100 for the percentage to the TrimmedMean method returns the median. The Winsorized mean is similar to the trimmed mean, but instead of being eliminated, the extreme values are replaced by the lowest or highest remaining value. For example, for the 10% Winsorized mean, the smallest 5% of the values are set equal to the value at the 5th percentile, while the largest 5% are set equal to the value at the 95th percentile.
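The following minimal sketch in plain C# (using LINQ) makes the trimming conventions concrete. It is not the Extreme Numerics.NET implementation. TrimmedMean, WinsorizedMean and TrimCount are hypothetical local helpers. The sketch assumes that the percentage is the total share of observations affected, split evenly between both ends and rounded down to whole observations. The library may round or interpolate differently.

```csharp
using System;
using System.Linq;

// Illustrative sketch only; NOT the Extreme Numerics.NET API.
// Assumption: 'percentage' is the total share affected, half at each end,
// rounded down to a whole number of observations per end.
static int TrimCount(int n, double percentage)
{
    int k = (int)Math.Floor(n * percentage / 200.0);
    // Always keep the middle value (odd n) or the middle two values (even n),
    // so that a percentage of 100 yields the median.
    return 2 * k >= n ? (n - 1) / 2 : k;
}

static double TrimmedMean(double[] data, double percentage)
{
    double[] sorted = data.OrderBy(v => v).ToArray();
    int k = TrimCount(sorted.Length, percentage);
    // Drop k values from each end and average what remains.
    return sorted.Skip(k).Take(sorted.Length - 2 * k).Average();
}

static double WinsorizedMean(double[] data, double percentage)
{
    double[] sorted = data.OrderBy(v => v).ToArray();
    int k = TrimCount(sorted.Length, percentage);
    double low = sorted[k], high = sorted[sorted.Length - 1 - k];
    // Clamp extreme values to the nearest retained value instead of dropping them.
    return sorted.Select(v => Math.Clamp(v, low, high)).Average();
}

double[] sample = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 100 };
double plainMean = sample.Average();            // 14.5, pulled up by the outlier
double trimmed = TrimmedMean(sample, 20);       // 5.5: drops 1 and 100
double winsorized = WinsorizedMean(sample, 20); // 5.5: 1 becomes 2, 100 becomes 9
double midMean = TrimmedMean(sample, 50);       // 5.5: mean of 3 through 8
Console.WriteLine($"{plainMean} {trimmed} {winsorized} {midMean}");
```

With the outlier at 100, the ordinary mean is 14.5. All three robust means stay at 5.5.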

The example below shows how to use some of the most common measures of location:

Console.WriteLine("Mean:           {0:F1}", variable1.Mean);
Console.WriteLine("Median:         {0:F1}", variable1.Median);
Console.WriteLine("Trimmed Mean:   {0:F1}", variable1.GetTrimmedMean(10));
Console.WriteLine("Harmonic Mean:  {0:F1}", variable1.GetHarmonicMean());
Console.WriteLine("Geometric Mean: {0:F1}", variable1.GetGeometricMean());

Visual Basic

Console.WriteLine("Mean:           {0:F1}", variable1.Mean)
Console.WriteLine("Median:         {0:F1}", variable1.Median)
Console.WriteLine("Trimmed Mean:   {0:F1}", variable1.GetTrimmedMean(10))
Console.WriteLine("Harmonic Mean:  {0:F1}", variable1.GetHarmonicMean())
Console.WriteLine("Geometric Mean: {0:F1}", variable1.GetGeometricMean())
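The geometric and harmonic means used in the example can be sketched directly from their definitions. The helpers below are hypothetical plain C# functions, not the library's GetGeometricMean and GetHarmonicMean. They assume that all observations are strictly positive. Computing the geometric mean as the exponential of the average logarithm avoids overflowing the product of many values.

```csharp
using System;
using System.Linq;

// Illustrative sketch only; valid for strictly positive observations.
// Geometric mean: exp of the average logarithm, i.e. the n-th root of the product.
static double GeometricMean(double[] data) =>
    Math.Exp(data.Average(v => Math.Log(v)));

// Harmonic mean: number of observations divided by the sum of reciprocals.
static double HarmonicMean(double[] data) =>
    data.Length / data.Sum(v => 1.0 / v);

double[] positive = { 1, 2, 4 };
double arithmetic = positive.Average();     // 7 / 3
double geometric = GeometricMean(positive); // cube root of 8 = 2, up to rounding
double harmonic = HarmonicMean(positive);   // 3 / 1.75 = 12 / 7
Console.WriteLine($"{arithmetic:F3} {geometric:F3} {harmonic:F3}");
```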

Measures of Scale

Measures of scale are used to characterize the spread or variability of a data set.

Properties for measures of scale

Property	Description
Variance	Returns the unbiased (sample) variance of the data.
PopulationVariance	Returns the population (biased) variance of the data.
StandardDeviation	Returns the unbiased (sample) standard deviation of the data.
PopulationStandardDeviation	Returns the population (biased) standard deviation of the data.
RootMeanSquare(Vector<Double>)	Returns the root-mean-square.
Range	Returns the difference between the largest and the smallest value.
Min<T>(IList<T>)	Returns the smallest value.
Max<T>(IList<T>)	Returns the largest value.
AverageAbsoluteDeviation	Returns the average absolute deviation from the mean.
MedianAbsoluteDeviation	Returns the median of the absolute deviation from the mean.
InterQuartileRange	Returns the difference between the first and the third quartile.

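The Variance and StandardDeviation methods always return the unbiased (sample) versions, which are based on dividing the sum of squared deviations from the mean by n - 1. To get the biased (population) versions, which divide by n, use the PopulationVariance and PopulationStandardDeviation methods. RootMeanSquare does not return the population standard deviation: it is the square root of the mean of the squared values themselves, so the two coincide only when the mean is zero.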
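The following sketch shows the three quantities side by side. It is written in plain C# with hypothetical helper names. It only illustrates the definitions and does not call the library.

```csharp
using System;
using System.Linq;

// Illustrative sketch only; helper names are hypothetical.
static double SumSquaredDeviations(double[] data)
{
    double mean = data.Average();
    return data.Sum(v => (v - mean) * (v - mean));
}

// Divides by n - 1: the unbiased (sample) variance.
static double SampleVariance(double[] data) =>
    SumSquaredDeviations(data) / (data.Length - 1);

// Divides by n: the population (biased) variance.
static double PopulationVariance(double[] data) =>
    SumSquaredDeviations(data) / data.Length;

// Square root of the mean of the squared values; not centered on the mean.
static double RootMeanSquare(double[] data) =>
    Math.Sqrt(data.Average(v => v * v));

double[] values = { 2, 4, 4, 4, 5, 5, 7, 9 };              // mean is 5
double sampleVar = SampleVariance(values);                   // 32 / 7
double populationSd = Math.Sqrt(PopulationVariance(values)); // sqrt(32 / 8) = 2
double rms = RootMeanSquare(values);                         // sqrt(232 / 8) = sqrt(29)
Console.WriteLine($"{sampleVar:F4} {populationSd:F4} {rms:F4}");
```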

The average absolute deviation is the mean of the absolute difference between each value and the mean. Because it does not square the distances from the mean, it is less affected by extreme values than the standard deviation. The median absolute deviation is the median of the absolute difference between each value and the mean. It is even less influenced by extreme values, because the median is less sensitive to extreme values than the mean.
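The following is a plain C# sketch of both deviations. The helper names are hypothetical. To match the description above, both deviations are measured from the mean. Many texts define the median absolute deviation relative to the median instead, and this sketch does not settle which convention the library uses.

```csharp
using System;
using System.Linq;

// Illustrative sketch only; helper names are hypothetical.
static double Median(double[] data)
{
    double[] sorted = data.OrderBy(v => v).ToArray();
    int mid = sorted.Length / 2;
    return sorted.Length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

static double AverageAbsoluteDeviation(double[] data)
{
    double mean = data.Average();
    return data.Average(v => Math.Abs(v - mean));
}

// Deviations are taken from the mean, following the text above.
static double MedianAbsoluteDeviation(double[] data)
{
    double mean = data.Average();
    return Median(data.Select(v => Math.Abs(v - mean)).ToArray());
}

double[] withOutlier = { 1, 2, 3, 4, 100 };          // mean is 22
double aad = AverageAbsoluteDeviation(withOutlier); // (21 + 20 + 19 + 18 + 78) / 5 = 31.2
double mad = MedianAbsoluteDeviation(withOutlier);  // median of {21, 20, 19, 18, 78} = 20
double center = withOutlier.Average();
double sd = Math.Sqrt(withOutlier.Sum(v => (v - center) * (v - center)) / (withOutlier.Length - 1)); // about 43.6
Console.WriteLine($"{sd:F1} {aad:F1} {mad:F1}");
```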

The inter-quartile range is the difference between the 75th and the 25th percentiles. It measures the spread of the middle 50% of the observations, so it is not affected by extreme values in either tail.
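Several conventions exist for computing percentiles from a finite sample. The sketch below assumes linear interpolation between order statistics, which is the convention used by Excel's PERCENTILE.INC. The library may use a different convention, which can give slightly different quartiles for small samples. Percentile and InterQuartileRange are hypothetical helpers.

```csharp
using System;
using System.Linq;

// Illustrative sketch only. Assumption: linear interpolation between
// order statistics at position (n - 1) * p, with p in [0, 1].
static double Percentile(double[] data, double p)
{
    double[] sorted = data.OrderBy(v => v).ToArray();
    double h = (sorted.Length - 1) * p;
    int lower = (int)Math.Floor(h);
    int upper = Math.Min(lower + 1, sorted.Length - 1);
    return sorted[lower] + (h - lower) * (sorted[upper] - sorted[lower]);
}

static double InterQuartileRange(double[] data) =>
    Percentile(data, 0.75) - Percentile(data, 0.25);

double[] eight = { 1, 2, 3, 4, 5, 6, 7, 8 };
double q1 = Percentile(eight, 0.25);     // 2.75
double q3 = Percentile(eight, 0.75);     // 6.25
double iqr = InterQuartileRange(eight);  // 3.5
// Replacing the largest value by an outlier leaves the IQR unchanged.
double iqrOutlier = InterQuartileRange(new double[] { 1, 2, 3, 4, 5, 6, 7, 800 }); // 3.5
Console.WriteLine($"{q1} {q3} {iqr} {iqrOutlier}");
```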

The example below shows some of these properties and methods:

Console.WriteLine("Standard deviation:  {0:F1}", variable1.StandardDeviation);
Console.WriteLine("Variance:            {0:F1}", variable1.Variance);
Console.WriteLine("Range:               {0:F1}", variable1.Range);
Console.WriteLine("Inter-quartile range:{0:F1}", variable1.GetInterQuartileRange());

Visual Basic

Console.WriteLine("Standard deviation:  {0:F1}", variable1.StandardDeviation)
Console.WriteLine("Variance:            {0:F1}", variable1.Variance)
Console.WriteLine("Range:               {0:F1}", variable1.Range)
Console.WriteLine("Inter-quartile range:{0:F1}", variable1.GetInterQuartileRange())

Other methods

The remaining properties cover the higher moments (skewness and kurtosis) and the raw sums:

Other methods.

Method	Description
Skewness	Returns the unbiased skewness of the data.
PopulationSkewness	Returns the skewness of the data.
Kurtosis	Returns the unbiased kurtosis supplement of the data.
PopulationKurtosis	Returns the kurtosis supplement of the data.
Sum<T>(Vector<T>)	Returns the sum of all the elements.
SumOfSquares	Returns the sum of the squares.

The skewness is a measure of the lack of symmetry of the distribution of a variable. The kurtosis is a measure of the peakedness of the distribution compared to the normal distribution. The Kurtosis method returns the kurtosis supplement (also called the excess kurtosis): the 'real' kurtosis minus 3, which is the kurtosis of the normal distribution. A normal distribution therefore has a kurtosis supplement of zero.
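The population versions of both statistics follow directly from the central moments m_k, which are the averages of the k-th powers of the deviations from the mean. The skewness is m_3 / m_2^{3/2}, and the kurtosis supplement is m_4 / m_2^2 - 3. A plain C# sketch with a hypothetical Moments helper is shown below. The unbiased variants returned by Skewness and Kurtosis apply an additional sample-size correction that is not reproduced here.

```csharp
using System;
using System.Linq;

// Illustrative sketch only: population (biased) estimators from central moments.
static (double Skewness, double ExcessKurtosis) Moments(double[] data)
{
    double mean = data.Average();
    double m2 = data.Average(v => Math.Pow(v - mean, 2));
    double m3 = data.Average(v => Math.Pow(v - mean, 3));
    double m4 = data.Average(v => Math.Pow(v - mean, 4));
    double skewness = m3 / Math.Pow(m2, 1.5);
    // Subtract 3, the kurtosis of the normal distribution, to get the supplement.
    double excessKurtosis = m4 / (m2 * m2) - 3.0;
    return (skewness, excessKurtosis);
}

var (symSkew, symKurt) = Moments(new double[] { 1, 2, 3, 4, 5 }); // 0 and 6.8 / 4 - 3 = -1.3
var (rightSkew, _) = Moments(new double[] { 1, 1, 1, 10 });       // positive: long right tail
Console.WriteLine($"{symSkew:F3} {symKurt:F3} {rightSkew:F3}");
```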

Correlations

Several measures of correlation are available. The Covariance method returns the covariance between two variables. The Correlation method returns the Pearson correlation between two variables. The RankCorrelation method returns the Spearman rank correlation between two variables. Finally, the KendallTau<T> method returns the Kendall rank correlation. These methods are defined in the Stats class.
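To illustrate how the Pearson and Spearman measures differ, here is a plain C# sketch with hypothetical helpers. Spearman's rank correlation is computed as the Pearson correlation of the ranks, with tied values assigned their average rank. The sketch shows the definitions only and does not call the Stats class.

```csharp
using System;
using System.Linq;

// Illustrative sketch only; helper names are hypothetical.
static double Pearson(double[] a, double[] b)
{
    double ma = a.Average(), mb = b.Average();
    double sab = 0, saa = 0, sbb = 0;
    for (int t = 0; t < a.Length; t++)
    {
        double da = a[t] - ma, db = b[t] - mb;
        sab += da * db;
        saa += da * da;
        sbb += db * db;
    }
    return sab / Math.Sqrt(saa * sbb);
}

// 1-based ranks; tied values receive the average of the ranks they span.
static double[] Ranks(double[] data)
{
    int n = data.Length;
    int[] order = Enumerable.Range(0, n).OrderBy(p => data[p]).ToArray();
    var ranks = new double[n];
    int start = 0;
    while (start < n)
    {
        int end = start;
        while (end + 1 < n && data[order[end + 1]] == data[order[start]]) end++;
        double averageRank = (start + end) / 2.0 + 1.0;
        for (int q = start; q <= end; q++) ranks[order[q]] = averageRank;
        start = end + 1;
    }
    return ranks;
}

// Spearman's rank correlation: the Pearson correlation of the ranks.
static double Spearman(double[] a, double[] b) => Pearson(Ranks(a), Ranks(b));

double[] xs = { 1, 2, 3, 4, 5 };
double[] squares = xs.Select(v => v * v).ToArray();
double pearson = Pearson(xs, squares);   // below 1: the relation is not linear
double spearman = Spearman(xs, squares); // 1: the relation is monotonic
Console.WriteLine($"{pearson:F4} {spearman:F4}");
```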

The Autocorrelation method returns the Pearson correlation of a variable with a copy of itself shifted by a number of observations (the lag). An optional integer argument specifies the lag.
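The sketch below uses one common definition: the Pearson correlation between the observations x[t] and x[t + lag]. This is an assumption about the definition. Other formulations use the mean and variance of the whole series, which gives slightly different values for short series. Autocorrelation and Pearson are hypothetical plain C# helpers.

```csharp
using System;
using System.Linq;

// Illustrative sketch only; helper names are hypothetical.
static double Pearson(double[] a, double[] b)
{
    double ma = a.Average(), mb = b.Average();
    double sab = 0, saa = 0, sbb = 0;
    for (int t = 0; t < a.Length; t++)
    {
        double da = a[t] - ma, db = b[t] - mb;
        sab += da * db;
        saa += da * da;
        sbb += db * db;
    }
    return sab / Math.Sqrt(saa * sbb);
}

// Assumption: correlate x[t] with x[t + lag] over the overlapping range.
static double Autocorrelation(double[] series, int lag = 1)
{
    double[] head = series.Take(series.Length - lag).ToArray();
    double[] tail = series.Skip(lag).ToArray();
    return Pearson(head, tail);
}

double[] alternating = { 1, -1, 1, -1, 1, -1 };
double lag1 = Autocorrelation(alternating);    // -1: each value flips sign
double lag2 = Autocorrelation(alternating, 2); // 1: the pattern repeats every 2 steps
Console.WriteLine($"{lag1:F3} {lag2:F3}");
```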

Handling missing values

By default, missing values have the value Double.NaN. You can change this by setting the variable's MissingValue property.

Missing values are ignored during the calculation of descriptive statistics. To force a different behavior, you can transform the vector first. The RemoveMissingValues method returns a new vector with missing values omitted. The ReplaceMissingValues method returns a vector of the same length with missing values replaced. It has several overloads. The first overload takes a single value, which is used to replace the missing values. The second overload takes a Direction value and replaces missing values with the previous (Forward) or next (Backward) non-missing value. A third overload takes a vector, and replaces missing values with the corresponding value in the vector.
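The transformations can be sketched on a plain double array, assuming the default missing value, double.NaN. RemoveMissing, FillForward and FillBackward are hypothetical helpers. They mirror the behavior described for RemoveMissingValues and for the Direction overload of ReplaceMissingValues, but they are not the library's implementation.

```csharp
using System;
using System.Linq;

// Illustrative sketch only; assumes missing values are double.NaN.
static double[] RemoveMissing(double[] data) =>
    data.Where(v => !double.IsNaN(v)).ToArray();

// Forward: replace each missing value by the previous non-missing value.
static double[] FillForward(double[] data)
{
    var result = (double[])data.Clone();
    for (int t = 1; t < result.Length; t++)
        if (double.IsNaN(result[t])) result[t] = result[t - 1];
    return result; // leading missing values have no predecessor and stay NaN
}

// Backward: replace each missing value by the next non-missing value.
static double[] FillBackward(double[] data)
{
    var result = (double[])data.Clone();
    for (int t = result.Length - 2; t >= 0; t--)
        if (double.IsNaN(result[t])) result[t] = result[t + 1];
    return result; // trailing missing values have no successor and stay NaN
}

double[] observed = { double.NaN, 1, double.NaN, double.NaN, 4, double.NaN };
double meanIgnoringMissing = RemoveMissing(observed).Average(); // (1 + 4) / 2 = 2.5
double[] forward = FillForward(observed);   // NaN, 1, 1, 1, 4, 4
double[] backward = FillBackward(observed); // 1, 1, 4, 4, 4, NaN
Console.WriteLine(string.Join(" ", forward) + " | " + string.Join(" ", backward));
```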

The IsMissing method indicates whether an observation is missing. Its only parameter is the index of the observation.

Operations on Numerical Variables

The numerical Vector<T> class has a number of additional methods that may be useful. The Normalize method returns the variable rescaled to have a mean of zero and a standard deviation equal to one.
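Normalization subtracts the mean from each observation and divides the result by the standard deviation. A minimal plain C# sketch follows. The Normalize helper is hypothetical, and it assumes the sample (n - 1) standard deviation, which the text does not specify.

```csharp
using System;
using System.Linq;

// Illustrative sketch only. Assumption: scale by the sample (n - 1) standard deviation.
static double[] Normalize(double[] data)
{
    double mean = data.Average();
    double sd = Math.Sqrt(data.Sum(v => (v - mean) * (v - mean)) / (data.Length - 1));
    return data.Select(v => (v - mean) / sd).ToArray();
}

double[] raw = { 2, 4, 4, 4, 5, 5, 7, 9 };
double[] z = Normalize(raw); // mean 0, sample standard deviation 1
Console.WriteLine(string.Join(" ", z.Select(v => v.ToString("F3"))));
```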

The following example prepares data for a fit of the function

$$y = a\,e^{b x_1 + c x_2 x_3}.$$

The dependent variable is transformed using the Math.Log method. A new variable, z, is created to hold the product of x₂ and x₃. This transforms the above function into a form suitable for multiple linear regression:

$$\log y = \log a + b x_1 + c z.$$

NumericalVariable Y, X1, X2, X3;
// ...
NumericalVariable logY = Y.Apply(new RealFunction(Math.Log));
NumericalVariable Z = X2 * X3;

Visual Basic

Dim Y, X1, X2, X3 As NumericalVariable
' ...
Dim logY As NumericalVariable = Y.Apply(AddressOf Math.Log)
Dim Z As NumericalVariable = NumericalVariable.Multiply(X2, X3)

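Note that Visual Basic .NET 2003 does not support operator overloading, so the Visual Basic example uses the shared Multiply method instead of the * operator. Visual Basic 2005 and later versions do support operator overloading.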

The Sort() method sorts the observations. The order is ascending by default, but can be specified by passing a parameter of type SortOrder.