Extreme Optimization User's Guide

Numerical Variables

Variables whose observations are numeric in nature are called numerical variables. In the Extreme Optimization Numerical Libraries for .NET, numerical variables are implemented by the NumericalVariable class.

Constructing numerical variables

Numerical variables can be constructed in a variety of ways. The NumericalVariable class has six constructors that come in three groups.

The first group uses a Double array as the source of the data. The first variant has two parameters. The first is a string that specifies the name of the variable. The second parameter is a double array. The second variant only takes one parameter: a double array containing the data values.

C#
double[] dataArray = new double[]
    {62, 77, 61, 94, 75, 82, 86, 83, 64, 84, 
        68, 82, 72, 71, 85, 66, 61, 79, 81, 73};        
NumericalVariable variable1 = new NumericalVariable(dataArray);
NumericalVariable variable2 = new NumericalVariable("Data", dataArray);
Visual Basic
Dim dataArray As Double() = New Double() _
    {62, 77, 61, 94, 75, 82, 86, 83, 64, 84, _
        68, 82, 72, 71, 85, 66, 61, 79, 81, 73}
Dim variable1 As NumericalVariable = New NumericalVariable("Data", dataArray)
Dim variable2 As NumericalVariable = New NumericalVariable(dataArray)

The second group uses a Vector as the source of the data. The first variant once again has two parameters. The first is a string that specifies the name of the variable. The second parameter is a Vector. The second variant only takes one parameter: a Vector containing the data values.

C#
Vector dataVector = new GeneralVector(dataArray);
NumericalVariable variable3 = new NumericalVariable(dataVector);
NumericalVariable variable4 = new NumericalVariable("Data", dataVector);
Visual Basic
Dim dataVector As Vector = New GeneralVector(dataArray)
Dim variable3 As NumericalVariable = New NumericalVariable("Data", dataVector)
Dim variable4 As NumericalVariable = New NumericalVariable(dataVector)

The third group of constructors uses a DataColumn as the source of the data. The first variant once again has two parameters. The first is a string that specifies the name of the variable. The second parameter is a DataColumn. The values in the data column must be of a type that can be converted to a Double. If not, an InvalidCastException is thrown. The second variant only takes one parameter: a DataColumn containing the data values. The name of the variable is set to the Caption property of the data column.

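C#
// Normally the column is retrieved from a DataTable filled from a data source.
// Here a small in-memory table (System.Data) stands in for it.
DataTable table = new DataTable("Scores");
DataColumn column = table.Columns.Add("Score", typeof(double));
foreach (double score in dataArray) table.Rows.Add(score);
NumericalVariable variable5 = new NumericalVariable(column);
NumericalVariable variable6 = new NumericalVariable("Data", column);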
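Visual Basic
' Normally the column comes from a data source; an in-memory table stands in here.
Dim table As DataTable = New DataTable("Scores")
Dim column As DataColumn = table.Columns.Add("Score", GetType(Double))
For Each score As Double In dataArray
    table.Rows.Add(score)
Next
Dim variable5 As NumericalVariable = New NumericalVariable("Data", column)
Dim variable6 As NumericalVariable = New NumericalVariable(column)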

A number of static (Shared in Visual Basic) methods create numerical variables that span a range of numbers. The CreateRange method is overloaded. The first overload takes one integer argument that specifies a maximum value. It returns a NumericalVariable whose observations are the integers from 0 up to but not including the maximum value. The second overload takes two integer arguments that specify the minimum and maximum values. Once again, the maximum value is not included.

The third overload takes three arguments. The first is the number of observations. The second and third arguments are real numbers that specify the lowest and highest value. This method returns a NumericalVariable containing the specified number of observations that are equally spaced between the lowest and highest value. In this case, the highest value is included.

The CreateLogarithmicRange method also takes three arguments. The first is the number of observations. The second and third arguments are real numbers that specify the lowest and highest value. This method returns a NumericalVariable containing the specified number of observations whose logarithms are equally spaced between the lowest and highest value. This means that the ratio between two successive observations is a constant. The highest value is included.

The following example creates two variables with 5 observations each. The first contains the first five multiples of 1000. The second has a logarithmic scale and contains the first five powers of 10, starting with 10^0 = 1:

C#
NumericalVariable variable7 = NumericalVariable.CreateRange(5, 1000, 5000);
NumericalVariable variable8 = NumericalVariable.CreateLogarithmicRange(5, 1, 10000);
Visual Basic
Dim variable7 As NumericalVariable = NumericalVariable.CreateRange(5, 1000, 5000)
Dim variable8 As NumericalVariable = _
    NumericalVariable.CreateLogarithmicRange(5, 1, 10000)
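
The spacing rules described above can be illustrated without the library. The following is a minimal sketch on plain arrays, not the library's implementation; the class RangeSketch and its methods are hypothetical names introduced only for this illustration. It assumes at least two observations and, for the logarithmic range, strictly positive bounds.

```csharp
using System;

public static class RangeSketch
{
    // Returns count values equally spaced from low to high, both included.
    public static double[] LinearRange(int count, double low, double high)
    {
        if (count < 2) throw new ArgumentOutOfRangeException(nameof(count));
        double[] values = new double[count];
        double step = (high - low) / (count - 1);
        for (int i = 0; i < count; i++)
            values[i] = low + i * step;
        return values;
    }

    // Returns count values whose logarithms are equally spaced, so that
    // the ratio between two successive values is constant.
    public static double[] LogarithmicRange(int count, double low, double high)
    {
        if (count < 2) throw new ArgumentOutOfRangeException(nameof(count));
        if (low <= 0 || high <= 0)
            throw new ArgumentOutOfRangeException(nameof(low), "Bounds must be positive.");
        double[] values = new double[count];
        double logLow = Math.Log(low);
        double logStep = (Math.Log(high) - logLow) / (count - 1);
        for (int i = 0; i < count; i++)
            values[i] = Math.Exp(logLow + i * logStep);
        return values;
    }
}
```

With the arguments used in the example above, LinearRange(5, 1000, 5000) steps by 1000, and LogarithmicRange(5, 1, 10000) multiplies by 10 at each step, up to floating-point rounding.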

In addition, variables can be created by VariableCollection objects, by performing arithmetic operations on them (see below), and several other means.

Descriptive Statistics

Numerical variables have the widest range of descriptive statistics available. The values are calculated as needed, and cached. For some values, the calculation may be lengthy. In this case, the value is returned by a method instead of a property. The following tables list the descriptive statistics that are available for numerical variables:

Measures of Location

The purpose of a measure of location is to provide a typical or central value to describe the data.

 Property  Description
Mean  Returns the mean or average of all observations.
Median  Returns the median.
Table 1. Properties for measures of location.

The median is the middle value of a sorted list of observations. If a variable has an even number of observations, then the median is the average of the two middle values.

Method Description
GetGeometricMean  Returns the geometric mean of all observations.
GetHarmonicMean  Returns the harmonic mean of all observations.
GetMidMean Returns the mean of the middle 50% of observations.
GetTrimmedMean  Returns the mean of the observations after eliminating the specified percentage of extreme values.
GetWinsorizedMean  Returns the mean of the observations after setting the specified percentage of extreme values to the lowest or highest value.
Table 2. Methods for measures of location.

The mid-mean is a special case of the trimmed mean in which the lowest 25% and the highest 25% of the observations are eliminated. Providing a value of 100 for the percentage to the GetTrimmedMean method returns the median. The Winsorized mean is similar to the trimmed mean, but instead of being eliminated, the extreme values are set to the lowest or highest remaining value. For example, for the 10% Winsorized mean, the smallest 5% of the values are set equal to the value at the 5th percentile, while the largest 5% of the values are set equal to the value at the 95th percentile.

The example below shows how to use some of the most common measures of location:

C#
Console.WriteLine("Mean:           {0:F1}", variable1.Mean);
Console.WriteLine("Median:         {0:F1}", variable1.Median);
Console.WriteLine("Trimmed Mean:   {0:F1}", variable1.GetTrimmedMean(10));
Console.WriteLine("Harmonic Mean:  {0:F1}", variable1.GetHarmonicMean());
Console.WriteLine("Geometric Mean: {0:F1}", variable1.GetGeometricMean());
Visual Basic
Console.WriteLine("Mean:           {0:F1}", variable1.Mean)
Console.WriteLine("Median:         {0:F1}", variable1.Median)
Console.WriteLine("Trimmed Mean:   {0:F1}", variable1.GetTrimmedMean(10))
Console.WriteLine("Harmonic Mean:  {0:F1}", variable1.GetHarmonicMean())
Console.WriteLine("Geometric Mean: {0:F1}", variable1.GetGeometricMean())
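
To make the difference between the trimmed and the Winsorized mean concrete, the sketch below computes both on a plain array. It is an illustration only, not the library's code: the class RobustMeanSketch is a hypothetical name, and the rounding rule (half of the percentage is cut from each end, rounded down, always keeping the middle value or values) is an assumption that may differ from what GetTrimmedMean and GetWinsorizedMean do. The input is assumed to be non-empty.

```csharp
using System;
using System.Linq;

public static class RobustMeanSketch
{
    // Number of observations affected at EACH end for a total percentage.
    private static int CutCount(int n, double percentage)
    {
        int k = (int)Math.Floor(n * percentage / 200.0);
        return Math.Min(k, (n - 1) / 2);   // keep at least the middle value(s)
    }

    // Mean after dropping the k smallest and k largest observations.
    public static double TrimmedMean(double[] data, double percentage)
    {
        double[] sorted = data.OrderBy(x => x).ToArray();
        int k = CutCount(sorted.Length, percentage);
        return sorted.Skip(k).Take(sorted.Length - 2 * k).Average();
    }

    // Mean after replacing the k smallest and k largest observations
    // with the nearest remaining value.
    public static double WinsorizedMean(double[] data, double percentage)
    {
        double[] sorted = data.OrderBy(x => x).ToArray();
        int n = sorted.Length;
        int k = CutCount(n, percentage);
        for (int i = 0; i < k; i++)
        {
            sorted[i] = sorted[k];
            sorted[n - 1 - i] = sorted[n - 1 - k];
        }
        return sorted.Average();
    }
}
```

For the five values 1, 2, 3, 4, 100 and a percentage of 40, one value is affected at each end: the trimmed mean averages 2, 3, 4 and the Winsorized mean averages 2, 2, 3, 4, 4. Under this sketch's rounding rule, a percentage of 100 leaves only the middle value or values, which is the median.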

Measures of Scale

Measures of scale are used to characterize the spread or variability of a data set.

 Property  Description
Variance  Returns the unbiased variance of the data.
PopulationVariance  Returns the variance of the data.
StandardDeviation  Returns the unbiased standard deviation of the data.
PopulationStandardDeviation  Returns the standard deviation of the data.
RootMeanSquare  Returns the root-mean-square.
Range  Returns the difference between the largest and the smallest value.
Minimum  Returns the smallest value.
Maximum  Returns the largest value.
Table 3. Properties for measures of scale.

The Variance and StandardDeviation properties always return the unbiased versions. To get the biased (population) values, use the PopulationVariance and PopulationStandardDeviation properties.
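
The unbiased and the population variance differ only in the divisor: n - 1 for the unbiased version and n for the population version. The sketch below shows this on a plain array. The class VarianceSketch is a hypothetical name used for illustration, not part of the library, and the input is assumed to contain at least two values.

```csharp
using System;
using System.Linq;

public static class VarianceSketch
{
    // Sum of the squared differences between each value and the mean.
    private static double SumOfSquaredDeviations(double[] data)
    {
        double mean = data.Average();
        return data.Sum(x => (x - mean) * (x - mean));
    }

    // Unbiased variance: divides by n - 1.
    public static double SampleVariance(double[] data)
        => SumOfSquaredDeviations(data) / (data.Length - 1);

    // Population variance: divides by n.
    public static double PopulationVariance(double[] data)
        => SumOfSquaredDeviations(data) / data.Length;
}
```

For the eight values 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the sum of squared deviations is 32, so the population variance is 32/8 = 4 and the unbiased variance is 32/7. The corresponding standard deviations are the square roots of these values.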

 Method  Description
GetAverageAbsoluteDeviation  Returns the average absolute deviation from the mean.
GetMedianAbsoluteDeviation Returns the median of the absolute deviation from the mean.
GetInterQuartileRange Returns the difference between the first and the third quartile.
Table 4. Methods for measures of scale.

The average absolute deviation is the mean of the absolute difference between each value and the mean. Because it does not square the distance from the mean, it is less affected by extreme values. The median absolute deviation is the median of the absolute difference between each value and the mean. It is even less influenced by extreme values because the median is less affected by extreme values than the mean.
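
The two deviation measures can be sketched on a plain array as follows. This is an illustration that follows the description above, where both deviations are taken from the mean; many textbooks instead define the median absolute deviation around the median. The class DeviationSketch is a hypothetical name and not part of the library, and the input is assumed to be non-empty.

```csharp
using System;
using System.Linq;

public static class DeviationSketch
{
    // Middle value of the sorted data, or the average of the two middle values.
    public static double Median(double[] data)
    {
        double[] sorted = data.OrderBy(x => x).ToArray();
        int n = sorted.Length;
        return n % 2 == 1
            ? sorted[n / 2]
            : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    // Mean of the absolute differences between each value and the mean.
    public static double AverageAbsoluteDeviation(double[] data)
    {
        double mean = data.Average();
        return data.Average(x => Math.Abs(x - mean));
    }

    // Median of the absolute differences between each value and the mean.
    public static double MedianAbsoluteDeviation(double[] data)
    {
        double mean = data.Average();
        return Median(data.Select(x => Math.Abs(x - mean)).ToArray());
    }
}
```

For the values 1, 2, 3, 4, 10 the mean is 4 and the absolute deviations are 3, 2, 1, 0, 6. Their mean is 2.4, while their median is 2, which shows how the single large value 10 pulls the average up more than the median.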

The inter-quartile range is the difference between the 75th and the 25th percentile values. It is a measure of the variability of the values in the middle of the distribution.

The example below shows some of these properties and methods:

C#
Console.WriteLine("Standard deviation:  {0:F1}", variable1.StandardDeviation);
Console.WriteLine("Variance:            {0:F1}", variable1.Variance);
Console.WriteLine("Range:               {0:F1}", variable1.Range);
Console.WriteLine("Inter-quartile range:{0:F1}", variable1.GetInterQuartileRange());
Visual Basic
Console.WriteLine("Standard deviation:  {0:F1}", variable1.StandardDeviation)
Console.WriteLine("Variance:            {0:F1}", variable1.Variance)
Console.WriteLine("Range:               {0:F1}", variable1.Range)
Console.WriteLine("Inter-quartile range:{0:F1}", variable1.GetInterQuartileRange())

Other properties and methods

The remaining properties cover the higher moments (skewness and kurtosis) and the raw sums:

 Property  Description
Skewness  Returns the unbiased skewness of the data.
PopulationSkewness  Returns the skewness of the data.
Kurtosis  Returns the unbiased kurtosis supplement of the data.
PopulationKurtosis  Returns the kurtosis supplement of the data.
Sum  Returns the sum of all the elements.
SumOfSquares  Returns the sum of the squares.
Table 5. Other properties.

The skewness is a measure of the lack of symmetry of the distribution of a variable. The kurtosis is a measure of the peakedness of the distribution compared to the normal distribution. The Kurtosis property returns the kurtosis supplement (also known as the excess kurtosis), which is the difference between the 'real' kurtosis and the kurtosis of the normal distribution, which equals 3.
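
As an illustration of these definitions, the sketch below computes the population (biased) skewness and kurtosis supplement from the central moments of a plain array. The class ShapeSketch is a hypothetical name, not part of the library, and the bias corrections used by the unbiased Skewness and Kurtosis properties are not reproduced here. The input is assumed to contain at least two distinct values.

```csharp
using System;
using System.Linq;

public static class ShapeSketch
{
    // k-th central moment, using the divisor n.
    private static double CentralMoment(double[] data, int k)
    {
        double mean = data.Average();
        return data.Average(x => Math.Pow(x - mean, k));
    }

    // Third central moment divided by the cube of the population
    // standard deviation. Zero for symmetric data.
    public static double PopulationSkewness(double[] data)
        => CentralMoment(data, 3) / Math.Pow(CentralMoment(data, 2), 1.5);

    // Fourth central moment divided by the squared population variance,
    // minus 3 (the kurtosis of the normal distribution).
    public static double PopulationKurtosisSupplement(double[] data)
    {
        double m2 = CentralMoment(data, 2);
        return CentralMoment(data, 4) / (m2 * m2) - 3.0;
    }
}
```

For the symmetric values 1, 2, 3, 4, 5 the skewness is 0. The second central moment is 2 and the fourth is 6.8, so the kurtosis supplement is 6.8/4 - 3 = -1.3, indicating a flatter shape than the normal distribution.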

Correlations

Several measures of correlation are available. The GetCovariance method returns the covariance between two variables. The GetCorrelation method returns the Pearson correlation between two variables. The GetRankCorrelation method returns the Spearman rank correlation between two variables.

Each of these methods has two variants. The first is an instance method that takes the second variable as its only argument. It compares the current instance with its argument. The second variant is a static (Shared in Visual Basic) method that takes two variables as arguments and computes the value for its two arguments.

The GetAutoCorrelation method returns the Pearson correlation of a variable with a lagged copy of itself. An optional integer argument specifies the lag.
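
The relationship between these measures can be sketched on plain arrays. The code below is an illustration under stated assumptions, not the library's implementation: the class CorrelationSketch is a hypothetical name, the covariance uses the n - 1 divisor, tied values receive the average of their ranks, and the autocorrelation is taken as the Pearson correlation between the series and the same series shifted by the lag. Both arrays are assumed to have the same length and to be non-constant.

```csharp
using System;
using System.Linq;

public static class CorrelationSketch
{
    // Covariance with the n - 1 divisor (an assumption of this sketch).
    public static double Covariance(double[] x, double[] y)
    {
        double meanX = x.Average(), meanY = y.Average();
        double sum = 0.0;
        for (int i = 0; i < x.Length; i++)
            sum += (x[i] - meanX) * (y[i] - meanY);
        return sum / (x.Length - 1);
    }

    // Pearson correlation: covariance scaled by both standard deviations.
    public static double Pearson(double[] x, double[] y)
        => Covariance(x, y) / Math.Sqrt(Covariance(x, x) * Covariance(y, y));

    // Ranks from 1 to n; tied values share the average of their ranks.
    public static double[] Ranks(double[] values)
    {
        int n = values.Length;
        int[] order = Enumerable.Range(0, n).OrderBy(i => values[i]).ToArray();
        double[] ranks = new double[n];
        int start = 0;
        while (start < n)
        {
            int end = start;
            while (end + 1 < n && values[order[end + 1]] == values[order[start]])
                end++;
            double averageRank = (start + end) / 2.0 + 1.0;
            for (int j = start; j <= end; j++)
                ranks[order[j]] = averageRank;
            start = end + 1;
        }
        return ranks;
    }

    // Spearman rank correlation: Pearson correlation of the ranks.
    public static double Spearman(double[] x, double[] y)
        => Pearson(Ranks(x), Ranks(y));

    // Correlation between the series and itself shifted by lag positions.
    public static double AutoCorrelation(double[] x, int lag)
        => Pearson(x.Take(x.Length - lag).ToArray(), x.Skip(lag).ToArray());
}
```

Because the Spearman correlation only looks at ranks, it equals 1 for any strictly increasing relationship, for example between 1, 2, 3, 4, 5 and their squares, whereas the Pearson correlation equals 1 only when the relationship is linear.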

Handling missing values

By default, missing values have the value Double.NaN. You can change this by setting the variable's MissingValue property.

Missing values are ignored during the calculation of descriptive statistics. To force a different behavior, you must call the ReplaceMissingValues method. This method takes one or two parameters. The first is a MissingValueAction value that determines the action that is to be taken when a missing value is encountered. The options are summarized in the table below. The second, optional parameter is a replacement value, if one is required. It defaults to zero.

 Member Name  Description
Default Missing values are ignored.
Discard All missing observations are discarded.
Ignore Missing values are ignored.
ReplaceWithPrevious Missing values are replaced with the value of the previous observation. If the first observation is missing, it is replaced with a user-specified value, or 0.
ReplaceWithNext Missing values are replaced with the value of the next observation. If the last observation is missing, it is replaced with a user-specified value, or 0.
ReplaceWithValue Missing values are replaced with a user-specified value, or 0.
Fail A MissingValueException is thrown.
Table 6. MissingValueAction values.

The IsMissing method indicates whether an observation is missing. Its only parameter is the index of the observation.
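
The effect of some of these actions can be illustrated on a plain array in which Double.NaN marks a missing value. This is a sketch of the behavior described in Table 6, not the library's ReplaceMissingValues implementation. The class MissingValueSketch and its methods are hypothetical names.

```csharp
using System;
using System.Linq;

public static class MissingValueSketch
{
    // Corresponds to the Discard action: drops every missing observation.
    public static double[] Discard(double[] data)
        => data.Where(x => !double.IsNaN(x)).ToArray();

    // Corresponds to the ReplaceWithPrevious action: a missing value takes
    // the value of the previous observation. A missing first observation
    // takes the fallback value, which defaults to 0.
    public static double[] ReplaceWithPrevious(double[] data, double fallback = 0.0)
    {
        double[] result = (double[])data.Clone();
        for (int i = 0; i < result.Length; i++)
        {
            if (double.IsNaN(result[i]))
                result[i] = i == 0 ? fallback : result[i - 1];
        }
        return result;
    }

    // Mean computed while ignoring missing values.
    public static double MeanIgnoringMissing(double[] data)
        => Discard(data).Average();
}
```

For the array NaN, 1, NaN, NaN, 4, ReplaceWithPrevious produces 0, 1, 1, 1, 4, while the mean that ignores missing values is (1 + 4)/2 = 2.5. The choice of action therefore changes the resulting statistics, which is why it has to be requested explicitly.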

Operations on Numerical Variables

Numerical variables can be combined using arithmetic operations to form new variables. The result of such an operation is a variable whose components equal the operation applied to the corresponding components of the operand(s).

For languages that support operator overloading, the arithmetic operators +, -, *, and / have been overloaded. For languages that don't support operator overloading, static (Shared in Visual Basic) methods are provided.

The operations are summarized in the following table:

Operator  Static method    Description
+x        (no equivalent)  Returns the variable x.
-x        Negate(x)        Returns the negation of the variable x.
x + y     Add(x, y)        Adds the variables x and y.
x + a     Add(x, a)        Adds the variable x and the real number a.
a + x     Add(a, x)        Adds the real number a to the variable x.
x - y     Subtract(x, y)   Subtracts the variable y from the variable x.
x - a     Subtract(x, a)   Subtracts the real number a from the variable x.
a - x     Subtract(a, x)   Subtracts the variable x from the real number a.
x * y     Multiply(x, y)   Multiplies the variables x and y.
x * a     Multiply(x, a)   Multiplies the variable x and the real number a.
a * x     Multiply(a, x)   Multiplies the real number a and the variable x.
x / y     Divide(x, y)     Divides the variable x by the variable y.
x / a     Divide(x, a)     Divides the variable x by the real number a.
a / x     Divide(a, x)     Divides the real number a by the variable x.
(none)    Power(x, a)      Raises the variable's observations to the power a.
(none)    Max(x, y)        Selects the largest observation from the variables x and y.
(none)    Min(x, y)        Selects the smallest observation from the variables x and y.
Table 7. Numerical Variable operators and their static (Shared) method equivalents.

The NumericalVariable class has a number of additional methods that may be useful. The Normalize method returns the variable rescaled to have a mean of zero and a standard deviation equal to one. The Apply method lets you apply any real function with one argument to each value of the variable. The result is a new variable. You can use this method to transform variables in preparation for a regression analysis.
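
What these component-wise operations compute can be sketched on plain arrays. The code below is an illustration, not the library's implementation: the class ElementwiseSketch is a hypothetical name, and the use of the unbiased standard deviation in Normalize is an assumption of this sketch.

```csharp
using System;
using System.Linq;

public static class ElementwiseSketch
{
    // Component-wise product of two arrays of equal length.
    public static double[] Multiply(double[] x, double[] y)
    {
        if (x.Length != y.Length)
            throw new ArgumentException("Arrays must have the same length.");
        return x.Zip(y, (a, b) => a * b).ToArray();
    }

    // Applies a real function of one argument to every value.
    public static double[] Apply(double[] x, Func<double, double> f)
        => x.Select(f).ToArray();

    // Rescales the values to mean 0 and standard deviation 1
    // (unbiased standard deviation, an assumption of this sketch).
    public static double[] Normalize(double[] x)
    {
        double mean = x.Average();
        double sd = Math.Sqrt(x.Sum(v => (v - mean) * (v - mean)) / (x.Length - 1));
        return x.Select(v => (v - mean) / sd).ToArray();
    }
}
```

For example, normalizing 1, 2, 3 gives -1, 0, 1, because the mean is 2 and the unbiased standard deviation is 1. Apply with Math.Log takes the natural logarithm of each value, which is the kind of transformation used when preparing data for a regression.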

The following example prepares data for a fit of the function

y = a e^{b x_1 + c x_2 x_3}.

The dependent variable is transformed using the Math.Log method. A new variable, z, is created to hold the product of x_2 and x_3. This transforms the above function into a form suitable for multiple linear regression:

log y = log a + b x_1 + c z.

C#
NumericalVariable Y, X1, X2, X3;
// ...
NumericalVariable logY = Y.Apply(new RealFunction(Math.Log));
NumericalVariable Z = X2 * X3;
Visual Basic
Dim Y, X1, X2, X3 As NumericalVariable
' ...
Dim logY As NumericalVariable = Y.Apply(New RealFunction(AddressOf Math.Log))
Dim Z As NumericalVariable = NumericalVariable.Multiply(X2, X3)

Note that, because Visual Basic .NET (2003) does not support operator overloading, the shared Multiply method must be used.

The Sort method returns a new variable with the values in ascending order. The SortInPlace method sorts the values in place.
