Numerical Variables
Variables whose observations are numeric in nature are called
numerical variables. In the Extreme Optimization Numerical
Libraries for .NET, numerical variables are implemented by the
NumericalVariable
class.
Constructing numerical variables
Numerical variables can be constructed in a variety of ways. The
NumericalVariable class has six constructors that come
in three groups.
The first group uses a Double array as the source
of the data. The first variant has two parameters. The first is a
string that specifies the name of the variable. The second
parameter is a double array. The second variant only
takes one parameter: a double array containing the
data values.
C#:
double[] dataArray = new double[]
    {62, 77, 61, 94, 75, 82, 86, 83, 64, 84,
     68, 82, 72, 71, 85, 66, 61, 79, 81, 73};
// Unnamed variable, then a variable named "Data"
NumericalVariable variable1 = new NumericalVariable(dataArray);
NumericalVariable variable2 = new NumericalVariable("Data", dataArray);
Visual Basic:
Dim dataArray As Double() = New Double() _
    {62, 77, 61, 94, 75, 82, 86, 83, 64, 84, _
     68, 82, 72, 71, 85, 66, 61, 79, 81, 73}
' Unnamed variable, then a variable named "Data"
Dim variable1 As NumericalVariable = New NumericalVariable(dataArray)
Dim variable2 As NumericalVariable = New NumericalVariable("Data", dataArray)
The second group uses a Vector as the source of the
data. The first variant once again has two parameters. The first is
a string that specifies the name of the variable. The second
parameter is a Vector. The second variant only takes
one parameter: a Vector containing the data
values.
C#:
Vector dataVector = new GeneralVector(dataArray);
NumericalVariable variable3 = new NumericalVariable(dataVector);
NumericalVariable variable4 = new NumericalVariable("Data", dataVector);
Visual Basic:
Dim dataVector As Vector = New GeneralVector(dataArray)
Dim variable3 As NumericalVariable = New NumericalVariable(dataVector)
Dim variable4 As NumericalVariable = New NumericalVariable("Data", dataVector)
The third group of constructors uses a DataColumn
as the source of the data. The first variant once again has two
parameters. The first is a string that specifies the name of the
variable. The second parameter is a DataColumn. The
values in the data column must be of a type that can be converted
to a Double. If not, an
InvalidCastException is thrown. The second variant
only takes one parameter: a DataColumn containing the
data values. The name of the variable is set to the
Caption property of the data column.
C#:
DataColumn column;
// Connect to a data source and retrieve the column from a DataTable
NumericalVariable variable5 = new NumericalVariable(column);
NumericalVariable variable6 = new NumericalVariable("Data", column);
Visual Basic:
Dim column As DataColumn
' Connect to a data source and retrieve the column from a DataTable
Dim variable5 As NumericalVariable = New NumericalVariable(column)
Dim variable6 As NumericalVariable = New NumericalVariable("Data", column)
A number of static (Shared in Visual Basic) methods create
numerical variables that span a range of numbers. The
CreateRange method is overloaded. The first overload
takes one integer argument that specifies a maximum value. It
returns a NumericalVariable whose observations are the
integers from 0 up to but not including the maximum value. The
second overload takes two integer arguments that specify the
minimum and maximum values. Once again, the maximum value is not
included.
The third overload takes three arguments. The first is the
number of observations. The second and third arguments are real
numbers that specify the lowest and highest value. This method
returns a NumericalVariable containing the specified
number of observations that are equally spaced between the lowest
and highest value. In this case, the highest value is included.
The CreateLogarithmicRange
method also takes three arguments. The first is the number of
observations. The second and third arguments are real numbers that
specify the lowest and highest value. This method returns a
NumericalVariable containing the specified number of
observations whose logarithms are equally spaced between the
logarithms of the lowest and highest value. This means that the
ratio between two successive observations is a constant. The
highest value is included.
The following example creates two variables with 5 observations
each. The first contains the first five multiples of 1000. The
second has a logarithmic scale and contains the first five powers
of 10, starting with 10^0 = 1:
C#:
NumericalVariable variable7 = NumericalVariable.CreateRange(5, 1000, 5000);
NumericalVariable variable8 = NumericalVariable.CreateLogarithmicRange(5, 1, 10000);
Visual Basic:
Dim variable7 As NumericalVariable = NumericalVariable.CreateRange(5, 1000, 5000)
Dim variable8 As NumericalVariable = _
    NumericalVariable.CreateLogarithmicRange(5, 1, 10000)
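To make the three kinds of range concrete, here is a minimal sketch in plain C# that computes the values described above using ordinary arrays. It does not use the library; the class and method names (RangeSketch, IntegerRange, LinearRange, LogarithmicRange) are hypothetical and chosen only for this illustration, and the library's own implementation may differ in details such as rounding.

```csharp
using System;

// Illustration only: hypothetical helpers, not part of the library.
public static class RangeSketch
{
    // Integers from min up to, but not including, max.
    public static double[] IntegerRange(int min, int max)
    {
        if (max < min) throw new ArgumentException("max must not be less than min.");
        var result = new double[max - min];
        for (int i = 0; i < result.Length; i++)
            result[i] = min + i;
        return result;
    }

    // 'count' values equally spaced from low to high, both end points included.
    public static double[] LinearRange(int count, double low, double high)
    {
        if (count < 2) throw new ArgumentException("count must be at least 2.");
        var result = new double[count];
        double step = (high - low) / (count - 1);
        for (int i = 0; i < count; i++)
            result[i] = low + i * step;
        result[count - 1] = high; // pin the end point against rounding drift
        return result;
    }

    // 'count' values whose logarithms are equally spaced, so the ratio
    // between successive values is constant. Both end points included.
    public static double[] LogarithmicRange(int count, double low, double high)
    {
        if (count < 2) throw new ArgumentException("count must be at least 2.");
        if (low <= 0 || high <= 0) throw new ArgumentException("Bounds must be positive.");
        var result = new double[count];
        double ratio = Math.Pow(high / low, 1.0 / (count - 1));
        for (int i = 0; i < count; i++)
            result[i] = low * Math.Pow(ratio, i);
        result[count - 1] = high; // pin the end point against rounding drift
        return result;
    }
}
```

With the arguments from the example above, LinearRange(5, 1000, 5000) yields the multiples of 1000 from 1000 to 5000, and LogarithmicRange(5, 1, 10000) yields values that agree with 1, 10, 100, 1000, 10000 up to floating-point rounding.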
In addition, numerical variables can be obtained from
VariableCollection objects, by performing arithmetic
operations on existing variables (see below), and by several other means.
Descriptive Statistics
Numerical variables have the widest range of descriptive
statistics available. The values are calculated as needed, and
cached. For some values, the calculation may be lengthy. In this
case, the value is returned by a method instead of a property. The
following tables list the descriptive statistics that are available
for numerical variables:
Measures of Location
The purpose of a measure of location is to provide a typical or
central value to describe the data.
| Property | Description |
| Mean | Returns the mean or average of all observations. |
| Median | Returns the median. |
Table 1. Properties for measures of location.
The median is the middle value of a sorted list of observations.
If a variable has an even number of observations, then the median
is the average of the two middle values.
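The definition of the median just given can be written out as a short sketch in plain C#. This is an illustration of the definition, not the library's implementation, and the names MedianSketch and Median are hypothetical.

```csharp
using System;

// Illustration only: hypothetical helper, not part of the library.
public static class MedianSketch
{
    public static double Median(double[] values)
    {
        if (values == null || values.Length == 0)
            throw new ArgumentException("At least one observation is required.");

        // Sort a copy so the caller's array is left untouched.
        var sorted = (double[])values.Clone();
        Array.Sort(sorted);

        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1
            ? sorted[mid]                            // odd count: the middle value
            : (sorted[mid - 1] + sorted[mid]) / 2.0; // even count: average of the two middle values
    }
}
```

For the 20-value sample array used earlier on this page, the two middle values of the sorted data are 75 and 77, so this sketch returns 76.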
| Method | Description |
| GetGeometricMean | Returns the geometric mean of all observations. |
| GetHarmonicMean | Returns the harmonic mean of all observations. |
| GetMidMean | Returns the mean of the middle 50% of observations. |
| GetTrimmedMean | Returns the mean of the observations after eliminating the specified percentage of extreme values. |
| GetWinsorizedMean | Returns the mean of the observations after setting the specified percentage of extreme values to the lowest or highest value. |
Table 2. Methods for measures of location.
The mid-mean is a special case of the trimmed mean, with a
percentage of 50. Providing a value of 100 for the percentage to the
GetTrimmedMean method returns the median. The Winsorized mean is
similar to the trimmed mean, but instead of eliminating the extreme
values, it sets them to the lowest or highest remaining value. For
example, for the 10% Winsorized mean, the smallest 5% of values are
set equal to the value at the 5th percentile, while the largest 5%
of values are set equal to the value at the 95th percentile.
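The difference between trimming and Winsorizing is easiest to see in code. The sketch below is plain C# and independent of the library; the names RobustMeanSketch, TrimmedMean, and WinsorizedMean are hypothetical. It assumes a simple counting rule: half of the percentage is applied to each tail, rounded down to a whole number of observations, and at least one central value (two for an even count) is always kept. The library may instead interpolate percentiles, so its results can differ slightly.

```csharp
using System;
using System.Linq;

// Illustration only: hypothetical helpers, not part of the library.
public static class RobustMeanSketch
{
    // Number of observations affected at EACH end (assumed counting rule).
    private static int TailCount(int n, double percentage)
    {
        if (n == 0) throw new ArgumentException("At least one observation is required.");
        if (percentage < 0 || percentage > 100)
            throw new ArgumentOutOfRangeException(nameof(percentage));
        int k = (int)Math.Floor(n * percentage / 200.0);
        return Math.Min(k, (n - 1) / 2); // always keep the central value(s)
    }

    // Mean after discarding the k smallest and k largest observations.
    public static double TrimmedMean(double[] values, double percentage)
    {
        var sorted = values.OrderBy(v => v).ToArray();
        int k = TailCount(sorted.Length, percentage);
        return sorted.Skip(k).Take(sorted.Length - 2 * k).Average();
    }

    // Mean after replacing the k smallest and k largest observations
    // with the nearest value that is kept.
    public static double WinsorizedMean(double[] values, double percentage)
    {
        var sorted = values.OrderBy(v => v).ToArray();
        int n = sorted.Length;
        int k = TailCount(n, percentage);
        for (int i = 0; i < k; i++)
        {
            sorted[i] = sorted[k];                 // pull the low tail up
            sorted[n - 1 - i] = sorted[n - 1 - k]; // pull the high tail down
        }
        return sorted.Average();
    }
}
```

For the values 1, 2, 3, 4, 100 and a percentage of 40, one observation is affected at each end: the trimmed mean averages 2, 3, 4 and the Winsorized mean averages 2, 2, 3, 4, 4, so both equal 3, whereas the ordinary mean is 22. With a percentage of 100 the sketch keeps only the central value or values and therefore returns the median.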
The example below shows how to use some of the most common
measures of location:
C#:
Console.WriteLine("Mean: {0:F1}", variable1.Mean);
Console.WriteLine("Median: {0:F1}", variable1.Median);
Console.WriteLine("Trimmed Mean: {0:F1}", variable1.GetTrimmedMean(10));
Console.WriteLine("Harmonic Mean: {0:F1}", variable1.GetHarmonicMean());
Console.WriteLine("Geometric Mean: {0:F1}", variable1.GetGeometricMean());
Visual Basic:
Console.WriteLine("Mean: {0:F1}", variable1.Mean)
Console.WriteLine("Median: {0:F1}", variable1.Median)
Console.WriteLine("Trimmed Mean: {0:F1}", variable1.GetTrimmedMean(10))
Console.WriteLine("Harmonic Mean: {0:F1}", variable1.GetHarmonicMean())
Console.WriteLine("Geometric Mean: {0:F1}", variable1.GetGeometricMean())
Measures of Scale
Measures of scale are used to characterize the spread or
variability of a data set.
Table 3. Properties for measures of scale.
The variance and the standard deviation are always the unbiased
versions. To get the biased (population) standard deviation, use
the RootMeanSquare property.
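The practical difference between the unbiased (sample) and the biased (population) variance is the divisor applied to the sum of squared deviations from the mean. The following plain C# sketch shows both divisors side by side; it does not use the library, and the names VarianceSketch, SampleVariance, and PopulationVariance are hypothetical.

```csharp
using System;
using System.Linq;

// Illustration only: hypothetical helpers, not part of the library.
public static class VarianceSketch
{
    // Sum of squared deviations from the mean.
    private static double SumOfSquaredDeviations(double[] values)
    {
        double mean = values.Average();
        return values.Sum(v => (v - mean) * (v - mean));
    }

    // Unbiased (sample) variance: divide by n - 1.
    public static double SampleVariance(double[] values)
    {
        if (values.Length < 2)
            throw new ArgumentException("At least two observations are required.");
        return SumOfSquaredDeviations(values) / (values.Length - 1);
    }

    // Biased (population) variance: divide by n.
    public static double PopulationVariance(double[] values)
    {
        if (values.Length < 1)
            throw new ArgumentException("At least one observation is required.");
        return SumOfSquaredDeviations(values) / values.Length;
    }
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the sum of squared deviations is 32, so the population variance is 32/8 = 4 and the sample variance is 32/7. The corresponding standard deviations are the square roots of these values.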
Table 4. Methods for measures of scale.
The average absolute deviation is the mean of the absolute
difference between each value and the mean. Because it does not
square the distance from the mean, it is less affected by extreme
values. The median absolute deviation is the median of the absolute
difference between each value and the mean. It is even less
influenced by extreme values because the median is less affected by
extreme values than the mean.
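As an illustration, the two deviation measures can be sketched in plain C# exactly as they are described above, with both centered on the mean. The names DeviationSketch, AverageAbsoluteDeviation, and MedianAbsoluteDeviation are hypothetical and this is not the library's code. Note that many statistics texts define the median absolute deviation around the median rather than the mean; the sketch follows the wording of this page.

```csharp
using System;
using System.Linq;

// Illustration only: hypothetical helpers, not part of the library.
public static class DeviationSketch
{
    // Mean of the absolute differences between each value and the mean.
    public static double AverageAbsoluteDeviation(double[] values)
    {
        if (values.Length == 0)
            throw new ArgumentException("At least one observation is required.");
        double mean = values.Average();
        return values.Average(v => Math.Abs(v - mean));
    }

    // Median of the absolute differences between each value and the mean
    // (centered on the mean, following the description on this page).
    public static double MedianAbsoluteDeviation(double[] values)
    {
        if (values.Length == 0)
            throw new ArgumentException("At least one observation is required.");
        double mean = values.Average();
        var deviations = values.Select(v => Math.Abs(v - mean))
                               .OrderBy(d => d)
                               .ToArray();
        int mid = deviations.Length / 2;
        return deviations.Length % 2 == 1
            ? deviations[mid]
            : (deviations[mid - 1] + deviations[mid]) / 2.0;
    }
}
```

For the values 1, 2, 3, 4, 10 the mean is 4 and the absolute deviations are 3, 2, 1, 0, 6. Their mean is 2.4 and their median is 2, which shows how the single large value 10 influences the median-based measure less.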
The inter-quartile range is the difference between the 75th and
the 25th percentile values. It is a measure of the variability of
values close to the center of the distribution.
The example below shows some of these properties and methods:
C#:
Console.WriteLine("Standard deviation: {0:F1}", variable1.StandardDeviation);
Console.WriteLine("Variance: {0:F1}", variable1.Variance);
Console.WriteLine("Range: {0:F1}", variable1.Range);
Console.WriteLine("Inter-quartile range: {0:F1}", variable1.GetInterQuartileRange());
Visual Basic:
Console.WriteLine("Standard deviation: {0:F1}", variable1.StandardDeviation)
Console.WriteLine("Variance: {0:F1}", variable1.Variance)
Console.WriteLine("Range: {0:F1}", variable1.Range)
Console.WriteLine("Inter-quartile range: {0:F1}", variable1.GetInterQuartileRange())
Other properties and methods
The remaining properties cover the higher moments (skewness and
kurtosis) and the raw sums:
| Property | Description |
| Skewness | Returns the unbiased skewness of the data. |
| PopulationSkewness | Returns the skewness of the data. |
| Kurtosis | Returns the unbiased kurtosis supplement of the data. |
| PopulationKurtosis | Returns the kurtosis supplement of the data. |
| Sum | Returns the sum of all the elements. |
| SumOfSquares | Returns the sum of the squares. |
Table 5. Other properties.
The skewness is a measure of the lack of symmetry of the
distribution of a variable. The kurtosis is a measure of the
peakedness compared to the normal distribution. The
Kurtosis property returns the kurtosis
supplement, which is the difference between the 'real'
kurtosis and the kurtosis of the normal distribution, which equals
3.
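A sketch of the population versions of these two quantities, using the standard moment-based definitions, may help clarify what the kurtosis supplement is. The code is plain C# and independent of the library; the names MomentSketch, PopulationSkewness, and PopulationKurtosisSupplement are hypothetical, and the small-sample corrections used by the unbiased properties are not shown because this page does not give them.

```csharp
using System;
using System.Linq;

// Illustration only: hypothetical helpers, not part of the library.
public static class MomentSketch
{
    // k-th central moment: the mean of (x - mean)^k.
    private static double CentralMoment(double[] values, int order)
    {
        if (values.Length == 0)
            throw new ArgumentException("At least one observation is required.");
        double mean = values.Average();
        return values.Average(v => Math.Pow(v - mean, order));
    }

    // Population skewness: m3 / m2^(3/2). Zero for symmetric data.
    public static double PopulationSkewness(double[] values)
    {
        double m2 = CentralMoment(values, 2);
        return CentralMoment(values, 3) / Math.Pow(m2, 1.5);
    }

    // Population kurtosis supplement: m4 / m2^2 minus 3, where 3 is the
    // kurtosis of the normal distribution.
    public static double PopulationKurtosisSupplement(double[] values)
    {
        double m2 = CentralMoment(values, 2);
        return CentralMoment(values, 4) / (m2 * m2) - 3.0;
    }
}
```

For the symmetric values 1, 2, 3, 4, 5 the skewness is 0. The second and fourth central moments are 2 and 6.8, so the kurtosis is 6.8 / 4 = 1.7 and the kurtosis supplement is 1.7 - 3 = -1.3, indicating a flatter shape than the normal distribution.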
Correlations
Several measures of correlation are available. The
GetCovariance
method returns the covariance between two variables. The
GetCorrelation
method returns the Pearson correlation between two variables. The
GetRankCorrelation
method returns the Spearman rank correlation between two
variables.
Each of these methods has two variants. The first is an instance
method that takes the second variable as its only argument. It
compares the current instance with its argument. The second variant
is a static (Shared in Visual Basic) method that takes two
variables as arguments and computes the value for its two
arguments.
The
GetAutoCorrelation method returns the Pearson
correlation of a variable with a lagged copy of itself. An optional
integer argument specifies the lag.
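To show what the Pearson correlation and a lagged autocorrelation compute, here is a minimal plain C# sketch on arrays. It is not the library's implementation, and the names CorrelationSketch, Pearson, and AutoCorrelation are hypothetical. The autocorrelation is assumed to be the Pearson correlation between the series and a copy of itself shifted by the lag, restricted to the overlapping observations; other normalizations exist and the library may use a different one. The Spearman rank correlation is not covered here.

```csharp
using System;
using System.Linq;

// Illustration only: hypothetical helpers, not part of the library.
public static class CorrelationSketch
{
    // Pearson correlation: covariance divided by the product of the
    // standard deviations. The divisors (n or n - 1) cancel out.
    public static double Pearson(double[] x, double[] y)
    {
        if (x.Length != y.Length)
            throw new ArgumentException("Both arrays must have the same length.");
        if (x.Length < 2)
            throw new ArgumentException("At least two observations are required.");

        double meanX = x.Average(), meanY = y.Average();
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.Length; i++)
        {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.Sqrt(sxx * syy); // NaN if either input is constant
    }

    // Assumed definition: Pearson correlation of x[0 .. n-lag) with x[lag .. n).
    public static double AutoCorrelation(double[] x, int lag)
    {
        if (lag < 0 || lag > x.Length - 2)
            throw new ArgumentOutOfRangeException(nameof(lag));
        int n = x.Length - lag;
        var head = new double[n];
        var tail = new double[n];
        Array.Copy(x, 0, head, 0, n);
        Array.Copy(x, lag, tail, 0, n);
        return Pearson(head, tail);
    }
}
```

Two perfectly linearly related arrays, such as 1, 2, 3 and 2, 4, 6, give a correlation of 1, and reversing one of them gives -1. A steadily increasing series such as 1, 2, 3, 4 has a lag-1 autocorrelation of 1 under this definition.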
Handling missing values
By default, missing values have the value
Double.NaN. You can change this by setting the
variable's MissingValue
property.
Missing values are ignored during the calculation of descriptive
statistics. To force a different behavior, you must call the
ReplaceMissingValues method. This method takes one or
two parameters. The first is a MissingValueAction
value that determines the action that is to be taken when a missing
value is encountered. The options are summarized in the table
below. The second, optional parameter is a replacement value, if
one is required. It defaults to zero.
| Member Name | Description |
| Default | Missing values are ignored. |
| Discard | All missing observations are discarded. |
| Ignore | Missing values are ignored. |
| ReplaceWithPrevious | Missing values are replaced with the value of the previous observation. If the first observation is missing, it is replaced with a user-specified value, or 0. |
| ReplaceWithNext | Missing values are replaced with the value of the next observation. If the last observation is missing, it is replaced with a user-specified value, or 0. |
| ReplaceWithValue | Missing values are replaced with a user-specified value, or 0. |
| Fail | A MissingValueException is thrown. |
Table 6. MissingValueAction values.
The IsMissing
method indicates whether an observation is missing. Its only
parameter is the index of the observation.
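The effect of two of these strategies can be illustrated with plain arrays in which Double.NaN marks a missing observation. The sketch below is plain C# and does not call the library; the names MissingValueSketch, MeanIgnoringMissing, and ReplaceWithPrevious are hypothetical helpers that mimic the behavior described in Table 6. How the library treats several consecutive missing values at the start is not stated on this page; the sketch assumes they all receive the fallback value.

```csharp
using System;
using System.Linq;

// Illustration only: hypothetical helpers, not part of the library.
public static class MissingValueSketch
{
    // "Ignore": compute the mean over the observations that are present.
    public static double MeanIgnoringMissing(double[] values)
    {
        var present = values.Where(v => !double.IsNaN(v)).ToArray();
        if (present.Length == 0)
            throw new InvalidOperationException("All observations are missing.");
        return present.Average();
    }

    // "ReplaceWithPrevious": carry the last present value forward.
    // Leading missing values receive the fallback (0 by default).
    public static double[] ReplaceWithPrevious(double[] values, double fallback = 0.0)
    {
        var result = (double[])values.Clone();
        double previous = fallback;
        for (int i = 0; i < result.Length; i++)
        {
            if (double.IsNaN(result[i]))
                result[i] = previous;
            else
                previous = result[i];
        }
        return result;
    }
}
```

For the observations NaN, 1, NaN, 3, replacing with the previous value gives 0, 1, 1, 3, while ignoring missing values gives a mean of 2 computed from the two observations that are present.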
Operations on Numerical Variables
Numerical variables can be combined using arithmetic operations
to form new variables. The result of such an operation is a
variable whose components equal the operation applied to the
corresponding components of the operand(s).
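Component-wise evaluation can be pictured with plain arrays. The following C# sketch is an illustration only and does not use the library; the class ElementwiseSketch and its methods are hypothetical stand-ins for one variable-variable operation and one scalar-variable operation.

```csharp
using System;
using System.Linq;

// Illustration only: hypothetical helpers, not part of the library.
public static class ElementwiseSketch
{
    // x * y: multiply corresponding components of two equally long arrays.
    public static double[] Multiply(double[] x, double[] y)
    {
        if (x.Length != y.Length)
            throw new ArgumentException("Both operands must have the same length.");
        return x.Zip(y, (a, b) => a * b).ToArray();
    }

    // a - x: subtract each component of x from the real number a.
    public static double[] Subtract(double a, double[] x)
    {
        return x.Select(v => a - v).ToArray();
    }
}
```

Multiplying 1, 2, 3 by 4, 5, 6 in this way gives 4, 10, 18, and subtracting 1, 2, 3 from the number 10 gives 9, 8, 7.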
For languages that support operator overloading, the arithmetic
operators, +, -, *, / have been overloaded. For languages that
don't support operator overloading, static (Shared in Visual Basic)
methods are provided.
The operations are summarized in the following table:
| Operator | Static method | Description |
| +x | (no equivalent) | Returns the variable x. |
| -x | Negate(x) | Returns the negation of the variable x. |
| x + y | Add(x, y) | Adds the variables x and y. |
| x + a | Add(x, a) | Adds the variable x and the real number a. |
| a + x | Add(a, x) | Adds the real number a to the variable x. |
| x - y | Subtract(x, y) | Subtracts the variable y from the variable x. |
| x - a | Subtract(x, a) | Subtracts the real number a from the variable x. |
| a - x | Subtract(a, x) | Subtracts the variable x from the real number a. |
| x * y | Multiply(x, y) | Multiplies the variables x and y. |
| x * a | Multiply(x, a) | Multiplies the variable x and the real number a. |
| a * x | Multiply(a, x) | Multiplies the real number a and the variable x. |
| x / y | Divide(x, y) | Divides the variable x by the variable y. |
| x / a | Divide(x, a) | Divides the variable x by the real number a. |
| a / x | Divide(a, x) | Divides the real number a by the variable x. |
| - | Power(x, a) | Raises the variable's observations to the power a. |
| - | Max(x, y) | Selects the largest observation from the variables x and y. |
| - | Min(x, y) | Selects the smallest observation from the variables x and y. |
Table 7. Numerical Variable operators and their static (Shared)
method equivalents.
The NumericalVariable class has a number of
additional methods that may be useful. The Normalize
method returns the variable rescaled to have a mean of zero and a
standard deviation equal to one. The Apply
method lets you apply any real function with one argument to each
value of the variable. The result is a new variable. You can use
this method to transform variables in preparation for a regression
analysis.
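What these two transformations do to the data can be sketched with plain arrays. The C# code below is an illustration that does not use the library; the names TransformSketch, Apply, and Normalize are hypothetical array-based counterparts of the methods described above. The sketch assumes that normalization uses the sample standard deviation (divisor n - 1), consistent with the statement earlier on this page that the standard deviation is the unbiased version.

```csharp
using System;
using System.Linq;

// Illustration only: hypothetical helpers, not part of the library.
public static class TransformSketch
{
    // Apply a real function of one argument to every value.
    public static double[] Apply(double[] values, Func<double, double> function)
    {
        return values.Select(function).ToArray();
    }

    // Rescale to mean 0 and standard deviation 1.
    // Assumption: the sample standard deviation (divisor n - 1) is used.
    public static double[] Normalize(double[] values)
    {
        if (values.Length < 2)
            throw new ArgumentException("At least two observations are required.");
        double mean = values.Average();
        double sumSquares = values.Sum(v => (v - mean) * (v - mean));
        double sd = Math.Sqrt(sumSquares / (values.Length - 1));
        return values.Select(v => (v - mean) / sd).ToArray();
    }
}
```

Normalizing the values 1, 2, 3 gives -1, 0, 1, since their mean is 2 and their sample standard deviation is 1. A call such as TransformSketch.Apply(values, Math.Log) returns the natural logarithm of each value as a new array and leaves the input unchanged.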
The following example prepares data for a fit of the
function
$$y = a e^{b x_1 + c x_2 x_3}.$$
The dependent variable is transformed using the
Math.Log method. A new variable, z, is created
to hold the product of x_2 and x_3. This transforms the above
function into a form suitable for multiple linear regression:
$$\log y = \log a + b x_1 + c z.$$
C#:
NumericalVariable Y, X1, X2, X3;
// ... (assign the variables here)
// Transform the dependent variable: log y
NumericalVariable logY = Y.Apply(new RealFunction(Math.Log));
// New variable z = x2 * x3
NumericalVariable Z = X2 * X3;
Visual Basic:
Dim Y, X1, X2, X3 As NumericalVariable
' ... (assign the variables here)
' Transform the dependent variable: log y
Dim logY As NumericalVariable = Y.Apply(New RealFunction(AddressOf Math.Log))
' New variable z = x2 * x3
Dim Z As NumericalVariable = NumericalVariable.Multiply(X2, X3)
Note that, because Visual Basic .NET (2003) does not support
operator overloading, the shared Multiply method must
be used.
The Sort
method returns a new variable with the values in ascending order.
The SortInPlace
method sorts the values in place.
Copyright 2004-2008,
Extreme Optimization. All rights reserved.
Extreme Optimization, Complexity made simple, M#, and M
Sharp are trademarks of ExoAnalytics Inc.
Microsoft, Visual C#, Visual Basic, Visual Studio, Visual
Studio.NET, and the Visual Studio Logo are registered trademarks of Microsoft Corporation