Extreme Optimization User's Guide

Transforming Numerical Variables

In many situations, it is useful to apply some kind of transformation to a numerical variable. To avoid cluttering the members of the NumericalVariable class with these methods, they are made available as methods of the numerical variable's Transforms property.

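Transformations can be subdivided into the following categories: arithmetic operations, elementary functions, simple transformations, indicators of change, extrapolated indicators of change, moving averages, other moving summary statistics, period-to-date values and differences, and miscellaneous transformations.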

Arithmetic operations have been discussed in the previous section. They are available as overloaded operators or static (Shared in Visual Basic) operator methods on the NumericalVariable class. The remaining transformations are available through the Transforms property. Each category will now be described in greater detail.

Elementary functions

This category includes transformations that involve applying an elementary function to each observation of a variable. The table below lists the methods in this category:

Member Name   Description
Abs           Each observation is the absolute value of the original observation.
Exp           Each observation is the exponential of the original observation.
Log           Each observation is the natural logarithm of the original observation.
Sqrt          Each observation is the square root of the original observation.
Table 1. Methods for elementary functions.

Simple transformations

This category includes transformations that involve arithmetic operations and translations.

The GetLag method is overloaded. Without parameters, it returns a variable whose observations are moved ahead by one interval. Each new observation is the observation before the current observation. The first observation is set to NaN.

The second overload takes one parameter: the lag, or number of observations to shift the series by. A positive value indicates that the observations are shifted forward. If the lag is equal to 1, then each new observation is the observation before the current observation. If the lag is equal to -1, then each new observation is the observation after the current observation. Any observations that do not exist in the original variable are set to NaN.

The third overload takes two parameters. The first parameter is once again the lag. The second parameter specifies the value of the observations that do not exist in the original variable.

The GetCumulativeSum method returns a variable whose observations are the cumulative sum of all observations up to the current observation. The GetCumulativeProduct method returns a variable whose observations are the cumulative product of all observations up to the current observation.

The following example creates a variable that contains the observations of the previous period. It then creates a variable that contains the cumulative sum of the variable.

C#
NumericalVariable previous = current.Transforms.GetLag(1);
NumericalVariable cumsum = current.Transforms.GetCumulativeSum();
Visual Basic
Dim previous As NumericalVariable = current.Transforms.GetLag(1)
Dim cumsum As NumericalVariable = current.Transforms.GetCumulativeSum()
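
The lag and cumulative sum semantics described above can be illustrated on plain arrays. The following is a minimal sketch, not the library's implementation: the LagSketch class and its Lag and CumulativeSum methods are hypothetical names that only mirror the documented behavior of GetLag and GetCumulativeSum.

```csharp
using System;

static class LagSketch
{
    // Shifts the series by `lag` positions; a positive lag shifts forward.
    // Positions with no source observation receive `missing` (NaN by default).
    public static double[] Lag(double[] x, int lag = 1, double missing = double.NaN)
    {
        var result = new double[x.Length];
        for (int i = 0; i < x.Length; i++)
        {
            int source = i - lag;
            result[i] = (source >= 0 && source < x.Length) ? x[source] : missing;
        }
        return result;
    }

    // Running total of all observations up to and including the current one.
    public static double[] CumulativeSum(double[] x)
    {
        var result = new double[x.Length];
        double sum = 0.0;
        for (int i = 0; i < x.Length; i++)
        {
            sum += x[i];
            result[i] = sum;
        }
        return result;
    }
}
```

For the series 1, 2, 3, 4, Lag with the default lag of 1 gives NaN, 1, 2, 3, a lag of -1 gives 2, 3, 4, NaN, and CumulativeSum gives 1, 3, 6, 10.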

Indicators of change

This set of transformations compares each current observation to a past observation. The distance between the current observation and its reference observation is called the lag. It is passed to each of the methods as their only parameter. Its value must be greater than zero.

The GetChange method returns a variable where each observation is the difference between the current observation and the reference observation.

The GetPercentChange method is similar. Each observation is the percentage change of the current observation relative to the reference observation. If the reference observation is zero, or if the current observation and the reference observation have a different sign, the new observation is NaN.

The GetGrowthRate method returns a variable containing the exponential growth rate. Each observation is the percentage change of the current observation relative to the reference observation, assuming the growth compounds continuously over time. If the reference observation is zero, or if the current observation and the reference observation have a different sign, the new observation is NaN.

The first lag observations of the new variable have no reference observation and are set to NaN.

The example below calculates the different indicators of change for a 10-period lag:

C#
NumericalVariable change = current.Transforms.GetChange(10);
NumericalVariable pctChange = current.Transforms.GetPercentChange(10);
NumericalVariable growthRate = current.Transforms.GetGrowthRate(10);
Visual Basic
Dim change As NumericalVariable = current.Transforms.GetChange(10)
Dim pctChange As NumericalVariable = current.Transforms.GetPercentChange(10)
Dim growthRate As NumericalVariable = current.Transforms.GetGrowthRate(10)
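
To make the three indicators concrete, here is a sketch on plain arrays. It is an illustration under stated assumptions rather than the library's code: the ChangeSketch class and its methods are hypothetical, the percent change is assumed to be 100 * (current - reference) / reference, and the growth rate is assumed to be 100 * ln(current / reference). Observations without a reference observation are NaN.

```csharp
using System;

static class ChangeSketch
{
    // Applies f(current, reference) with reference = x[i - lag];
    // observations that have no reference observation become NaN.
    static double[] Compare(double[] x, int lag, Func<double, double, double> f)
    {
        if (lag <= 0) throw new ArgumentOutOfRangeException(nameof(lag));
        var result = new double[x.Length];
        for (int i = 0; i < x.Length; i++)
            result[i] = i < lag ? double.NaN : f(x[i], x[i - lag]);
        return result;
    }

    // A relative change is defined when the reference is nonzero
    // and the two observations do not have opposite signs.
    static bool Comparable(double current, double reference) =>
        reference != 0.0 &&
        !((current < 0.0 && reference > 0.0) || (current > 0.0 && reference < 0.0));

    public static double[] Change(double[] x, int lag) =>
        Compare(x, lag, (c, r) => c - r);

    // Assumed scaling: percent of the reference observation.
    public static double[] PercentChange(double[] x, int lag) =>
        Compare(x, lag, (c, r) => Comparable(c, r) ? 100.0 * (c - r) / r : double.NaN);

    // Assumed formula: continuously compounded rate, in percent.
    public static double[] GrowthRate(double[] x, int lag) =>
        Compare(x, lag, (c, r) => Comparable(c, r) ? 100.0 * Math.Log(c / r) : double.NaN);
}
```

For the series 100, 110, 121 with a lag of 1, Change gives NaN, 10, 11 and PercentChange gives NaN, 10, 10. The growth rate is slightly smaller than the percent change because continuous compounding reaches the same end value at a lower rate.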

Extrapolated indicators of change

This set of transformations is similar to the previous one. However, the observed change is extrapolated to a larger interval. Once again, the lag is passed as the first parameter. A second parameter, numberOfPeriods, indicates the relative size of the extrapolation interval.

For example, if the current variable represents the price of a certain commodity at the end of each month, then a value of 12 for numberOfPeriods produces a variable that represents the annualized change in price over each month.

The GetExtrapolatedChange method returns a variable where each observation is the extrapolated difference between the current observation and the reference observation.

The GetExtrapolatedPercentChange method is similar. Each observation is the extrapolated percentage change of the current observation relative to the reference observation. If the reference observation is zero, or if the current observation and the reference observation have a different sign, the new observation is NaN.

The GetExtrapolatedGrowthRate method returns a variable containing the extrapolated exponential growth rate. Each observation is the extrapolated percentage change of the current observation relative to the reference observation, assuming the growth compounds continuously over time. If the reference observation is zero, or if the current observation and the reference observation have a different sign, the new observation is NaN.

Once again, the first lag observations of the new variable are set to NaN.

The example below calculates the different indicators of change for a 10-period lag and extrapolates them to an interval of 360 periods:

C#
NumericalVariable change360 =
    current.Transforms.GetExtrapolatedChange(10, 360);
NumericalVariable pctChange360 =
    current.Transforms.GetExtrapolatedPercentChange(10, 360);
NumericalVariable growthRate360 =
    current.Transforms.GetExtrapolatedGrowthRate(10, 360);
Visual Basic
Dim change360 As NumericalVariable = _
    current.Transforms.GetExtrapolatedChange(10, 360)
Dim pctChange360 As NumericalVariable = _
    current.Transforms.GetExtrapolatedPercentChange(10, 360)
Dim growthRate360 As NumericalVariable = _
    current.Transforms.GetExtrapolatedGrowthRate(10, 360)
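
One plausible reading of the extrapolation is sketched below for a single pair of observations. Every formula here is an assumption, since the exact formulas are not spelled out in this section: the change is scaled linearly by numberOfPeriods / lag, the percent change is compounded over numberOfPeriods / lag intervals, and the continuously compounded rate is scaled linearly. The ExtrapolationSketch class and its methods are hypothetical names.

```csharp
using System;

static class ExtrapolationSketch
{
    // Assumption: a change observed over `lag` periods scales linearly.
    public static double ExtrapolatedChange(
        double current, double reference, int lag, int numberOfPeriods) =>
        (current - reference) * numberOfPeriods / lag;

    // Assumption: the observed growth factor compounds once per `lag` periods.
    public static double ExtrapolatedPercentChange(
        double current, double reference, int lag, int numberOfPeriods) =>
        100.0 * (Math.Pow(current / reference, (double)numberOfPeriods / lag) - 1.0);

    // Assumption: a continuously compounded rate scales linearly with time.
    public static double ExtrapolatedGrowthRate(
        double current, double reference, int lag, int numberOfPeriods) =>
        100.0 * Math.Log(current / reference) * numberOfPeriods / lag;
}
```

Under these assumptions, a price that moves from 100 to 101 in one month (lag 1, numberOfPeriods 12) has an annualized change of 12, an annualized percent change of about 12.68, and an annualized growth rate of about 11.94.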

Moving averages

Moving averages are commonly used to smooth data, and to find trends in time series.

The GetMovingAverage method returns the simple moving average. It takes one parameter: the number of observations, n, to average. Each new observation is the average of the n observations up to and including the current observation.

The GetExponentialMovingAverage method returns the exponential moving average. Each new observation is a weighted combination of the current observation and the previous average.

The exponential moving average can be specified in two ways. You can specify the period as an integer. Alternatively, you can specify the smoothing constant. This is a real number between 0 and 1 that specifies the contribution of the current observation to the current moving average.

The code below calculates three moving averages: a simple 20-day moving average, a 20-day exponential moving average, and a 3-day exponential moving average specified using the smoothing constant:

C#
NumericalVariable MA20 = current.Transforms.GetMovingAverage(20);
NumericalVariable EMA20 = current.Transforms.GetExponentialMovingAverage(20);
NumericalVariable EMA3 = current.Transforms.GetExponentialMovingAverage(2.0 / (1 + 3));
Visual Basic
Dim MA20 As NumericalVariable = current.Transforms.GetMovingAverage(20)
Dim EMA20 As NumericalVariable = current.Transforms.GetExponentialMovingAverage(20)
Dim EMA3 As NumericalVariable = current.Transforms.GetExponentialMovingAverage(0.5)
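
The two averages can be sketched on plain arrays as follows. This is an illustration, not the library's implementation. The MovingAverageSketch class is hypothetical, and it assumes that positions without a full window are NaN, that the exponential average is seeded with the first observation, and that a period p corresponds to the smoothing constant 2 / (1 + p), the conversion used in the C# sample above.

```csharp
using System;

static class MovingAverageSketch
{
    // Simple moving average of the n observations ending at each position.
    public static double[] MovingAverage(double[] x, int n)
    {
        if (n <= 0) throw new ArgumentOutOfRangeException(nameof(n));
        var result = new double[x.Length];
        double sum = 0.0;
        for (int i = 0; i < x.Length; i++)
        {
            sum += x[i];
            if (i >= n) sum -= x[i - n];   // drop the observation leaving the window
            result[i] = i >= n - 1 ? sum / n : double.NaN;
        }
        return result;
    }

    // Each value combines the current observation (weight alpha)
    // with the previous average (weight 1 - alpha).
    public static double[] ExponentialMovingAverage(double[] x, double alpha)
    {
        var result = new double[x.Length];
        for (int i = 0; i < x.Length; i++)
            result[i] = i == 0 ? x[0] : alpha * x[i] + (1.0 - alpha) * result[i - 1];
        return result;
    }

    // Period form, converted to a smoothing constant.
    public static double[] ExponentialMovingAverage(double[] x, int period) =>
        ExponentialMovingAverage(x, 2.0 / (1 + period));
}
```

For the series 1, 2, 3, 4, 5, MovingAverage with n = 3 gives NaN, NaN, 2, 3, 4. For the series 1, 2, 3, a smoothing constant of 0.5 gives 1, 1.5, 2.25, and a period of 3 gives the same result.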

The GetWeightedMovingAverage method returns a weighted moving average. Each new observation is the weighted sum of the observations.

The weights for the weighted moving average can be supplied as a Double array, or as a Vector. The weights are used in reverse order. The weight with index zero is the weight for the current observation. The weight with index one is the weight for the previous observation.

An optional integer parameter specifies the index in the weight vector that corresponds to the current observation. This allows you to create centrally weighted averages. The default is zero.

The following example creates a weighted moving average of five observations centered around the current observation:

C#
double[] weights = {1.0, 2.0, 3.0, 2.0, 1.0};
NumericalVariable WMA5 = current.Transforms.GetWeightedMovingAverage(weights, 2);
Visual Basic
Dim weights As Double() = {1.0, 2.0, 3.0, 2.0, 1.0}
Dim WMA5 As NumericalVariable = _
                current.Transforms.GetWeightedMovingAverage(weights, 2)
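
The way the weights line up with the observations can be sketched as follows. This is a plain-array illustration with a hypothetical WeightedMovingAverageSketch class, not the library's code. It follows the description literally and returns the weighted sum without normalizing by the total weight, and it assumes that positions where the window extends past either end of the series are NaN.

```csharp
using System;

static class WeightedMovingAverageSketch
{
    // weights[currentIndex] is applied to x[i]; lower weight indexes reach
    // toward later observations, higher indexes toward earlier ones.
    public static double[] WeightedMovingSum(
        double[] x, double[] weights, int currentIndex = 0)
    {
        var result = new double[x.Length];
        for (int i = 0; i < x.Length; i++)
        {
            double sum = 0.0;
            for (int j = 0; j < weights.Length; j++)
            {
                int k = i + currentIndex - j;
                // A missing observation makes the whole sum NaN.
                sum += (k >= 0 && k < x.Length) ? weights[j] * x[k] : double.NaN;
            }
            result[i] = sum;
        }
        return result;
    }
}
```

For the series 1, 2, 3, 4, 5 with weights 0.25, 0.5, 0.25 and a current index of 1, the result is NaN, 2, 3, 4, NaN. When the weights sum to 1, as they do here, the weighted sum is a weighted average.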

Other moving summary statistics

The methods in this group calculate some statistic of a moving window of observations.

The GetMovingMaximum method returns a variable whose observations are the largest of the n observations up to and including the current observation. The GetMovingMinimum method returns a variable whose observations are the smallest of the n observations up to and including the current observation.

The GetMovingStandardDeviation method calculates a moving standard deviation. Each new observation is the standard deviation of the n observations up to and including the current observation. It takes two parameters. The first is an integer that specifies the number of observations. The second parameter is a NumericalVariable that contains the simple moving average of the variable over the same number of observations.

The GetMovingSum method calculates a moving sum of the n observations up to and including the current observation.

The GetMovingAverageAbsoluteDeviation method calculates the average absolute deviation of the n observations up to and including the current observation from the corresponding current observation of another variable. The first parameter is the number of observations. The second parameter is a NumericalVariable that contains the means from which the deviation is to be calculated.
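
The moving window statistics can be sketched on plain arrays as shown below. The MovingStatisticsSketch class is hypothetical and is not the library's implementation. It assumes that positions without a full window of n observations are NaN and that the standard deviation uses the sample divisor n - 1.

```csharp
using System;

static class MovingStatisticsSketch
{
    // Evaluates stat(first, last) for each full window of n observations.
    static double[] Rolling(int length, int n, Func<int, int, double> stat)
    {
        if (n <= 0) throw new ArgumentOutOfRangeException(nameof(n));
        var result = new double[length];
        for (int i = 0; i < length; i++)
            result[i] = i >= n - 1 ? stat(i - n + 1, i) : double.NaN;
        return result;
    }

    public static double[] MovingMaximum(double[] x, int n) =>
        Rolling(x.Length, n, (first, last) =>
        {
            double max = x[first];
            for (int k = first + 1; k <= last; k++) max = Math.Max(max, x[k]);
            return max;
        });

    // Deviations are measured from the supplied moving average at the
    // current position; the divisor n - 1 is an assumption.
    public static double[] MovingStandardDeviation(double[] x, int n, double[] movingAverage) =>
        Rolling(x.Length, n, (first, last) =>
        {
            double ss = 0.0;
            for (int k = first; k <= last; k++)
                ss += (x[k] - movingAverage[last]) * (x[k] - movingAverage[last]);
            return Math.Sqrt(ss / (n - 1));
        });

    // Mean absolute deviation from the current value of `means`.
    public static double[] MovingAverageAbsoluteDeviation(double[] x, int n, double[] means) =>
        Rolling(x.Length, n, (first, last) =>
        {
            double total = 0.0;
            for (int k = first; k <= last; k++) total += Math.Abs(x[k] - means[last]);
            return total / n;
        });
}
```

For the series 1, 3, 2, 5 with n = 2, MovingMaximum gives NaN, 3, 3, 5. With the 2-observation moving average NaN, 2, 2.5, 3.5 as the second argument, the average absolute deviation is NaN, 1, 0.5, 1.5.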

Period-to-date values and differences

There are two transformations in this group. The first calculates cumulative sums of the original observations within a series of intervals. The second is the inverse transformation of the first. It calculates the difference between each observation and the previous one, except for the first observation in each interval.

The GetPeriodToDateValues method calculates period-to-date sums. Each observation is the cumulative sum of the observations in the current interval.

A common use for this method is to create a period-to-date sum of a time series variable relative to a longer time frame. For example, if the variable contains monthly earnings, you can use this method to calculate the earnings to date per quarter.

This method has two overloads. The first takes three parameters. The first parameter is an integer array whose elements specify the boundaries of the intervals. The remaining two parameters are BoundaryIntervalBehavior values that indicate how the first and last intervals should be handled. If startBehavior has a value of Exclude, then new observations with an index smaller than the first index in indexes are set to NaN.

The second overload is useful for variables that are part of a TimeSeriesCollection. The first parameter is a DateTimeVariable that specifies the time corresponding to each observation. It must have the same length as the numerical variable. The second parameter is another DateTimeVariable that indicates the start time of each interval. The remaining two parameters are BoundaryIntervalBehavior values, as before.

The GetPeriodToDateDifferences method performs the reverse operation. Each observation is the difference between the current and the previous observation in the current interval, except when it is the first observation in the current interval. In that case, the new observation is the same as the original observation.
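
The relationship between the two transformations can be sketched on plain arrays. The PeriodToDateSketch class below is hypothetical and simplified: it assumes that the integer array holds the index at which each interval starts, and it does not model the BoundaryIntervalBehavior options.

```csharp
using System;

static class PeriodToDateSketch
{
    // Running sum that restarts at every interval start.
    public static double[] PeriodToDateValues(double[] x, int[] intervalStarts)
    {
        var result = new double[x.Length];
        double sum = 0.0;
        for (int i = 0; i < x.Length; i++)
        {
            if (Array.IndexOf(intervalStarts, i) >= 0) sum = 0.0;  // new interval begins
            sum += x[i];
            result[i] = sum;
        }
        return result;
    }

    // Inverse operation: difference with the previous observation,
    // except for the first observation of each interval, which is kept.
    public static double[] PeriodToDateDifferences(double[] x, int[] intervalStarts)
    {
        var result = new double[x.Length];
        for (int i = 0; i < x.Length; i++)
        {
            bool first = i == 0 || Array.IndexOf(intervalStarts, i) >= 0;
            result[i] = first ? x[i] : x[i] - x[i - 1];
        }
        return result;
    }
}
```

For monthly values 10, 20, 30, 40, 50, 60 with quarters starting at indexes 0 and 3, PeriodToDateValues gives 10, 30, 60, 40, 90, 150. Applying PeriodToDateDifferences to that result with the same interval starts recovers the original values.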

Miscellaneous transformations

The GetReferenceIndex method scales the observations to make them comparable to a standard index value. The method has two overloads. The first overload takes two parameters. The first is the index of the observation that serves as a reference. The second parameter is the base value of the index. The observations are scaled so that the index value of the reference observation equals the base value of the index.

The second overload calculates the reference index based on the sum of a range of observations. It takes three parameters. The first is the index of the first observation in the reference interval. The second parameter is the index of the last observation in the reference interval. The third parameter is the base value of the index. The observations are scaled so that the sum of the index values in the reference interval equals the base value of the index.
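
Both forms of scaling amount to multiplying every observation by a single factor. The following plain-array sketch, with a hypothetical ReferenceIndexSketch class, shows the arithmetic and is not the library's implementation.

```csharp
using System;

static class ReferenceIndexSketch
{
    // Scales so that the reference observation maps to baseValue.
    public static double[] ReferenceIndex(double[] x, int referenceIndex, double baseValue) =>
        Scale(x, baseValue / x[referenceIndex]);

    // Scales so that the observations first..last (inclusive) sum to baseValue.
    public static double[] ReferenceIndex(double[] x, int first, int last, double baseValue)
    {
        double sum = 0.0;
        for (int i = first; i <= last; i++) sum += x[i];
        return Scale(x, baseValue / sum);
    }

    static double[] Scale(double[] x, double factor) =>
        Array.ConvertAll(x, v => v * factor);
}
```

For the series 50, 100, 200, a reference index of 0 with a base value of 100 gives 100, 200, 400. For the series 40, 60, 200, the reference interval from index 0 to index 1 with a base value of 200 gives 80, 120, 400.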

The GetPositiveToNegativeRatio method calculates the ratio of the positive values to the negative values over an interval. The first parameter is the length of the interval. The second parameter is a NumericalVariable that serves as the reference variable. The method calculates the ratio of the sum of observations within the specified period where the corresponding observation of the reference variable is positive to the sum of observations where the corresponding reference observation is negative. Observations where the corresponding reference observation is zero are not included.

The GetPositiveToNegativeIndex method performs a similar calculation. However, the result is not returned as a ratio, but as an index value between 0 and 100. It has the same parameters as the GetPositiveToNegativeRatio method.
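
A plain-array sketch of both calculations follows. The PositiveNegativeSketch class is hypothetical. The ratio follows the description above. The index formula, 100 * positive / (positive + negative), is an assumption borrowed from oscillators such as the Money Flow Index, and it is equivalent to 100 - 100 / (1 + ratio). Positions without a full window are assumed to be NaN.

```csharp
using System;

static class PositiveNegativeSketch
{
    // Sums x over each window of n observations, split by the sign of the
    // reference observation, and combines the two sums with f.
    static double[] Combine(
        double[] x, int n, double[] reference, Func<double, double, double> f)
    {
        var result = new double[x.Length];
        for (int i = 0; i < x.Length; i++)
        {
            if (i < n - 1) { result[i] = double.NaN; continue; }
            double positive = 0.0, negative = 0.0;
            for (int k = i - n + 1; k <= i; k++)
            {
                if (reference[k] > 0.0) positive += x[k];
                else if (reference[k] < 0.0) negative += x[k];
                // a zero reference observation is skipped
            }
            result[i] = f(positive, negative);
        }
        return result;
    }

    public static double[] Ratio(double[] x, int n, double[] reference) =>
        Combine(x, n, reference, (p, q) => p / q);

    // Assumed mapping of the ratio onto a 0 to 100 scale.
    public static double[] Index(double[] x, int n, double[] reference) =>
        Combine(x, n, reference, (p, q) => 100.0 * p / (p + q));
}
```

For the observations 10, 20, 30, 40 with reference values 1, -1, 0, 1 and a window of 4, the positive sum is 50 and the negative sum is 20, so the last ratio is 2.5 and the last index value is about 71.4.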

Finally, the GetBoxCoxTransform method returns the Box-Cox transform of the variable for the specified parameter lambda, which must be between 0 and 1. This transformation is often used to reduce the effects of non-normality.
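
For reference, the standard one-parameter Box-Cox transform is (x^lambda - 1) / lambda for nonzero lambda, and the natural logarithm of x when lambda is zero. The sketch below applies that textbook formula to a plain array. The BoxCoxSketch class is hypothetical, and it is an assumption that the library uses exactly this form.

```csharp
using System;

static class BoxCoxSketch
{
    // Textbook Box-Cox transform; observations are expected to be positive.
    public static double[] BoxCox(double[] x, double lambda) =>
        Array.ConvertAll(x, v => lambda == 0.0
            ? Math.Log(v)
            : (Math.Pow(v, lambda) - 1.0) / lambda);
}
```

With lambda equal to 0.5, the observations 1 and 4 map to 0 and 2. With lambda equal to 1, each observation is simply reduced by 1, so the shape of the distribution is unchanged.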
