Extreme Optimization User's Guide
Polynomial Regression
One special kind of multiple linear regression is
polynomial regression. In polynomial regression, the independent variables
are all powers of one or more variables drawn from a smaller set of
independent variables. The Extreme Optimization Numerical Libraries for .NET
support polynomial regression on one variable through the
PolynomialRegressionModel class.
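Written out, a polynomial regression model of degree d in one independent variable x has the form below, where the β coefficients are the regression parameters and ε is the error term. The model is nonlinear in x but linear in the parameters, which is why it can be fitted as a multiple linear regression whose independent variables are x, x², ..., x^d. The symbols are our own notation, not names used by the library.

```latex
y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_d x^d + \varepsilon,
\qquad \text{row } i \text{ of the design matrix: } (1,\ x_i,\ x_i^2,\ \ldots,\ x_i^d)
```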
Constructing Polynomial Regression Models
The PolynomialRegressionModel class has five constructors. These mirror the constructors of the
SimpleRegressionModel class, but
add a parameter that specifies the degree of the regression polynomial.
The first constructor takes two arrays of double values and an integer. The first array
contains the data for the dependent variable. The second array contains the data
for the independent variable. The dependent variable is named 'Y,' while the
independent variable is named 'X.' The last parameter is the degree of the regression polynomial.
For our example, we use the 'Pontius' dataset from the National Institute of Standards and Technology's
Statistical Reference Datasets. This dataset contains measurement data
from the calibration of load cells. The independent variable is the load.
The dependent variable is the deflection. We will fit a second-degree polynomial through the data.
C#
double[] deflectionData = {
.11019, .21956, .32949, .43899, .54803, .65694, .76562,
.87487, .98292, 1.09146, 1.20001, 1.30822, 1.41599, 1.52399,
1.63194, 1.73947, 1.84646, 1.95392, 2.06128, 2.16844, .11052,
.22018, .32939, .43886, .54798, .65739, .76596, .87474, .98300,
1.09150, 1.20004, 1.30818, 1.41613, 1.52408, 1.63159, 1.73965,
1.84696, 1.95445, 2.06177, 2.16829};
double[] loadData = {
150000, 300000, 450000, 600000, 750000, 900000,
1050000, 1200000, 1350000, 1500000, 1650000, 1800000,
1950000, 2100000, 2250000, 2400000, 2550000, 2700000,
2850000, 3000000, 150000, 300000, 450000, 600000,
750000, 900000, 1050000, 1200000, 1350000, 1500000,
1650000, 1800000, 1950000, 2100000, 2250000, 2400000,
2550000, 2700000, 2850000, 3000000};
PolynomialRegressionModel model1 = new PolynomialRegressionModel(deflectionData, loadData, 2);
Visual Basic
Dim deflectionData As Double() = { _
.11019, .21956, .32949, .43899, .54803, .65694, .76562, _
.87487, .98292, 1.09146, 1.20001, 1.30822, 1.41599, 1.52399, _
1.63194, 1.73947, 1.84646, 1.95392, 2.06128, 2.16844, .11052, _
.22018, .32939, .43886, .54798, .65739, .76596, .87474, .98300, _
1.09150, 1.20004, 1.30818, 1.41613, 1.52408, 1.63159, 1.73965, _
1.84696, 1.95445, 2.06177, 2.16829}
Dim loadData As Double() = { _
150000, 300000, 450000, 600000, 750000, 900000, _
1050000, 1200000, 1350000, 1500000, 1650000, 1800000, _
1950000, 2100000, 2250000, 2400000, 2550000, 2700000, _
2850000, 3000000, 150000, 300000, 450000, 600000, _
750000, 900000, 1050000, 1200000, 1350000, 1500000, _
1650000, 1800000, 1950000, 2100000, 2250000, 2400000, _
2550000, 2700000, 2850000, 3000000}
Dim model1 As PolynomialRegressionModel = _
New PolynomialRegressionModel(deflectionData, loadData, 2)
The second constructor takes two Vector objects and an integer. Once again, the
first vector contains the data for the dependent variable. The
second vector contains the data for the independent variable. The dependent
variable is named 'Y,' while the independent variable is named 'X.' The last parameter is the degree of the regression polynomial.
C#
Vector deflectionVector = new GeneralVector(deflectionData);
Vector loadVector = new GeneralVector(loadData);
PolynomialRegressionModel model2 = new PolynomialRegressionModel(deflectionVector, loadVector, 2);
Visual Basic
Dim deflectionVector As Vector = New GeneralVector(deflectionData)
Dim loadVector As Vector = New GeneralVector(loadData)
Dim model2 As PolynomialRegressionModel = _
New PolynomialRegressionModel(deflectionVector, loadVector, 2)
The third constructor takes two NumericalVariable objects as its first two
parameters, and an integer as the third. The first variable represents the dependent variable. The second
variable represents the independent variable. The third parameter is the degree of the regression polynomial.
C#
NumericalVariable deflection = new NumericalVariable("deflection", deflectionData);
NumericalVariable load = new NumericalVariable("load", loadData);
PolynomialRegressionModel model3 = new PolynomialRegressionModel(deflection, load, 2);
Visual Basic
Dim deflection As NumericalVariable = New NumericalVariable("deflection", deflectionData)
Dim load As NumericalVariable = New NumericalVariable("load", loadData)
Dim model3 As PolynomialRegressionModel = _
New PolynomialRegressionModel(deflection, load, 2)
The fourth constructor takes four parameters. The first parameter is a
VariableCollection object that contains the variables to be used in the
regression. The second parameter is a string containing the name of the
dependent variable. The third parameter is a string containing the name of the
independent variable. The two names must exist in the collection specified
by the first parameter, and the variables must be of type NumericalVariable.
The last parameter is the degree of the regression polynomial.
C#
VariableCollection variables = new VariableCollection();
variables.Add(deflection);
variables.Add(load);
PolynomialRegressionModel model4 = new PolynomialRegressionModel(variables, "deflection", "load", 2);
Visual Basic
Dim variables As VariableCollection = New VariableCollection()
variables.Add(deflection)
variables.Add(load)
Dim model4 As PolynomialRegressionModel = New PolynomialRegressionModel(variables, "deflection", "load", 2)
The fifth constructor also takes four parameters. The first parameter is
a DataTable object that contains the data for the regression analysis. The
second parameter is a string containing the name of the column that contains the
data for the dependent variable. The third parameter is a string containing the
name of the column that contains the data for the independent variable. Both columns
must be numerical or convertible to numerical values.
The last parameter is once again the degree of the regression polynomial.
C#
DataTable table = new DataTable();
// Fill data table with data from some data source.
PolynomialRegressionModel model5 = new PolynomialRegressionModel(table, "deflection", "load", 2);
Visual Basic
Dim table As DataTable = New DataTable()
' Fill data table with data from some data source.
Dim model5 As PolynomialRegressionModel = _
New PolynomialRegressionModel(table, "deflection", "load", 2)
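The examples above leave the filling of the table open. As one possibility, the sketch below builds an in-memory System.Data.DataTable from the two arrays of the first example, with numeric columns named to match the column names passed to the constructor. The class and method names (DataTableSketch, BuildTable) are hypothetical and not part of the library.

```csharp
using System;
using System.Data;

// Sketch with hypothetical names: builds a DataTable whose numeric columns
// "deflection" and "load" match the column names passed to the fifth constructor.
public static class DataTableSketch
{
    public static DataTable BuildTable(double[] deflection, double[] load)
    {
        if (deflection.Length != load.Length)
            throw new ArgumentException("Both columns need the same number of rows.");

        var table = new DataTable("Pontius");
        table.Columns.Add("deflection", typeof(double));
        table.Columns.Add("load", typeof(double));

        // One row per observation; values are given in column order.
        for (int i = 0; i < deflection.Length; i++)
            table.Rows.Add(deflection[i], load[i]);
        return table;
    }
}
```

With this helper, DataTable table = DataTableSketch.BuildTable(deflectionData, loadData); could take the place of the empty table in the examples above.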
Computing the Regression Polynomial
The Compute
method performs the actual analysis. Most properties and methods throw an exception when they are accessed before
the Compute method is called. You can verify that the model has been calculated by inspecting the
Computed property.
The GetRegressionPolynomial
method returns a Polynomial
object that represents the regression polynomial.
C#
model1.Compute();
Extreme.Mathematics.Curves.Polynomial regressionPolynomial = model1.GetRegressionPolynomial();
Visual Basic
model1.Compute()
Dim regressionPolynomial As Extreme.Mathematics.Curves.Polynomial = _
model1.GetRegressionPolynomial()
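To make concrete what the fit computes, the following self-contained sketch finds the coefficients of a degree-d polynomial by ordinary least squares, solving the normal equations (XᵀX)c = Xᵀy with Gaussian elimination. It illustrates the mathematics only and is not the algorithm PolynomialRegressionModel uses internally; PolyFitSketch and Fit are hypothetical names. The normal equations square the condition number of the problem, so for data with large x values, such as the loads in the Pontius dataset, a robust implementation would more likely use an orthogonal (for example QR) decomposition, or at least center and scale x first.

```csharp
using System;

// Illustrative sketch only (hypothetical names): ordinary least-squares fit of
// y ≈ c[0] + c[1]*x + ... + c[d]*x^d via the normal equations (X^T X) c = X^T y.
// This is not how PolynomialRegressionModel is implemented internally.
public static class PolyFitSketch
{
    // Returns the coefficients c[0..degree]; c[k] multiplies x^k.
    public static double[] Fit(double[] x, double[] y, int degree)
    {
        if (x.Length != y.Length) throw new ArgumentException("x and y must have the same length.");
        int p = degree + 1; // number of parameters, including the intercept
        if (degree < 0 || x.Length < p) throw new ArgumentException("Need at least degree + 1 observations.");

        // Accumulate A = X^T X and b = X^T y, where row i of the design
        // matrix X is [1, x_i, x_i^2, ..., x_i^degree].
        var a = new double[p, p];
        var b = new double[p];
        var powers = new double[p];
        for (int i = 0; i < x.Length; i++)
        {
            powers[0] = 1.0;
            for (int k = 1; k < p; k++) powers[k] = powers[k - 1] * x[i];
            for (int r = 0; r < p; r++)
            {
                b[r] += powers[r] * y[i];
                for (int c = 0; c < p; c++) a[r, c] += powers[r] * powers[c];
            }
        }

        // Forward elimination with partial pivoting.
        for (int col = 0; col < p; col++)
        {
            int pivot = col;
            for (int r = col + 1; r < p; r++)
                if (Math.Abs(a[r, col]) > Math.Abs(a[pivot, col])) pivot = r;
            if (a[pivot, col] == 0.0)
                throw new InvalidOperationException("Singular system: too few distinct x values.");
            if (pivot != col)
            {
                for (int c = 0; c < p; c++)
                {
                    double tmp = a[col, c]; a[col, c] = a[pivot, c]; a[pivot, c] = tmp;
                }
                double tb = b[col]; b[col] = b[pivot]; b[pivot] = tb;
            }
            for (int r = col + 1; r < p; r++)
            {
                double factor = a[r, col] / a[col, col];
                for (int c = col; c < p; c++) a[r, c] -= factor * a[col, c];
                b[r] -= factor * b[col];
            }
        }

        // Back substitution.
        var coeffs = new double[p];
        for (int r = p - 1; r >= 0; r--)
        {
            double sum = b[r];
            for (int c = r + 1; c < p; c++) sum -= a[r, c] * coeffs[c];
            coeffs[r] = sum / a[r, r];
        }
        return coeffs;
    }
}
```

For example, fitting degree 2 to x = {0, 1, 2, 3, 4} and y = {1, 6, 17, 34, 57}, which lie exactly on 1 + 2x + 3x², returns coefficients close to {1, 2, 3}.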
The PredictedValues property returns a Vector
that contains the values of the dependent variable as predicted by the model.
The Residuals property returns a vector containing the difference between the
actual and the predicted values of the dependent variable. Both vectors contain
one element for each observation.
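The relationship between these two vectors can be sketched with plain arrays: each predicted value is the regression polynomial evaluated at the observed x (here with Horner's rule), and each residual is the observed value minus the predicted value. ResidualSketch, Evaluate and PredictAndResiduals are hypothetical names, and the coefficient array is assumed to be ordered by power, with element k multiplying x^k.

```csharp
using System;

// Sketch with hypothetical names. coeffs[k] is the coefficient of x^k.
public static class ResidualSketch
{
    // Evaluates coeffs[0] + coeffs[1]*x + ... + coeffs[d]*x^d with Horner's rule.
    public static double Evaluate(double[] coeffs, double x)
    {
        double result = 0.0;
        for (int k = coeffs.Length - 1; k >= 0; k--)
            result = result * x + coeffs[k];
        return result;
    }

    // One predicted value and one residual (observed minus predicted) per observation.
    public static (double[] Predicted, double[] Residuals) PredictAndResiduals(
        double[] coeffs, double[] x, double[] y)
    {
        if (x.Length != y.Length) throw new ArgumentException("x and y must have the same length.");
        var predicted = new double[x.Length];
        var residuals = new double[x.Length];
        for (int i = 0; i < x.Length; i++)
        {
            predicted[i] = Evaluate(coeffs, x[i]);
            residuals[i] = y[i] - predicted[i];
        }
        return (predicted, residuals);
    }
}
```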
Regression Parameters
The PolynomialRegressionModel class's Parameters
property returns a ParameterCollection
object that contains the parameters of the regression model. The members of
this collection are of type Parameter.
Regression parameters are created by the model. You cannot create them
directly.
A polynomial regression model has one parameter for each power of the independent variable, starting at
zero if the model has an intercept (constant term) and at one otherwise, up to the degree of the regression
polynomial. If the model has an intercept, the numerical index of each parameter equals the power of the
term it multiplies.
The Parameter class has four useful properties. The Value property
returns the numerical value of the parameter, while the StandardError
property returns the standard deviation of the parameter's distribution.
The Statistic
property returns the value of the t-statistic corresponding to the hypothesis
that the parameter equals zero. The PValue
property returns the corresponding p-value. A high p-value indicates that
the variable associated with the parameter does not make a
significant contribution to explaining the data. The p-value always corresponds
to a two-tailed test.
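Under the usual least-squares assumptions (independent, normally distributed errors with constant variance), the textbook relationships for the parameter β_k of a degree-d model with an intercept, fitted to n observations, are shown below. Here β̂_k is the estimate, SE(β̂_k) its standard error, and T_{n-d-1} a random variable with Student's t distribution on n - d - 1 degrees of freedom; the factor 2 reflects the two-tailed test. These are given as a guide, not as a description of the library's internals. For the Pontius example, n = 40 and d = 2, so there are 40 - 3 = 37 degrees of freedom.

```latex
t_k = \frac{\hat\beta_k}{\operatorname{SE}(\hat\beta_k)}, \qquad
p_k = 2\,\Pr\!\left(T_{n-d-1} \ge |t_k|\right)
```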
Of particular interest are the p-values associated with the higher powers
in the polynomial. The p-value for the parameter of the highest power should
be small. If it is not, the highest-degree term does not contribute significantly, which indicates that the
model is over-fitting the data. The following example prints the properties of the parameter associated
with the second-degree (square) term in our earlier example:
C#
Parameter squareTerm = model1.Parameters[2];
// Or, by name, for a computed model whose independent variable is named "load" (such as model3):
// Parameter squareTerm = model3.Parameters["load^2"];
Console.WriteLine("Name: {0}", squareTerm.Name);
Console.WriteLine("Value: {0}", squareTerm.Value);
Console.WriteLine("St.Err.: {0}", squareTerm.StandardError);
Console.WriteLine("t-statistic: {0}", squareTerm.TStatistic);
Console.WriteLine("p-value: {0}", squareTerm.PValue);
Visual Basic
Dim squareTerm As Parameter = model1.Parameters(2)
' Or, by name, for a computed model whose independent variable is named "load" (such as model3):
' Dim squareTerm As Parameter = model3.Parameters("load^2")
Console.WriteLine("Name: {0}", squareTerm.Name)
Console.WriteLine("Value: {0}", squareTerm.Value)
Console.WriteLine("St.Err.: {0}", squareTerm.StandardError)
Console.WriteLine("t-statistic: {0}", squareTerm.TStatistic)
Console.WriteLine("p-value: {0}", squareTerm.PValue)
The Parameter class has one method: GetConfidenceInterval.
This method takes one argument: a confidence level between 0 and 1. A value of
0.95 corresponds to a confidence level of 95%. The method returns the confidence
interval for the parameter at the specified confidence level as an Interval
structure.
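For a confidence level c (for example 0.95), the textbook two-sided interval for a parameter β_k of a degree-d model with an intercept, fitted to n observations, is shown below. Here β̂_k is the estimate, SE(β̂_k) its standard error, and t_{(1+c)/2, n-d-1} the (1 + c)/2 quantile of Student's t distribution with n - d - 1 degrees of freedom, so c = 0.95 uses the 0.975 quantile. We assume, without having checked the library's source, that GetConfidenceInterval follows this standard construction.

```latex
\hat\beta_k \pm t_{(1+c)/2,\; n-d-1}\,\operatorname{SE}(\hat\beta_k)
```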
Verifying the Quality of the Regression
The ResidualSumOfSquares
property gives the sum of the squares of the residuals. The coefficients of the regression polynomial
were found by minimizing this value.
The StandardError
property gives the standard error of the regression: an estimate of the standard deviation of the random
errors around the regression polynomial.
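In formulas, with ŷ_i the predicted value for observation i, n observations and a degree-d polynomial with an intercept (d + 1 parameters), the residual sum of squares and the conventional standard error of the regression are as follows. We assume the StandardError property uses this conventional definition, which divides by the residual degrees of freedom n - d - 1 rather than by n.

```latex
\mathrm{RSS} = \sum_{i=1}^{n} \left(y_i - \hat y_i\right)^2, \qquad
s = \sqrt{\frac{\mathrm{RSS}}{n - d - 1}}
```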
The
RSquared
property returns the coefficient of determination. It is the ratio of the
variation in the data that is explained by the model to the total
variation in the data. Its value is always between 0 and 1, where 0 means the
model explains nothing and 1 means the model explains the data perfectly.
When the degree of the polynomial is high, the additional terms
may be modeling the errors in the data rather than the data itself. This
causes the full model to be less reliable for making predictions. The AdjustedRSquared
property returns an adjusted R² value that attempts to compensate for
this phenomenon.
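Both quantities follow from the residual and total sums of squares. The sketch below uses the common textbook formulas, R² = 1 - RSS/TSS and adjusted R² = 1 - (1 - R²)(n - 1)/(n - d - 1), for a degree-d model with an intercept. The library may use an equivalent formulation; GoodnessOfFitSketch and its method names are hypothetical.

```csharp
using System;

// Sketch with hypothetical names, using the textbook formulas for a model with an intercept.
public static class GoodnessOfFitSketch
{
    // R^2 = 1 - RSS / TSS, with TSS taken about the mean of the observed values.
    public static double RSquared(double[] observed, double[] predicted)
    {
        if (observed.Length != predicted.Length) throw new ArgumentException("Length mismatch.");
        double mean = 0.0;
        foreach (double v in observed) mean += v;
        mean /= observed.Length;

        double rss = 0.0, tss = 0.0;
        for (int i = 0; i < observed.Length; i++)
        {
            double residual = observed[i] - predicted[i];
            double deviation = observed[i] - mean;
            rss += residual * residual;
            tss += deviation * deviation;
        }
        return 1.0 - rss / tss;
    }

    // Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - d - 1), where d is the polynomial degree.
    public static double AdjustedRSquared(double rSquared, int n, int degree)
    {
        if (n - degree - 1 <= 0) throw new ArgumentException("Need more than degree + 1 observations.");
        return 1.0 - (1.0 - rSquared) * (n - 1) / (n - degree - 1);
    }
}
```

For instance, observations {1, 2, 3, 4} with predictions {1, 2, 3, 5} give RSS = 1 and TSS = 5, so R² = 0.8; with n = 4 and d = 1 the adjusted value drops to 1 - 0.2 · 3/2 = 0.7, showing the penalty for each additional term.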
An entirely different assessment is available through an analysis of
variance. Here, the variation in the data is decomposed into a component
explained by the model, and the variation in the residuals. The FStatistic
property returns the F-statistic for the ratio of these two variances. The
PValue
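property returns the corresponding p-value. A low p-value means that it is
unlikely that the variance explained by the model is no greater than the residual variance, as would be
expected if all coefficients other than the intercept were zero. A low p-value therefore indicates that the
model is significant.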
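The F-statistic is the ratio of two mean squares: the explained sum of squares (TSS - RSS) divided by its d degrees of freedom, and the residual sum of squares divided by n - d - 1. A minimal sketch of this textbook formula for a degree-d model with an intercept follows; AnovaSketch and FStatistic are hypothetical names, and the p-value, which requires the cumulative distribution function of the F distribution, is left out.

```csharp
using System;

// Sketch with hypothetical names: overall F-statistic for a degree-d
// polynomial model with an intercept, fitted to n observations.
public static class AnovaSketch
{
    // tss: total sum of squares about the mean of y
    // rss: residual sum of squares
    public static double FStatistic(double tss, double rss, int n, int degree)
    {
        if (degree < 1 || n <= degree + 1)
            throw new ArgumentException("Need degree >= 1 and more than degree + 1 observations.");
        double explainedMeanSquare = (tss - rss) / degree;   // ESS / d
        double residualMeanSquare = rss / (n - degree - 1);  // RSS / (n - d - 1)
        return explainedMeanSquare / residualMeanSquare;
    }
}
```

With TSS = 5, RSS = 1, n = 4 and d = 1, this gives F = (4 / 1) / (1 / 2) = 8.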
The results of the analysis of variance are also summarized in the regression
model's ANOVA table, returned by the AnovaTable
property.
Copyright 2004-2008,
Extreme Optimization. All rights reserved.
Extreme Optimization, Complexity made simple, M#, and M
Sharp are trademarks of ExoAnalytics Inc.
Microsoft, Visual C#, Visual Basic, Visual Studio, Visual
Studio.NET, and the Visual Studio Logo are registered trademarks of Microsoft Corporation