Polynomial Regression | Extreme Optimization Numerical Libraries for .NET Professional

One special kind of multiple linear regression is polynomial regression. In polynomial regression, the independent variables of the regression are all powers of one or more variables drawn from a smaller set of independent variables. The PolynomialRegressionModel class implements polynomial regression on one variable.

The PolynomialRegressionModel class has two constructors. These mirror the constructors of the SimpleRegressionModel class, but add an argument that specifies the degree of the regression polynomial.

The first constructor takes three arguments. The first is a vector that contains the data for the dependent variable. The second is a vector that contains the data for the independent variable. The dependent variable is named 'Y,' while the independent variable is named 'X.' The last argument is the degree of the regression polynomial.

For our example, we use the Pontius dataset from the National Institute of Standards and Technology's Statistical Reference Datasets. This dataset contains measurement data from the calibration of load cells. The independent variable is the load. The dependent variable is the deflection. We will fit a second-degree polynomial through the data.

    var deflection = Vector.Create(new double[] {
        .11019, .21956, .32949, .43899, .54803, .65694, .76562, .87487, .98292, 1.09146,
        1.20001, 1.30822, 1.41599, 1.52399, 1.63194, 1.73947, 1.84646, 1.95392, 2.06128, 2.16844,
        .11052, .22018, .32939, .43886, .54798, .65739, .76596, .87474, .98300, 1.09150,
        1.20004, 1.30818, 1.41613, 1.52408, 1.63159, 1.73965, 1.84696, 1.95445, 2.06177, 2.16829 });
    var load = Vector.Create(new double[] {
        150000, 300000, 450000, 600000, 750000, 900000, 1050000, 1200000, 1350000, 1500000,
        1650000, 1800000, 1950000, 2100000, 2250000, 2400000, 2550000, 2700000, 2850000, 3000000,
        150000, 300000, 450000, 600000, 750000, 900000, 1050000, 1200000, 1350000, 1500000,
        1650000, 1800000, 1950000, 2100000, 2250000, 2400000, 2550000, 2700000, 2850000, 3000000 });
    var model1 = new PolynomialRegressionModel(deflection, load, 2);

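The second constructor takes four arguments. The first argument is an IDataFrame (such as a DataFrame) that contains the variables. As the call below shows, the second argument is the name of the dependent variable and the third is the name of the independent variable. The last argument is again the degree of the regression polynomial.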

    var dataFrame = DataFrame.FromColumns(new Dictionary<string, object>() {
        { "deflection", deflection },
        { "load", load } });
    var model2 = new PolynomialRegressionModel(dataFrame, "deflection", "load", 2);

The Compute method performs the actual analysis. Most properties and methods throw an exception when they are accessed before the Compute method is called. You can verify that the model has been calculated by inspecting the Computed property.
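The guard described above can be sketched as a small helper. This is only an illustration: it relies on the Compute method and Computed property exactly as named on this page, the `Extreme.Statistics` namespace is an assumption not stated here, and the names `RegressionHelpers` and `EnsureComputed` are hypothetical.

```csharp
using Extreme.Statistics; // assumed namespace of PolynomialRegressionModel

static class RegressionHelpers
{
    // Hypothetical helper: runs the analysis only if it has not been run yet,
    // so that result properties can be read without triggering an exception.
    public static PolynomialRegressionModel EnsureComputed(PolynomialRegressionModel model)
    {
        if (!model.Computed)
        {
            model.Compute();
        }
        return model;
    }
}
```

With the earlier example, a call such as `RegressionHelpers.EnsureComputed(model1);` would come before any access to the results.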

The GetRegressionPolynomial method returns a Polynomial that represents the regression polynomial.

The PredictedValues property returns a vector that contains the values of the dependent variable as predicted by the model. The Residuals property returns a vector containing the difference between the actual and the predicted values of the dependent variable. Both vectors contain one element for each observation.
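To make the two definitions concrete without relying on the library, the following sketch recomputes predicted values and residuals by hand from a coefficient array. The class and method names are hypothetical, and the sign convention (actual minus predicted) is the usual one and is assumed here to match the library.

```csharp
using System;

static class ResidualSketch
{
    // Evaluates c[0] + c[1]*x + ... + c[d]*x^d using Horner's rule.
    public static double Evaluate(double[] coefficients, double x)
    {
        double result = 0.0;
        for (int k = coefficients.Length - 1; k >= 0; k--)
        {
            result = result * x + coefficients[k];
        }
        return result;
    }

    // Residual = actual value minus predicted value, one per observation.
    public static double[] Residuals(double[] coefficients, double[] x, double[] y)
    {
        if (x.Length != y.Length)
        {
            throw new ArgumentException("x and y must have the same length.");
        }
        var residuals = new double[y.Length];
        for (int i = 0; i < y.Length; i++)
        {
            residuals[i] = y[i] - Evaluate(coefficients, x[i]);
        }
        return residuals;
    }
}
```

The coefficient array is ordered from the constant term upward, so index k holds the coefficient of the k-th power.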

The Parameters property of the PolynomialRegressionModel class returns a ParameterVector object that contains the parameters of the regression model. The elements of this vector are of type Parameter. Regression parameters are created by the model. You cannot create them directly.

A polynomial regression model has one parameter for each power of the independent variable, starting at zero if the model includes an intercept (constant term) or at one if it does not, and running up to the degree of the regression polynomial. If there is an intercept in the model, then the numerical index of the parameter corresponds to the power associated with the coefficient.
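Written out for a model with an intercept, a regression polynomial of degree $d$ in the independent variable $x$ has the form below. The symbols $\beta_k$ and $\varepsilon$ are notation introduced here for illustration only. Under the indexing rule just described, the parameter at index $k$ would hold the estimate of $\beta_k$.

```latex
y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_d x^d + \varepsilon
```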

The Parameter class has four useful properties. The Value property returns the numerical value of the parameter, while the StandardError property returns the standard deviation of the parameter's distribution.

The Statistic property returns the value of the t-statistic corresponding to the hypothesis that the parameter equals zero. The PValue property returns the corresponding p-value. A high p-value indicates that the variable associated with the parameter does not make a significant contribution to explaining the data. The p-value always corresponds to a two-tailed test.
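For reference, the standard textbook definitions of these two quantities are sketched below. This page does not spell out the library's exact computation, so treat the formulas as an assumption about what the properties report. Here $\hat{\beta}_k$ is the estimated parameter, $\operatorname{SE}(\hat{\beta}_k)$ its standard error, $n$ the number of observations, $m$ the number of estimated parameters, and $T_{n-m}$ a Student t variable with $n - m$ degrees of freedom.

```latex
t_k = \frac{\hat{\beta}_k}{\operatorname{SE}(\hat{\beta}_k)}, \qquad
p_k = 2 \, P\bigl( T_{n-m} > |t_k| \bigr)
```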

Of particular interest are the p-values associated with the higher powers in the polynomial. The p-value for the parameter for the highest power should be small. Otherwise, this is an indication that the model is over-fitting the data. The following example prints the properties of the parameter associated with the second-degree (square) term in our earlier example:

    var squareTerm = model1.Parameters[2];
    // Or: Parameter squareTerm = model1.Parameters["load^2"];
    Console.WriteLine("Name: {0}", squareTerm.Name);
    Console.WriteLine("Value: {0}", squareTerm.Value);
    Console.WriteLine("St.Err.: {0}", squareTerm.StandardError);
    Console.WriteLine("t-statistic: {0}", squareTerm.Statistic);
    Console.WriteLine("p-value: {0}", squareTerm.PValue);

The Parameter class has one method: GetConfidenceInterval. This method takes one argument: a confidence level between 0 and 1. A value of 0.95 corresponds to a confidence level of 95%. The method returns the confidence interval for the parameter at the specified confidence level as an Interval.
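A two-sided confidence interval of this kind is conventionally built from the parameter estimate, its standard error, and a quantile of the t distribution. The formula below is the usual textbook construction, given as an assumption about what the method computes. The symbols are introduced here for illustration: $c$ is the confidence level (for example 0.95), $n$ the number of observations, $m$ the number of estimated parameters, and $t_{q,\,n-m}$ the $q$-quantile of the Student t distribution with $n - m$ degrees of freedom.

```latex
\hat{\beta}_k \;\pm\; t_{(1+c)/2,\; n-m} \cdot \operatorname{SE}(\hat{\beta}_k)
```

For $c = 0.95$ the quantile used is $t_{0.975,\,n-m}$, so a higher confidence level gives a wider interval.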

The ResidualSumOfSquares property gives the sum of the squares of the residuals. The regression polynomial was found by minimizing this value. The StandardError property gives the standard error of the regression, an estimate of the standard deviation of the data around the fitted polynomial.
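In formulas, with $e_i$ denoting the residual of observation $i$, $n$ the number of observations, and $m$ the number of estimated parameters, the two quantities are conventionally defined as follows. The notation is introduced here, and the degrees-of-freedom divisor $n - m$ is the usual convention rather than something this page confirms for the library.

```latex
\mathrm{SSR} = \sum_{i=1}^{n} e_i^2, \qquad
s = \sqrt{\frac{\mathrm{SSR}}{n - m}}
```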

The RSquared property returns the coefficient of determination. It is the ratio of the variation explained by the model to the total variation in the data. Its value is always between 0 and 1, where 0 means the model explains nothing and 1 means the model explains the data perfectly.
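As a library-independent illustration, the coefficient of determination can be recomputed from the actual and predicted values as one minus the ratio of the residual sum of squares to the total sum of squares. For a least-squares fit with an intercept this equals the explained fraction of the variation. The class and method names below are hypothetical.

```csharp
using System;
using System.Linq;

static class RSquaredSketch
{
    // Computes R^2 = 1 - SSR / SST from actual and predicted values.
    public static double RSquared(double[] actual, double[] predicted)
    {
        if (actual.Length != predicted.Length)
        {
            throw new ArgumentException("Both arrays must have the same length.");
        }
        double mean = actual.Average();
        double residualSumOfSquares = 0.0;
        double totalSumOfSquares = 0.0;
        for (int i = 0; i < actual.Length; i++)
        {
            double residual = actual[i] - predicted[i];
            double deviation = actual[i] - mean;
            residualSumOfSquares += residual * residual;
            totalSumOfSquares += deviation * deviation;
        }
        return 1.0 - residualSumOfSquares / totalSumOfSquares;
    }
}
```

A perfect fit gives 1, while predicting the mean of the data for every observation gives 0.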

When the degree of the polynomial is high, the additional terms may be modeling the errors in the data rather than the data itself. This causes the full model to be less reliable for making predictions. The AdjustedRSquared property returns an adjusted R² value that attempts to compensate for this phenomenon.
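The usual textbook adjustment penalizes R² by the number of terms in the model. The sketch below assumes a model with an intercept, so a polynomial of degree d uses d + 1 parameters. This page does not state which formula the library uses, so the code is an illustration under that assumption, and the names are hypothetical.

```csharp
using System;

static class AdjustedRSquaredSketch
{
    // Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - d - 1),
    // for n observations and a degree-d polynomial with an intercept.
    public static double AdjustedRSquared(double rSquared, int observations, int degree)
    {
        int residualDegreesOfFreedom = observations - degree - 1;
        if (residualDegreesOfFreedom <= 0)
        {
            throw new ArgumentException("There must be more observations than parameters.");
        }
        return 1.0 - (1.0 - rSquared) * (observations - 1) / residualDegreesOfFreedom;
    }
}
```

For a fixed R², raising the degree lowers the adjusted value, which is how the adjustment discourages adding terms that do not improve the fit enough.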

An entirely different assessment is available through an analysis of variance. Here, the variation in the data is decomposed into a component explained by the model, and the variation in the residuals. The FStatistic property returns the F-statistic for the ratio of these two variances. The PValue property returns the corresponding p-value. A low p-value means that it is unlikely that the variation in the model is the same as the variation in the residuals. This means that the model is significant.
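In the usual formulation, sketched here as an assumption about what the property reports, the F-statistic compares the mean square explained by the model with the mean square of the residuals. The notation is introduced for illustration: $\mathrm{SST}$ is the total sum of squares about the mean, $\mathrm{SSR}$ the residual sum of squares, $n$ the number of observations, and $d$ the degree of the polynomial in a model with an intercept.

```latex
F = \frac{(\mathrm{SST} - \mathrm{SSR}) / d}{\mathrm{SSR} / (n - d - 1)}
```

Under the hypothesis that all coefficients other than the intercept are zero, this ratio follows an F distribution with $d$ and $n - d - 1$ degrees of freedom, and the p-value is the upper-tail probability at the observed $F$.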

The results of the analysis of variance are also summarized in the regression model's ANOVA table, returned by the AnovaTable property.

Copyright © 2004-20116, Extreme Optimization. All rights reserved.
