Multiple Linear Regression | Extreme Optimization Numerical Libraries for .NET Professional

Multiple linear regression is a technique for analyzing a linear relationship between one or more independent variables and a dependent variable. The values of the independent variables are considered to be exact, while the values of the dependent variable are subject to error. Multiple linear regression is implemented by the LinearRegressionModel class.

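The LinearRegressionModel class has three constructors. The first constructor takes two arguments. The first is a Vector<T> that contains the dependent variable. The second is a parameter array of vectors that contain the independent variables.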

    var dependent = Vector.Create(yData);
    var independent1 = Vector.Create(x1Data);
    var independent2 = Vector.Create(x2Data);
    var model1 = new LinearRegressionModel(dependent, independent1, independent2);

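The second constructor takes three arguments. The first argument is an IDataFrame (for example, a DataFrame) that contains the variables used in the regression. The second argument is the name of the dependent variable. The third argument is a parameter array with the names of the independent variables.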

    var dataFrame = DataFrame.FromColumns(new Dictionary<string, object>()
    {
        { "y", dependent },
        { "x1", independent1 },
        { "x2", independent2 }
    });
    var model2 = new LinearRegressionModel(dataFrame, "y", "x1", "x2");

The next overload takes two or three arguments. The first argument once again contains the data. The second is a string that contains a formula that describes the model. See the section on formulas for details. The same model as above can be defined using a formula as:
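A minimal sketch of the two-argument form, continuing from the dataFrame defined earlier. It assumes the formula uses the common R-style syntax, with the dependent variable to the left of `~` and the independent variables joined by `+`; the exact syntax the library accepts is described in the section on formulas. The name model3 is illustrative only.

```csharp
// Assumption: R-style formula syntax ("dependent ~ term + term").
// dataFrame is the data frame built in the previous example.
var model3 = new LinearRegressionModel(dataFrame, "y ~ x1 + x2");
```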

The Compute method performs the actual analysis. Most properties and methods throw an exception when they are accessed before the Compute method is called. You can verify that the model has been calculated by inspecting the Computed property.

The PredictedValues property returns a Vector<T> that contains the values of the dependent variable as predicted by the model.

The Parameters property of the LinearRegressionModel class returns a ParameterVector object that contains the parameters of the regression model. The elements of this vector are of type Parameter. Regression parameters are created by the model. You cannot create them directly.

Parameters can be accessed by numerical index or by name. The name of a parameter is usually the name of the variable associated with it.

A multiple linear regression model has as many parameters as there are independent variables, plus one for the intercept (constant term) when it is included. The intercept, if present, is the first parameter in the collection, with index 0.

The Parameter class has four useful properties. The Value property returns the numerical value of the parameter, while the StandardError property returns the standard deviation of the parameter's distribution.

The Statistic property returns the value of the t-statistic corresponding to the hypothesis that the parameter equals zero. The PValue property returns the corresponding p-value. A high p-value indicates that the variable associated with the parameter does not make a significant contribution to explaining the data. The p-value always corresponds to a two-tailed test.
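As a reference for how these quantities relate, the textbook definitions are sketched below. They assume an ordinary least squares fit with $n$ observations, $p$ independent variables, and an intercept; the symbols are introduced here for illustration and are not part of the library's API.

$$
t_j = \frac{\hat{\beta}_j}{\operatorname{SE}(\hat{\beta}_j)},
\qquad
p\text{-value}_j = 2 \, P\!\left(T_{n-p-1} > |t_j|\right)
$$

Here $\hat{\beta}_j$ corresponds to the Value property, $\operatorname{SE}(\hat{\beta}_j)$ to StandardError, and $T_{n-p-1}$ is a Student t variable with $n-p-1$ degrees of freedom. The factor 2 reflects the two-tailed test.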

The following example prints the properties of the parameter associated with the x1 variable in our earlier example:

    var x1Parameter = model1.Parameters.Get("x1");
    Console.WriteLine("Name: {0}", x1Parameter.Name);
    Console.WriteLine("Value: {0}", x1Parameter.Value);
    Console.WriteLine("St.Err.: {0}", x1Parameter.StandardError);
    Console.WriteLine("t-statistic: {0}", x1Parameter.Statistic);
    Console.WriteLine("p-value: {0}", x1Parameter.PValue);

The ResidualSumOfSquares property gives the sum of the squares of the residuals. The regression parameters are found by minimizing this value. The StandardError property gives the standard error of the regression, which estimates the standard deviation of the errors in the data.

The RSquared property returns the coefficient of determination. It is the ratio of the variation in the data that is explained by the model to the total variation in the data. For a model that includes an intercept, its value is always between 0 and 1, where 0 means the model explains nothing and 1 means the model explains the data perfectly.
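In symbols, the standard definition can be written as follows. This is a sketch under the assumption of a least squares fit with an intercept; $y_i$ are the observed values, $\hat{y}_i$ the predicted values, and $\bar{y}$ the mean of the observations.

$$
R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}},
\qquad
SS_{\mathrm{res}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,
\qquad
SS_{\mathrm{tot}} = \sum_{i=1}^{n} (y_i - \bar{y})^2
$$

$SS_{\mathrm{res}}$ is the quantity reported by ResidualSumOfSquares.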

When the model contains many independent variables, the additional variables may be modeling the errors in the data rather than the data itself. This causes the full model to be less reliable for making predictions. The AdjustedRSquared property returns an adjusted R² value that attempts to compensate for this phenomenon.
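The usual form of this adjustment is shown below, assuming $n$ observations, $p$ independent variables, and an intercept. Whether the library uses exactly this formula is an assumption.

$$
\bar{R}^2 = 1 - \left(1 - R^2\right) \frac{n - 1}{n - p - 1}
$$

Adding a variable always increases $p$, so $\bar{R}^2$ rises only when the gain in $R^2$ outweighs the lost degree of freedom.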

An entirely different assessment is available through an analysis of variance. Here, the variation in the data is decomposed into a component explained by the model, and the variation in the residuals. The FStatistic property returns the F-statistic for the ratio of these two variances. The PValue property returns the corresponding p-value. A low p-value means that it is unlikely that the variation in the model is the same as the variation in the residuals. This means that the model is significant.
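The conventional definition of this statistic is sketched below, assuming $n$ observations, $p$ independent variables, and an intercept. The symbols are illustrative and do not correspond to library members.

$$
F = \frac{SS_{\mathrm{model}} / p}{SS_{\mathrm{res}} / (n - p - 1)},
\qquad
SS_{\mathrm{model}} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2,
\qquad
SS_{\mathrm{res}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

Under the hypothesis that all coefficients other than the intercept are zero, $F$ follows an F-distribution with $p$ and $n - p - 1$ degrees of freedom, which is where the p-value comes from.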

The results of the analysis of variance are also summarized in the regression model's ANOVA table, returned by the AnovaTable property.

The LinearRegressionModel class has the ability to automatically select the 'best' set of variables through a process called stepwise regression. To run a stepwise regression, create a StepwiseOptions object and assign it to the model's StepwiseOptions property. There are five methods for stepwise regression, as enumerated by the StepwiseRegressionMethod type:

Method | Description |
---|---|
AllVariables | All variables are included in the model. |
ForwardStepwise | Stepwise regression starting from an empty model, allowing variables to be added and removed. |
ForwardSelection | Stepwise regression starting from an empty model, allowing variables to be added only. |
BackwardStepwise | Stepwise regression starting from a complete model, allowing variables to be added and removed. |
BackwardElimination | Stepwise regression starting from a complete model, allowing variables to be removed only. |

To create a stepwise regression, create a new StepwiseOptions object and assign one of the above methods to its Method property. The thresholds for allowing a variable to enter or leave the model can be specified either on the basis of the F-statistic, or on the basis of the corresponding p-value. The threshold values can be set by setting either ToEnterStatisticThreshold and ToRemoveStatisticThreshold, or ToEnterPValueThreshold and ToRemovePValueThreshold.

With the options set, the model can be computed in the same way as a standard model, by calling the Compute method. The parameters in the model's Parameters collection are listed in the order in which they were added to the model.
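The steps above might look like the following sketch. It uses only the type and property names mentioned in the text and assumes that StepwiseOptions has a parameterless constructor and that the p-value thresholds are of type double. The threshold value and the name stepwiseModel are illustrative, not recommendations.

```csharp
// Continues the earlier example: dataFrame holds the columns y, x1 and x2.
var stepwiseModel = new LinearRegressionModel(dataFrame, "y", "x1", "x2");

// Assumption: StepwiseOptions can be constructed without arguments.
var options = new StepwiseOptions();
// Start from the complete model and only allow variables to be removed.
options.Method = StepwiseRegressionMethod.BackwardElimination;
// Illustrative threshold for removing a variable, based on its p-value.
options.ToRemovePValueThreshold = 0.10;

stepwiseModel.StepwiseOptions = options;
stepwiseModel.Compute();
```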

Copyright © 2004-20116, Extreme Optimization. All rights reserved.
