Extreme Optimization™: Complexity made simple.

Math and Statistics
Libraries for .NET


Multiple Linear Regression

Extreme Optimization Numerical Libraries for .NET Professional

Multiple linear regression is a technique for analyzing a linear relationship between one or more independent variables and a dependent variable. The values of the independent variables are considered to be exact, while the values of the dependent variable are subject to error. Multiple linear regression is implemented by the LinearRegressionModel class.
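In symbols, the relationship described above is usually written as follows. The notation (n observations, p independent variables, coefficients β, error term ε) is standard textbook notation and is an assumption here; it is not taken from the library's documentation.

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i,
\qquad i = 1, \ldots, n
```

Here y_i is the i-th observation of the dependent variable, x_{ij} is the i-th observation of the j-th independent variable, β_0 is the intercept, and ε_i is the error in the i-th observation.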

Constructing Multiple Linear Regression Models

The LinearRegressionModel class has three constructors. The first constructor takes two arguments. The first is a Vector<T> that represents the dependent variable. The second is a parameter array of vectors that represent the independent variables.

C#
var dependent = Vector.Create(yData);
var independent1 = Vector.Create(x1Data);
var independent2 = Vector.Create(x2Data);
var model1 = new LinearRegressionModel(dependent, independent1, independent2);

VB
Dim dependent = Vector.Create(yData)
Dim independent1 = Vector.Create(x1Data)
Dim independent2 = Vector.Create(x2Data)
Dim model1 = New LinearRegressionModel(dependent, independent1, independent2)

F#
let dependent = Vector.Create(yData)
let independent1 = Vector.Create(x1Data)
let independent2 = Vector.Create(x2Data)
let model1 = LinearRegressionModel(dependent, independent1, independent2)

The second constructor takes three arguments. The first argument is an IDataFrame (a DataFrame<R, C> or Matrix<T>) that contains the variables to be used in the regression. The second argument is a string containing the name of the dependent variable. The third argument is an array of strings containing the names of the independent variables. All the names must exist in the column index of the data frame specified by the first argument.

C#
var dataFrame = DataFrame.FromColumns(new Dictionary<string, object>()
    { { "y", dependent }, { "x1", independent1 }, { "x2", independent2 } });
var model2 = new LinearRegressionModel(dataFrame, "y", "x1", "x2");

VB
Dim frame = DataFrame.FromColumns(New Dictionary(Of String, Object)() From
    {{"y", dependent}, {"x1", independent1}, {"x2", independent2}})
Dim model2 = New LinearRegressionModel(frame, "y", "x1", "x2")

F#
let columns = Dictionary<string,obj>()
[ "y", dependent ; "x1", independent1 ; "x2", independent2 ]  |> Seq.iter columns.Add
let dataFrame = DataFrame.FromColumns<string>(columns)
let model2 = LinearRegressionModel(dataFrame, "y", "x1", "x2")

The next overload takes two or three arguments. The first argument once again contains the data. The second is a string that contains a formula that describes the model. See the section on formulas for details. The same model as above can be defined using a formula as:

C#
var model3 = new LinearRegressionModel(dataFrame, "y ~ x1 + x2");

VB
Dim model3 = New LinearRegressionModel(frame, "y ~ x1 + x2")

F#
let model3 = LinearRegressionModel(dataFrame, "y ~ x1 + x2")

Computing the Regression

The Fit method performs the actual analysis. Most properties and methods throw an exception when they are accessed before the Fit method is called. You can verify that the model has been calculated by inspecting the Computed property.

C#
model1.Fit();

VB
model1.Fit()

F#
model1.Fit()

The Predictions property returns a Vector<T> that contains the values of the dependent variable as predicted by the model. The Residuals property returns a vector containing the difference between the actual and the predicted values of the dependent variable. Both vectors contain one element for each observation.
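As an illustration, the following sketch fits a small model and lists the actual value, prediction, and residual for each observation. The data values are made up for the example, and the using directives assume the library's Extreme.Mathematics and Extreme.Statistics namespaces; indexing the returned vectors by integer position is also an assumption.

```csharp
using System;
using Extreme.Mathematics;
using Extreme.Statistics;

// Hypothetical sample data: six observations of y and two predictors.
double[] yData  = { 6.1, 6.9, 15.2, 15.8, 24.1, 25.0 };
double[] x1Data = { 1, 2, 3, 4, 5, 6 };
double[] x2Data = { 2, 1, 4, 3, 6, 5 };

var dependent = Vector.Create(yData);
var independent1 = Vector.Create(x1Data);
var independent2 = Vector.Create(x2Data);

// Build and fit the model using the first constructor described earlier.
var model = new LinearRegressionModel(dependent, independent1, independent2);
model.Fit();

// Both vectors have one element per observation.
var predictions = model.Predictions;
var residuals = model.Residuals;
for (int i = 0; i < yData.Length; i++)
{
    Console.WriteLine("{0}: actual = {1}, predicted = {2:F3}, residual = {3:F3}",
        i, yData[i], predictions[i], residuals[i]);
}
```

Each residual should equal the actual value minus the corresponding prediction, which makes this loop a quick sanity check on a fitted model.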

Regression Parameters

The Parameters property of the LinearRegressionModel class returns a ParameterVector<T> object that contains the parameters of the regression model. The elements of this vector are of type Parameter<T>. Regression parameters are created by the model. You cannot create them directly.

Parameters can be accessed by numerical index or by name. The name of a parameter is usually the name of the variable associated with it.

A multiple linear regression model has as many parameters as there are independent variables, plus one for the intercept (constant term) when it is included. The intercept, if present, is the first parameter in the collection, with index 0.

The Parameter<T> class has four useful properties. The Value property returns the numerical value of the parameter, while the StandardError property returns the standard error of the estimate, that is, the standard deviation of the parameter's sampling distribution.

The Statistic property returns the value of the t-statistic corresponding to the hypothesis that the parameter equals zero. The PValue property returns the corresponding p-value. A high p-value indicates that the variable associated with the parameter does not make a significant contribution to explaining the data. The p-value always corresponds to a two-tailed test.
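The relationship between these four properties can be summarized with the usual textbook formulas. This is a sketch in standard notation, not a statement of the library's internal computation: the estimate of the j-th parameter is written as \hat{\beta}_j, and T_ν denotes a Student t variable with ν residual degrees of freedom (ν = n − p − 1 for n observations and p independent variables plus an intercept).

```latex
t_j = \frac{\hat{\beta}_j}{\operatorname{SE}(\hat{\beta}_j)},
\qquad
p_j = 2 \, P\left( T_{\nu} \ge |t_j| \right)
```

The factor 2 reflects the two-tailed test: large deviations from zero in either direction count as evidence against the hypothesis that the parameter equals zero.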

The following example prints the properties of the parameter associated with the x1 variable in our earlier example:

C#
var x1Parameter = model1.Parameters.Get("x1");
Console.WriteLine("Name:        {0}", x1Parameter.Name);
Console.WriteLine("Value:       {0}", x1Parameter.Value);
Console.WriteLine("St.Err.:     {0}", x1Parameter.StandardError);
Console.WriteLine("t-statistic: {0}", x1Parameter.Statistic);
Console.WriteLine("p-value:     {0}", x1Parameter.PValue);

VB
Dim x1Parameter = model1.Parameters.Get("x1")
Console.WriteLine("Name:        {0}", x1Parameter.Name)
Console.WriteLine("Value:       {0}", x1Parameter.Value)
Console.WriteLine("St.Err.:     {0}", x1Parameter.StandardError)
Console.WriteLine("t-statistic: {0}", x1Parameter.Statistic)
Console.WriteLine("p-value:     {0}", x1Parameter.PValue)

F#
let x1Parameter = model1.Parameters.Get("x1")
Console.WriteLine("Name:        {0}", x1Parameter.Name)
Console.WriteLine("Value:       {0}", x1Parameter.Value)
Console.WriteLine("St.Err.:     {0}", x1Parameter.StandardError)
Console.WriteLine("t-statistic: {0}", x1Parameter.Statistic)
Console.WriteLine("p-value:     {0}", x1Parameter.PValue)

Verifying the Quality of the Regression

The ResidualSumOfSquares property gives the sum of the squares of the residuals. The regression parameters are found by minimizing this value. The StandardError property gives the standard error of the regression, which estimates the standard deviation of the errors in the data.

The RSquared property returns the coefficient of determination. It is the ratio of the variation in the data that is explained by the model to the total variation in the data. For a model that includes an intercept, its value is always between 0 and 1, where 0 means the model explains nothing and 1 means the model explains the data perfectly.

When the model contains many independent variables, the additional variables may be modeling the errors in the data rather than the data itself. This causes the full model to be less reliable for making predictions. The AdjustedRSquared property returns an adjusted R² value that attempts to compensate for this phenomenon.
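For reference, the usual textbook definitions of the two quantities are given below. They assume a model with an intercept, n observations, and p independent variables; whether the library uses exactly these expressions is not stated in this guide. SS_res is the residual sum of squares and SS_tot is the total sum of squares about the mean.

```latex
R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}},
\qquad
\bar{R}^2 = 1 - \left(1 - R^2\right) \frac{n - 1}{n - p - 1}
```

For example, with n = 20, p = 3 and R² = 0.90, the adjusted value is 1 − 0.10 × 19/16 = 0.88125. The penalty grows as more variables are added relative to the number of observations.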

An entirely different assessment is available through an analysis of variance. Here, the variation in the data is decomposed into a component explained by the model, and the variation in the residuals. The FStatistic property returns the F-statistic for the ratio of these two variances. The PValue property returns the corresponding p-value. A low p-value means that it is unlikely that the variation in the model is the same as the variation in the residuals. This means that the model is significant.

The results of the analysis of variance are also summarized in the regression model's ANOVA table, returned by the AnovaTable property.
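The quality measures described in this section can be printed together once the model has been fitted. The sketch below uses only the property names mentioned above; the data values are made up, the namespaces are assumed to be Extreme.Mathematics and Extreme.Statistics, and the text produced for the ANOVA table depends on the library's own formatting, which is not described here.

```csharp
using System;
using Extreme.Mathematics;
using Extreme.Statistics;

// Hypothetical sample data: six observations of y and two predictors.
double[] yData  = { 6.1, 6.9, 15.2, 15.8, 24.1, 25.0 };
double[] x1Data = { 1, 2, 3, 4, 5, 6 };
double[] x2Data = { 2, 1, 4, 3, 6, 5 };

var model = new LinearRegressionModel(
    Vector.Create(yData), Vector.Create(x1Data), Vector.Create(x2Data));
model.Fit();

// Goodness-of-fit summary.
Console.WriteLine("Residual sum of squares: {0}", model.ResidualSumOfSquares);
Console.WriteLine("Standard error:          {0}", model.StandardError);
Console.WriteLine("R-squared:               {0}", model.RSquared);
Console.WriteLine("Adjusted R-squared:      {0}", model.AdjustedRSquared);

// Analysis of variance: significance of the model as a whole.
Console.WriteLine("F-statistic:             {0}", model.FStatistic);
Console.WriteLine("p-value:                 {0}", model.PValue);

// Relies on the table's ToString method for display.
Console.WriteLine(model.AnovaTable);
```

Note that the model's PValue property refers to the F-test for the whole model, while the PValue property of an individual parameter refers to the t-test for that parameter.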

Stepwise Regression

The LinearRegressionModel class has the ability to automatically select the 'best' set of variables through a process called stepwise regression. To run a stepwise regression, create a StepwiseOptions object and assign it to the model's StepwiseOptions property. There are five methods for stepwise regression, as enumerated by the StepwiseRegressionMethod type:

Method               Description
AllVariables         All variables are included in the model.
ForwardStepwise      Stepwise regression starting from an empty model, allowing variables to be added and removed.
ForwardSelection     Stepwise regression starting from an empty model, allowing variables to be added only.
BackwardStepwise     Stepwise regression starting from a complete model, allowing variables to be added and removed.
BackwardElimination  Stepwise regression starting from a complete model, allowing variables to be removed only.

To select a method, assign one of the above values to the Method property of the StepwiseOptions object. The thresholds for allowing a variable to enter or leave the model can be specified on the basis of either the F-statistic or the corresponding p-value. Set ToEnterStatisticThreshold and ToRemoveStatisticThreshold to use the F-statistic, or ToEnterPValueThreshold and ToRemovePValueThreshold to use the p-value.

With the options set, the model can be computed in the same way as a standard model, by calling the Fit method. The parameters in the model's Parameters collection are listed in the order in which they were added to the model.
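The steps above might be put together as in the following sketch. It follows the description in this section literally and rests on several assumptions that are not confirmed here: that StepwiseOptions has a public parameterless constructor, that the model's StepwiseOptions property can be assigned, that the threshold properties take double values, and that the types live in the Extreme.DataAnalysis, Extreme.Mathematics and Extreme.Statistics namespaces. The data and the threshold value are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using Extreme.DataAnalysis;
using Extreme.Mathematics;
using Extreme.Statistics;

// Hypothetical sample data: eight observations, three candidate predictors.
var y  = Vector.Create(new double[] { 5.2, 7.9, 9.1, 12.8, 14.1, 17.3, 18.8, 22.1 });
var x1 = Vector.Create(new double[] { 1, 2, 3, 4, 5, 6, 7, 8 });
var x2 = Vector.Create(new double[] { 3, 1, 4, 1, 5, 9, 2, 6 });
var x3 = Vector.Create(new double[] { 2, 7, 1, 8, 2, 8, 1, 8 });

var dataFrame = DataFrame.FromColumns(new Dictionary<string, object>()
    { { "y", y }, { "x1", x1 }, { "x2", x2 }, { "x3", x3 } });
var model = new LinearRegressionModel(dataFrame, "y ~ x1 + x2 + x3");

// Start from the complete model and only allow variables to be removed.
// Assumption: StepwiseOptions can be constructed and assigned like this.
var options = new StepwiseOptions();
options.Method = StepwiseRegressionMethod.BackwardElimination;
options.ToRemovePValueThreshold = 0.10; // illustrative threshold, not a recommendation
model.StepwiseOptions = options;

// Fitting works the same way as for a standard model.
model.Fit();

Console.WriteLine("R-squared:          {0}", model.RSquared);
Console.WriteLine("Adjusted R-squared: {0}", model.AdjustedRSquared);
```

Because a stepwise method may drop variables, looking up a parameter by name after fitting should allow for the possibility that the variable is no longer in the model.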

Copyright (c) 2004-2023 ExoAnalytics Inc.
