Multiple linear regression is a technique to analyze a linear relationship
between one or more independent variables and a dependent variable.
The values of the independent variables are considered to be exact,
while the values of the dependent variable are subject to error.
Multiple linear regression is implemented by the
LinearRegressionModel class.
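For reference, a model with k independent variables and an intercept is conventionally written as follows. The symbols are standard textbook notation, not names from the library.

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i,
\qquad i = 1, \ldots, n
```

Here y_i is the i-th observation of the dependent variable, x_{ij} is the i-th observation of the j-th independent variable, \beta_0 is the intercept, and \varepsilon_i is the error term.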
Constructing Multiple Linear Regression Models
The LinearRegressionModel
class has three constructors. The first constructor takes two arguments.
The first is a Vector<T>
that represents the dependent variable. The second is a parameter array
of vectors that represent the independent variables.
var dependent = Vector.Create(yData);
var independent1 = Vector.Create(x1Data);
var independent2 = Vector.Create(x2Data);
var model1 = new LinearRegressionModel(dependent, independent1, independent2);
Dim dependent = Vector.Create(yData)
Dim independent1 = Vector.Create(x1Data)
Dim independent2 = Vector.Create(x2Data)
Dim model1 = New LinearRegressionModel(dependent, independent1, independent2)
let dependent = Vector.Create(yData)
let independent1 = Vector.Create(x1Data)
let independent2 = Vector.Create(x2Data)
let model1 = LinearRegressionModel(dependent, independent1, independent2)
The second constructor takes three arguments. The first argument is an
IDataFrame (a
DataFrame<R, C> or
Matrix<T>) that
contains the variables to be used in the regression. The second argument
is a string containing the name of the dependent variable. The third argument
is an array of strings containing the names of the independent variables.
All the names must exist in the column index of the data frame specified
by the first argument.
var dataFrame = DataFrame.FromColumns(new Dictionary<string, object>()
{ { "y", dependent }, { "x1", independent1 }, { "x2", independent2 } });
var model2 = new LinearRegressionModel(dataFrame, "y", "x1", "x2");
Dim frame = DataFrame.FromColumns(New Dictionary(Of String, Object)() From
{{"y", dependent}, {"x1", independent1}, {"x2", independent2}})
Dim model2 = New LinearRegressionModel(frame, "y", "x1", "x2")
let columns = Dictionary<string,obj>()
[ "y", dependent ; "x1", independent1 ; "x2", independent2 ] |> Seq.iter columns.Add
let dataFrame = DataFrame.FromColumns<string>(columns)
let model2 = LinearRegressionModel(dataFrame, "y", "x1", "x2")
The next overload takes two or three arguments. The first argument once again
contains the data. The second is a string that contains a formula that
describes the model. See the section on formulas for details.
The same model as above can be defined using a formula as:
var model3 = new LinearRegressionModel(dataFrame, "y ~ x1 + x2");
Dim model3 = New LinearRegressionModel(frame, "y ~ x1 + x2")
let model3 = LinearRegressionModel(dataFrame, "y ~ x1 + x2")
The Compute
method performs the actual analysis. Most properties and methods throw an exception
when they are accessed before the
Compute
method is called. You can verify that the model has been calculated by inspecting the
Computed property.
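A minimal C# sketch of this step is shown below. The sample data is hypothetical, and the namespace names in the using directives are assumptions that may differ between library versions. Only Compute and Computed are taken from the description above.

```csharp
using System;
using Extreme.Mathematics;   // assumed namespace for Vector
using Extreme.Statistics;    // assumed namespace for LinearRegressionModel

class ComputeExample
{
    static void Main()
    {
        // Hypothetical sample data, for illustration only.
        double[] yData  = { 3.1, 4.9, 7.2, 8.8, 11.1, 12.9 };
        double[] x1Data = { 1.0, 2.0, 3.0, 4.0, 5.0, 6.0 };
        double[] x2Data = { 2.0, 1.0, 4.0, 3.0, 6.0, 5.0 };

        var model = new LinearRegressionModel(
            Vector.Create(yData), Vector.Create(x1Data), Vector.Create(x2Data));

        // Perform the analysis before reading any results.
        model.Compute();

        // Guard access to results with the Computed property.
        if (model.Computed)
            Console.WriteLine("The model has been computed.");
    }
}
```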
The Predictions
property returns a Vector<T>
that contains the values of the dependent variable as predicted by the model.
The Residuals
property returns a vector containing the difference between the actual and
the predicted values of the dependent variable. Both vectors contain one element
for each observation.
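In conventional notation (not library names), with estimated parameters \hat\beta_j and k independent variables, the two quantities for observation i are:

```latex
\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik},
\qquad
e_i = y_i - \hat{y}_i
```

Here \hat{y}_i is the predicted value and e_i is the residual, that is, the actual value minus the predicted value.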
The Parameters
property of the LinearRegressionModel
class returns a ParameterVector<T>
object that contains the parameters of the regression model. The elements of this vector
are of type Parameter<T>.
Regression parameters are created by the model.
You cannot create them directly.
Parameters can be accessed by numerical index or by name. The name of a parameter
is usually the name of the variable associated with it.
A multiple linear regression model has as many parameters
as there are independent variables, plus one for the
intercept (constant term) when it is included. The intercept, if present,
is the first parameter in the collection, with index 0.
The Parameter<T> class
has four useful properties. The Value
property returns the numerical value of the parameter, while the
StandardError
property returns the standard error of the estimate, that is, the estimated
standard deviation of the parameter's sampling distribution.
The Statistic
property returns the value of the t-statistic corresponding to the hypothesis
that the parameter equals zero. The
PValue property
returns the corresponding p-value. A high p-value indicates that the variable
associated with the parameter does not make a significant contribution
to explaining the data. The p-value always corresponds to a two-tailed test.
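In the usual least-squares setting, these quantities are related as sketched below. This is the textbook definition and is assumed, not confirmed by the text above, to match what the library computes.

```latex
t_j = \frac{\hat\beta_j}{\operatorname{SE}(\hat\beta_j)},
\qquad
p_j = 2\,P\bigl(T_{n-p} \ge |t_j|\bigr)
```

Here \hat\beta_j is the estimated value of the j-th parameter, \operatorname{SE}(\hat\beta_j) is its standard error, n is the number of observations, p is the number of estimated parameters (including the intercept, if present), and T_{n-p} follows a Student t distribution with n - p degrees of freedom.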
The following example prints the properties of the parameter associated with
the x1 variable in our earlier example:
var x1Parameter = model1.Parameters.Get("x1");
Console.WriteLine("Name: {0}", x1Parameter.Name);
Console.WriteLine("Value: {0}", x1Parameter.Value);
Console.WriteLine("St.Err.: {0}", x1Parameter.StandardError);
Console.WriteLine("t-statistic: {0}", x1Parameter.Statistic);
Console.WriteLine("p-value: {0}", x1Parameter.PValue);
Dim x1Parameter = model1.Parameters.Get("x1")
Console.WriteLine("Name: {0}", x1Parameter.Name)
Console.WriteLine("Value: {0}", x1Parameter.Value)
Console.WriteLine("St.Err.: {0}", x1Parameter.StandardError)
Console.WriteLine("t-statistic: {0}", x1Parameter.Statistic)
Console.WriteLine("p-value: {0}", x1Parameter.PValue)
let x1Parameter = model1.Parameters.Get("x1")
Console.WriteLine("Name: {0}", x1Parameter.Name)
Console.WriteLine("Value: {0}", x1Parameter.Value)
Console.WriteLine("St.Err.: {0}", x1Parameter.StandardError)
Console.WriteLine("t-statistic: {0}", x1Parameter.Statistic)
Console.WriteLine("p-value: {0}", x1Parameter.PValue)
Verifying the Quality of the Regression
The ResidualSumOfSquares
property gives the sum of the squares of the residuals. The regression parameters were found
by minimizing this value. The
StandardError
property gives the standard error of the regression, an estimate of the standard deviation of the errors.
The RSquared
property returns the coefficient of determination. It is the proportion of the total
variation in the data that is explained by the model. For a model that includes an
intercept, its value is between 0 and 1, where 0 means the model explains nothing and
1 means the model explains the data perfectly.
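Written out in conventional notation (the symbols are not library names), the coefficient of determination is:

```latex
R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}},
\qquad
SS_{\mathrm{res}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,
\qquad
SS_{\mathrm{tot}} = \sum_{i=1}^{n} (y_i - \bar{y})^2
```

Here \hat{y}_i is the predicted value for observation i and \bar{y} is the mean of the observed values. SS_{\mathrm{res}} is the quantity described above as the residual sum of squares.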
When the model contains many independent variables, the additional variables
may be modeling the errors in the data rather than the data itself.
This causes the full model to be less reliable for making predictions.
The AdjustedRSquared property returns an adjusted R² value
that attempts to compensate for this phenomenon.
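A common definition of the adjusted value, for a model with an intercept and k independent variables fitted to n observations, is sketched below. It is assumed, not confirmed by the text above, that the library uses this definition.

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
```

Because the factor (n - 1)/(n - k - 1) grows with k, adding a variable raises the adjusted value only when the gain in R^2 outweighs the loss of a degree of freedom.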
An entirely different assessment is available through an analysis of variance.
Here, the variation in the data is decomposed into a component explained by the model,
and the variation in the residuals. The
FStatistic
property returns the F-statistic for the ratio of these two variances.
The PValue
property returns the corresponding p-value. A low p-value means that it is unlikely
that the variation in the model is the same as the variation in the residuals.
This means that the model is significant.
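For a model with an intercept and k independent variables fitted to n observations, the textbook form of this statistic is as follows. The symbols are conventional notation, and the library's exact computation is assumed to match.

```latex
F = \frac{SS_{\mathrm{reg}} / k}{SS_{\mathrm{res}} / (n - k - 1)},
\qquad
SS_{\mathrm{reg}} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2,
\qquad
SS_{\mathrm{res}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
```

Under the hypothesis that all parameters other than the intercept are zero, F follows an F distribution with k and n - k - 1 degrees of freedom, which is where the p-value comes from.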
The results of the analysis of variance are also summarized in the regression
model's ANOVA table, returned by the
AnovaTable
property.
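The quality measures described in this section can be read off a computed model as in the following C# sketch. The sample data is hypothetical, the namespace names are assumptions that may differ between library versions, and the property names are the ones given in the text above.

```csharp
using System;
using Extreme.Mathematics;   // assumed namespace for Vector
using Extreme.Statistics;    // assumed namespace for LinearRegressionModel

class QualityExample
{
    static void Main()
    {
        // Hypothetical sample data, for illustration only.
        double[] yData  = { 3.1, 4.9, 7.2, 8.8, 11.1, 12.9 };
        double[] x1Data = { 1.0, 2.0, 3.0, 4.0, 5.0, 6.0 };
        double[] x2Data = { 2.0, 1.0, 4.0, 3.0, 6.0, 5.0 };

        var model = new LinearRegressionModel(
            Vector.Create(yData), Vector.Create(x1Data), Vector.Create(x2Data));
        model.Compute();

        // Goodness-of-fit measures.
        Console.WriteLine("Residual sum of squares: {0}", model.ResidualSumOfSquares);
        Console.WriteLine("Standard error: {0}", model.StandardError);
        Console.WriteLine("R-squared: {0}", model.RSquared);
        Console.WriteLine("Adjusted R-squared: {0}", model.AdjustedRSquared);

        // Overall significance of the model from the analysis of variance.
        Console.WriteLine("F-statistic: {0}", model.FStatistic);
        Console.WriteLine("p-value: {0}", model.PValue);
    }
}
```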
The LinearRegressionModel
class has the ability to automatically select the 'best' set of variables through
a process called stepwise regression.
To run a stepwise regression, create a
StepwiseOptions
object and assign it to the model's
StepwiseOptions
property. There are five methods for stepwise regression, as enumerated by the
StepwiseRegressionMethod type:
Method | Description |
---|---|
AllVariables | All variables are included in the model. |
ForwardStepwise | Stepwise regression starting from an empty model, allowing variables to be added and removed. |
ForwardSelection | Stepwise regression starting from an empty model, allowing variables to be added only. |
BackwardStepwise | Stepwise regression starting from a complete model, allowing variables to be added and removed. |
BackwardElimination | Stepwise regression starting from a complete model, allowing variables to be removed only. |
To select a method, assign one of the above values to the
Method property of the
StepwiseOptions object.
The thresholds for allowing a variable to enter or leave the model can be specified either
on the basis of the F-statistic, or on the basis of the corresponding p-value.
Set the thresholds through either
ToEnterStatisticThreshold
and ToRemoveStatisticThreshold,
or ToEnterPValueThreshold
and ToRemovePValueThreshold.
With the options set, the model can be computed in the same way as a standard model,
by calling the Compute method.
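The steps above can be put together as in the following C# sketch. It rests on several assumptions: the sample data and threshold values are hypothetical, the namespace names may differ between library versions, and StepwiseOptions is assumed to have a parameterless constructor with settable properties of type double for the thresholds.

```csharp
using System;
using Extreme.Mathematics;   // assumed namespace for Vector
using Extreme.Statistics;    // assumed namespace for the regression types

class StepwiseExample
{
    static void Main()
    {
        // Hypothetical sample data, for illustration only.
        double[] yData  = { 3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.2, 16.8 };
        double[] x1Data = { 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0 };
        double[] x2Data = { 2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0 };
        double[] x3Data = { 5.0, 3.0, 8.0, 1.0, 9.0, 2.0, 7.0, 4.0 };

        var model = new LinearRegressionModel(
            Vector.Create(yData),
            Vector.Create(x1Data), Vector.Create(x2Data), Vector.Create(x3Data));

        // Start from the complete model and only allow variables to be removed.
        // The threshold values are illustrative, not recommendations.
        var options = new StepwiseOptions();
        options.Method = StepwiseRegressionMethod.BackwardElimination;
        options.ToEnterPValueThreshold = 0.05;
        options.ToRemovePValueThreshold = 0.10;
        model.StepwiseOptions = options;

        // A stepwise model is computed the same way as a standard model.
        model.Compute();
        Console.WriteLine("Adjusted R-squared: {0}", model.AdjustedRSquared);
    }
}
```

The p-value thresholds are used here instead of the F-statistic thresholds only because they are easier to interpret. The text above presents the two pairs as alternatives.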
The parameters in the model's Parameters
collection are listed in the order in which they were added to the model.