When fitting models that include many variables, collinearity is
often a problem. One symptom is that the regression coefficients
and their associated standard errors are both very large.
This means that the coefficients are poorly determined.
Various ways have been devised to address this problem.
One of the most successful is regularization.
Regularization limits the problems associated with collinearity by
minimizing the sum of the squares of the residuals combined with
a penalty term that measures the magnitude of the coefficients.
Large coefficients are penalized, which biases the estimates toward zero
but can make the overall results more reliable.
Different forms of the penalty term lead to different methods.
If the penalty term is quadratic, then the method is called
ridge regression. If the penalty term is a sum
of the absolute values of the coefficients, it is called
LASSO (Least Absolute Shrinkage and Selection Operator).
If it is a combination of the two, it is called elastic net.
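Each method has its advantages and disadvantages. Ridge regression shrinks
all coefficients but leaves them nonzero. LASSO can set some coefficients
exactly to zero and so also performs variable selection.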
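The three penalty forms can be written out explicitly. The following is a sketch only: the symbols (lambda for the coefficient of the penalty term, alpha for the fraction of the penalty that is linear) are introduced here for illustration, and the library may scale the terms differently, for example by the number of observations or by a factor of one half.

```latex
\begin{aligned}
\text{ridge:} &\quad \min_{\beta}\; \lVert y - X\beta \rVert_2^2 + \lambda \sum_j \beta_j^2 \\
\text{LASSO:} &\quad \min_{\beta}\; \lVert y - X\beta \rVert_2^2 + \lambda \sum_j \lvert \beta_j \rvert \\
\text{elastic net:} &\quad \min_{\beta}\; \lVert y - X\beta \rVert_2^2
  + \lambda \Bigl( \alpha \sum_j \lvert \beta_j \rvert + (1 - \alpha) \sum_j \beta_j^2 \Bigr),
  \qquad 0 \le \alpha \le 1
\end{aligned}
```

With alpha equal to 1 the elastic net objective reduces to LASSO, and with alpha equal to 0 it reduces to ridge regression.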
In general, standard errors and confidence intervals for the regression coefficients
are not available for regularized regression. The reason is that the regularization
introduces a bias towards smaller values, which makes the true variance of the coefficient
hard to determine.
Computing regularized regression
Regularization is implemented in two ways. Ridge regression is implemented as an option of the
linear regression model class. LASSO and elastic net are implemented using a separate class.
Ridge regression can be computed like ordinary linear regression by setting the
ridge parameter property to a strictly positive value.
The value of the parameter is used as the coefficient
of the quadratic term that is added to the sum of the squared residuals.
By default, the predictors are standardized to have zero mean and unit
standard deviation. The ridge parameter's size should therefore be compared
to unity, not to the scale of the predictors.
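The effect of the ridge parameter on standardized, nearly collinear predictors can be illustrated outside the library. The following is a minimal Python sketch, not the library's implementation: the function names and the data are made up, the sample standard deviation is assumed for the standardization, and the penalty is assumed to be added unscaled to the sum of squared residuals.

```python
import math


def standardize(values):
    """Center to zero mean and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def ridge_two_predictors(z1, z2, y, ridge):
    """Solve (Z'Z + ridge * I) b = Z'y for two standardized predictors."""
    a11 = sum(u * u for u in z1) + ridge
    a22 = sum(v * v for v in z2) + ridge
    a12 = sum(u * v for u, v in zip(z1, z2))
    c1 = sum(u * t for u, t in zip(z1, y))
    c2 = sum(v * t for v, t in zip(z2, y))
    det = a11 * a22 - a12 * a12
    # Cramer's rule for the 2x2 system.
    return ((a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det)


# Two nearly collinear predictors and a centered response (made-up data).
z1 = standardize([-2.0, -1.0, 0.0, 1.0, 2.0])
z2 = standardize([-2.01, -0.98, 0.0, 1.01, 1.98])
y = [-4.1, -1.9, 0.1, 2.0, 3.9]

ols = ridge_two_predictors(z1, z2, y, 0.0)     # ridge = 0 is ordinary least squares
shrunk = ridge_two_predictors(z1, z2, y, 1.0)  # ridge parameter comparable to unity
print("OLS:  ", ols)
print("ridge:", shrunk)
```

In this example the unpenalized coefficients are large and of opposite sign, while a ridge parameter of 1 gives two small, nearly equal coefficients.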
LASSO and elastic net are implemented using the same class.
The regularization parameter should be set to the coefficient of the penalty term.
A second parameter, the regularization ratio,
determines the relative importance of the linear and quadratic penalty terms.
For LASSO, it should be left at its default value of one.
The value of the regularization parameter is used as the coefficient
of the sum of absolute values of the coefficients that is added to
the sum of the squared residuals.
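One common way to minimize this objective is cyclic coordinate descent with soft thresholding. The sketch below is illustrative only and is not claimed to be the algorithm the library uses. The names soft_threshold and lasso_cd and the data are hypothetical, the predictors and response are assumed to be centered, and the penalty is assumed to be added unscaled to the sum of squared residuals.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(columns, y, lam, sweeps=200):
    """Minimize sum((y - Xb)^2) + lam * sum(|b|) by cyclic coordinate descent.

    `columns` is a list of predictor columns, assumed centered (as is y).
    """
    beta = [0.0] * len(columns)
    resid = list(y)  # residuals for the current beta (beta starts at zero)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            norm2 = sum(v * v for v in col)
            # Inner product of column j with the residual that excludes its own term.
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, resid))
            new = soft_threshold(rho, lam / 2.0) / norm2
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - v * delta for v, r in zip(col, resid)]
                beta[j] = new
    return beta


# Made-up data: y = 2 * x1 + 0.1 * x2, with orthogonal, centered predictors.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [1.0, -2.0, 0.0, 2.0, -1.0]
y = [-3.9, -2.2, 0.0, 2.2, 3.9]

beta = lasso_cd([x1, x2], y, lam=4.0)
print(beta)  # first coefficient shrunk to about 1.8, second exactly 0.0
```

The weak predictor is removed from the model entirely, which is the selection behavior that distinguishes LASSO from ridge regression.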
Elastic net is a generalization of both ridge regression and the LASSO,
which includes both a linear and a quadratic term in the penalty.
Once again, the regularization parameter
should be set to a strictly positive value.
The regularization ratio should be set to a value between 0 and 1. It specifies the fraction of the penalty term
that is linear. A value of 0.4 means that the linear term (sum of the absolute
values of the coefficients) has a coefficient of 0.4 times the regularization parameter,
while the quadratic term has a coefficient of 0.6 times the regularization parameter.
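The split between the two terms can be checked with a few lines of arithmetic. This is a minimal sketch: the function name and argument names are hypothetical, and the quadratic term is assumed to be the plain sum of squared coefficients with no extra factor.

```python
def elastic_net_penalty(coefs, reg_param, ratio):
    """Penalty with a fraction `ratio` on the linear term and the rest on the quadratic term."""
    linear = sum(abs(b) for b in coefs)   # sum of absolute values
    quadratic = sum(b * b for b in coefs)  # sum of squares
    return ratio * reg_param * linear + (1.0 - ratio) * reg_param * quadratic


coefs = [1.0, -2.0]  # linear term = 3, quadratic term = 5
mixed = elastic_net_penalty(coefs, reg_param=0.5, ratio=0.4)
lasso_only = elastic_net_penalty(coefs, reg_param=0.5, ratio=1.0)
ridge_only = elastic_net_penalty(coefs, reg_param=0.5, ratio=0.0)
print(mixed, lasso_only, ridge_only)  # about 2.1, then 1.5 and 2.5
```

With a ratio of 0.4 and a regularization parameter of 0.5, the linear term is weighted by 0.2 and the quadratic term by 0.3.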
For LASSO and elastic net, it is possible to obtain the regularization path.
The regularization path shows how the values of the regression coefficients change
as the regularization parameter changes. Above a certain value, all regression coefficients
will be zero. The regularization path is only interesting up to this value.
The regularization path can be obtained in two steps. First, once the regularization ratio is set
to its desired value, the model is asked for
a vector of suitable regularization parameters. The method that returns this vector takes two arguments:
the number of points and the ratio between the smallest and the largest value.
The largest value is chosen automatically to approximate the smallest value that produces
all zero coefficients.
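The construction of such a vector can be sketched as follows. This is an illustration under stated assumptions, not the library's method: the function name is hypothetical, the values are assumed to be spaced geometrically, the predictors and response are assumed to be centered, and the largest value is derived for an unscaled penalty added to the sum of squared residuals. At least two points and a ratio greater than zero are required.

```python
def path_parameters(columns, y, ratio, count, min_over_max):
    """Return `count` regularization parameters, largest first, geometrically spaced."""
    # For the objective sum((y - Xb)^2) + lam * (ratio * L1 + (1 - ratio) * L2^2),
    # all coefficients are zero once lam * ratio / 2 >= max_j |x_j . y|.
    largest = 2.0 * max(
        abs(sum(v * t for v, t in zip(col, y))) for col in columns
    ) / ratio
    step = min_over_max ** (1.0 / (count - 1))
    return [largest * step ** k for k in range(count)]


# Made-up, centered data.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [1.0, -2.0, 0.0, 2.0, -1.0]
y = [-3.9, -2.2, 0.0, 2.2, 3.9]

params = path_parameters([x1, x2], y, ratio=1.0, count=5, min_over_max=0.01)
print(params)  # runs from about 40 down to about 0.4
```

Geometric spacing places more points at small parameter values, where the coefficients typically change most.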
A second method call
then computes the regularization path. It takes the vector of regularization parameters
from the first step as its first argument and returns a matrix whose rows
contain the regression coefficients computed for the corresponding value of the