Ridge regression, LASSO and elastic net

Extreme Optimization Numerical Libraries for .NET Professional

When fitting models that include many variables, collinearity is often a problem. One symptom is that the regression coefficients may be very large, with very large standard errors to match. This means that the coefficients are poorly determined.

Various ways have been devised to address this problem. One of the most successful is regularization. Regularization limits the problems associated with collinearity by minimizing the sum of the squared residuals plus a penalty term that measures the magnitude of the coefficients. Large coefficients are penalized, and although the estimates are biased as a result, the overall results can be more reliable.

Different forms of the penalty term lead to different methods. If the penalty term is quadratic, then the method is called ridge regression. If the penalty term is a sum of the absolute values of the coefficients, it is called LASSO (Least Absolute Shrinkage and Selection Operator). If it is a combination of the two, it is called elastic net. Each method has its advantages and disadvantages.
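
In symbols, with λ the regularization parameter and α the fraction of the penalty that is linear, the three methods solve the following problems (a sketch of the standard formulation; this notation is not taken from the library itself):

    \min_\beta \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2                                        (ridge)
    \min_\beta \|y - X\beta\|_2^2 + \lambda \|\beta\|_1                                          (LASSO)
    \min_\beta \|y - X\beta\|_2^2 + \lambda \left( \alpha \|\beta\|_1 + (1-\alpha) \|\beta\|_2^2 \right)    (elastic net)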

In general, standard errors and confidence intervals for the regression coefficients are not available for regularized regression. The reason is that the regularization introduces a bias towards smaller values, which makes the true variance of the coefficients hard to determine.

Computing regularized regression

Regularization is implemented in two ways. Ridge regression is implemented as an option on the LinearRegressionModel class. LASSO and elastic net are implemented by a separate class, RegularizedRegressionModel.

Ridge regression

Ridge regression is computed in the same way as ordinary linear regression, except that the RidgeParameter property is set to a strictly positive value. The value of this property is used as the coefficient of the quadratic term that is added to the sum of the squared residuals. By default, the predictors are standardized to have zero mean and unit standard deviation. The size of the ridge parameter should therefore be compared to unity, not to the scale of the predictors.
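
As an illustration, the sketch below fits a ridge model to two nearly collinear predictors. Only the LinearRegressionModel class and the RidgeParameter property are named on this page; the namespaces, the constructor overload taking vectors, the Fit() call, and the Parameters property are assumptions based on typical usage of the library and should be checked against the reference documentation.

    using System;
    using Extreme.Mathematics;
    using Extreme.Statistics;

    // Hypothetical data: x2 is nearly identical to x1, so the
    // ordinary least squares coefficients are poorly determined.
    var x1 = Vector.Create(new double[] { 1.0, 2.0, 3.0, 4.0, 5.0 });
    var x2 = Vector.Create(new double[] { 1.1, 2.0, 2.9, 4.2, 5.1 });
    var y = Vector.Create(new double[] { 2.0, 4.1, 5.9, 8.3, 9.8 });

    // Set up the model exactly as for ordinary linear regression...
    var model = new LinearRegressionModel(y, x1, x2);
    // ...then make it a ridge regression by setting RidgeParameter
    // to a strictly positive value. The predictors are standardized
    // by default, so this value should be compared to 1.
    model.RidgeParameter = 0.1;
    model.Fit();

    Console.WriteLine(model.Parameters);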

LASSO

LASSO and elastic net are implemented by the RegularizedRegressionModel class. The RegularizationParameter property should be set to the coefficient of the penalty term. A second property, RegularizationRatio, determines the relative importance of the linear and quadratic penalty terms. For LASSO, it should be left at its default value of one. The value of the regularization parameter is then used as the coefficient of the sum of the absolute values of the coefficients that is added to the sum of the squared residuals.
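
A minimal sketch, reusing the data from the ridge example above; as before, everything other than the class and property names stated on this page is an assumption:

    // LASSO: RegularizationRatio defaults to 1, so setting it is
    // optional; it is shown here for clarity.
    var lasso = new RegularizedRegressionModel(y, x1, x2);
    lasso.RegularizationRatio = 1.0;
    // Coefficient of the sum of absolute values of the coefficients:
    lasso.RegularizationParameter = 0.1;
    lasso.Fit();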

Elastic net

Elastic net is a generalization of both ridge regression and the LASSO that includes both a linear and a quadratic term in the penalty. Once again, the RegularizationParameter property should be set to a strictly positive value. The RegularizationRatio should be set to a value between 0 and 1; it specifies the fraction of the penalty term that is linear. A value of 0.4 means that the linear term (the sum of the absolute values of the coefficients) has a coefficient of 0.4 times RegularizationParameter, while the quadratic term has a coefficient of 0.6 times RegularizationParameter.
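
Continuing the same hedged sketch, the 0.4 example from the text looks like this:

    // Elastic net: 40% of the penalty is linear, 60% quadratic.
    var net = new RegularizedRegressionModel(y, x1, x2);
    net.RegularizationParameter = 0.1;
    net.RegularizationRatio = 0.4;
    // Effective penalty coefficients:
    //   linear term:    0.4 * 0.1 = 0.04
    //   quadratic term: 0.6 * 0.1 = 0.06
    net.Fit();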

Regularization paths

For LASSO and elastic net, it is possible to obtain the regularization path. The regularization path shows how the values of the regression coefficients change as the regularization parameter changes. Above a certain value of the parameter, all regression coefficients are zero, so the path is only interesting up to that value.

The regularization path is obtained in two steps. First, with the regularization ratio set to its desired value, a call to GetRegularizationPathParameters returns a vector of suitable regularization parameters. This method takes two arguments: the number of points, and the ratio between the smallest and the largest value. The largest value is chosen automatically to approximate the smallest value that produces all-zero coefficients.

A call to ComputeRegularizationPaths then computes the regularization path itself. It takes the vector returned by GetRegularizationPathParameters as its first argument and returns a matrix whose rows contain the regression coefficients computed for the corresponding value of the regularization parameter.
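
Putting the two steps together (a sketch; the original text of this paragraph repeated the first method's name for both steps, so the name ComputeRegularizationPaths and its signature are reconstructed here and should be verified against the reference documentation):

    // Step 1: with the ratio fixed (1.0 = LASSO), ask for 100 suitable
    // parameter values whose smallest value is 0.001 times the largest.
    // The largest value approximates the smallest parameter that
    // produces all-zero coefficients.
    var lasso = new RegularizedRegressionModel(y, x1, x2);
    lasso.RegularizationRatio = 1.0;
    var parameters = lasso.GetRegularizationPathParameters(100, 0.001);

    // Step 2: compute the path. Row i of the resulting matrix contains
    // the coefficients computed for parameters[i].
    var path = lasso.ComputeRegularizationPaths(parameters);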
