Simple Linear Regression

Simple linear regression is a technique for analyzing a linear relationship between two variables. The Extreme Optimization Numerical Libraries for .NET support simple linear regression through the SimpleRegressionModel class.

Constructing Simple Regression Models

The SimpleRegressionModel class has five constructors.

The first constructor takes two arrays of double values. The first array contains the data for the dependent variable. The second array contains the data for the independent variable. The dependent variable is named 'Y,' while the independent variable is named 'X.'

C#
double[] xData = {0.2, 337.4, 118.2, 884.6, 10.1, 
    226.5, 666.3, 996.3, 448.6, 777.0, 558.2, 0.4, 0.6, 
    775.5, 666.9, 338.0, 447.5, 11.6, 556.0, 228.1, 995.8, 
    887.6, 120.2, 0.3, 0.3, 556.8, 339.1, 887.2, 999.0, 
    779.0, 11.1, 118.3, 229.2, 669.1, 448.9, 0.5};
double[] yData = {0.1, 338.8, 118.1, 888.0, 9.2,
    228.1, 668.5, 998.5, 449.1, 778.9, 559.2, 0.3, 0.1, 
    778.1, 668.8, 339.3, 448.9, 10.8, 557.7, 228.3, 998.0, 
    888.8, 119.6, 0.3, 0.6, 557.6, 339.3, 888.0, 998.5, 
    778.9, 10.2, 117.6, 228.9, 668.4, 449.2, 0.2};
SimpleRegressionModel model1 = new SimpleRegressionModel(yData, xData);
Visual Basic
Dim xData As Double() = {0.2, 337.4, 118.2, 884.6, 10.1, _
    226.5, 666.3, 996.3, 448.6, 777.0, 558.2, 0.4, 0.6, _
    775.5, 666.9, 338.0, 447.5, 11.6, 556.0, 228.1, 995.8, _
    887.6, 120.2, 0.3, 0.3, 556.8, 339.1, 887.2, 999.0, _
    779.0, 11.1, 118.3, 229.2, 669.1, 448.9, 0.5}
Dim yData As Double() = {0.1, 338.8, 118.1, 888.0, 9.2, _
    228.1, 668.5, 998.5, 449.1, 778.9, 559.2, 0.3, 0.1, _
    778.1, 668.8, 339.3, 448.9, 10.8, 557.7, 228.3, 998.0, _
    888.8, 119.6, 0.3, 0.6, 557.6, 339.3, 888.0, 998.5, _
    778.9, 10.2, 117.6, 228.9, 668.4, 449.2, 0.2}
Dim model1 As SimpleRegressionModel = _
    New SimpleRegressionModel(yData, xData)

The second constructor takes two Vector objects. Once again, the first vector contains the data for the dependent variable. The second vector contains the data for the independent variable. The dependent variable is named 'Y,' while the independent variable is named 'X.'

C#
Vector dependentVector = Vector.Create(yData);
Vector independentVector = Vector.Create(xData);
SimpleRegressionModel model2 = new SimpleRegressionModel(dependentVector, independentVector);
Visual Basic
Dim dependentVector As Vector = Vector.Create(yData)
Dim independentVector As Vector = Vector.Create(xData)
Dim model2 As SimpleRegressionModel = _
    New SimpleRegressionModel(dependentVector, independentVector)

The third constructor takes two NumericalVariable objects. The first variable represents the dependent variable. The second variable represents the independent variable.

C#
NumericalVariable dependent = new NumericalVariable("y", yData);
NumericalVariable independent = new NumericalVariable("x", xData);
SimpleRegressionModel model3 = new SimpleRegressionModel(dependent, independent);
Visual Basic
Dim dependent As NumericalVariable = New NumericalVariable("y", yData)
Dim independent As NumericalVariable = New NumericalVariable("x", xData)
Dim model3 As SimpleRegressionModel = _
    New SimpleRegressionModel(dependent, independent)

The fourth constructor takes three parameters. The first parameter is a VariableCollection object that contains the variables to be used in the regression. The second parameter is a string containing the name of the dependent variable. The third parameter is a string containing the name of the independent variable. Both names must exist in the collection specified by the first parameter, and the variables must be of type NumericalVariable.

C#
VariableCollection variables = new VariableCollection();
variables.Add(dependent);
variables.Add(independent);
SimpleRegressionModel model4 = new SimpleRegressionModel(variables, "y", "x");
Visual Basic
Dim variables As VariableCollection = New VariableCollection()
variables.Add(dependent)
variables.Add(independent)
Dim model4 As SimpleRegressionModel = New SimpleRegressionModel(variables, "y", "x")

The fifth constructor also takes three parameters. The first parameter is a DataTable object that contains the data for the regression analysis. The second parameter is a string containing the name of the column that contains the data for the dependent variable. The third parameter is a string containing the name of the column that contains the data for the independent variable. Both columns must be numerical or convertible to numerical values.

C#
DataTable table = new DataTable();
// Fill data table with data from some datasource.
SimpleRegressionModel model5 = new SimpleRegressionModel(table, "y", "x");
Visual Basic
Dim table As DataTable = New DataTable()
' Fill data table with data from some datasource.
Dim model5 As SimpleRegressionModel = New SimpleRegressionModel(table, "y", "x")

Computing the Regression Line

The Compute() method performs the actual analysis. Most properties and methods throw an exception when they are accessed before the Compute method is called. You can verify that the model has been calculated by inspecting the Computed property.

The GetRegressionLine() method returns a Line object that represents the regression line.

C#
model1.Compute();
Extreme.Mathematics.Curves.Line regressionLine = model1.GetRegressionLine();
Visual Basic
model1.Compute()
Dim regressionLine As Extreme.Mathematics.Curves.Line = _
    model1.GetRegressionLine()
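
For readers who want to see what the fit amounts to, the following is a minimal sketch of an ordinary least-squares line fit in plain C#. It is not the library's implementation: LeastSquaresSketch and Fit are hypothetical names introduced only for this illustration, and the sketch assumes the standard closed-form formulas (slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x)).

```csharp
using System;

// Hypothetical helper for illustration only; not part of the
// Extreme Optimization API.
public static class LeastSquaresSketch
{
    // Returns the intercept and slope of the line that minimizes the sum
    // of squared vertical distances to the points (x[i], y[i]).
    public static (double Intercept, double Slope) Fit(double[] x, double[] y)
    {
        if (x.Length != y.Length || x.Length < 2)
            throw new ArgumentException("Need two arrays of equal length with at least two points.");

        int n = x.Length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++)
        {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Sxy: sum of cross-deviations; Sxx: sum of squared x-deviations.
        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < n; i++)
        {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        if (sxx == 0.0)
            throw new ArgumentException("All x values are identical; the slope is undefined.");

        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return (intercept, slope);
    }
}
```

Note that this sketch takes the independent data first, whereas the SimpleRegressionModel constructors take the dependent variable first. A call such as LeastSquaresSketch.Fit(xData, yData) should give values close to the intercept and slope of the regression line, up to rounding.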

The PredictedValues property returns a vector that contains the values of the dependent variable as predicted by the model. The Residuals property returns a vector containing the difference between the actual and the predicted values of the dependent variable. Both vectors contain one element for each observation.
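
The relationship between predicted values and residuals can be sketched in plain C#. ResidualSketch, Predict and Residuals are hypothetical names, not library members, and the sign convention (actual minus predicted) is the usual one and an assumption here, since the text above does not state the order of the difference.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the
// Extreme Optimization API.
public static class ResidualSketch
{
    // Predicted value for each observation: intercept + slope * x[i].
    public static double[] Predict(double[] x, double intercept, double slope)
    {
        double[] predicted = new double[x.Length];
        for (int i = 0; i < x.Length; i++)
            predicted[i] = intercept + slope * x[i];
        return predicted;
    }

    // Residual for each observation: actual minus predicted (assumed convention).
    public static double[] Residuals(double[] y, double[] predicted)
    {
        if (y.Length != predicted.Length)
            throw new ArgumentException("Arrays must have the same length.");

        double[] residuals = new double[y.Length];
        for (int i = 0; i < y.Length; i++)
            residuals[i] = y[i] - predicted[i];
        return residuals;
    }
}
```

Both arrays have one element per observation, mirroring the shape of the two vectors described above.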

Regression Parameters

The SimpleRegressionModel class' Parameters property returns a ParameterCollection object that contains the parameters of the regression model. The members of this collection are of type Parameter. Regression parameters are created by the model. You cannot create them directly.

A simple linear regression model has two parameters. The first, with index 0, is the intercept: the Y-value where the regression line crosses the Y-axis. The second parameter, with index 1, is the slope of the regression line.

The Parameter class has four useful properties. The Value property returns the numerical value of the parameter, while the StandardError property returns the standard deviation of the parameter's distribution.

The Statistic property returns the value of the t-statistic corresponding to the hypothesis that the parameter equals zero. The PValue property returns the corresponding p-value. A high p-value indicates that the variable associated with the parameter does not make a significant contribution to explaining the data. The p-value always corresponds to a two-tailed test. The following example prints the properties of the slope parameter of our earlier example:

C#
Parameter slope = model1.Parameters[1];
Console.WriteLine("Name:        {0}", slope.Name);
Console.WriteLine("Value:       {0}", slope.Value);
Console.WriteLine("St.Err.:     {0}", slope.StandardError);
Console.WriteLine("t-statistic: {0}", slope.Statistic);
Console.WriteLine("p-value:     {0}", slope.PValue);
Visual Basic
Dim slope As Parameter = model1.Parameters(1)
Console.WriteLine("Name:        {0}", slope.Name)
Console.WriteLine("Value:       {0}", slope.Value)
Console.WriteLine("St.Err.:     {0}", slope.StandardError)
Console.WriteLine("t-statistic: {0}", slope.Statistic)
Console.WriteLine("p-value:     {0}", slope.PValue)
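
For reference, the textbook quantities behind these properties in a simple linear regression with n observations are given below. This is the standard formulation and is assumed, not confirmed, to match what the library computes. Here RSS is the residual sum of squares, and T with n - 2 degrees of freedom denotes a Student t variable.

```latex
t = \frac{\hat{\beta}_1}{\operatorname{SE}(\hat{\beta}_1)}, \qquad
\operatorname{SE}(\hat{\beta}_1) = \sqrt{\frac{\mathrm{RSS}/(n-2)}{\sum_{i=1}^{n}(x_i-\bar{x})^2}}, \qquad
p = 2\,P\bigl(T_{n-2} > |t|\bigr)
```

The factor 2 in the p-value reflects the two-tailed test mentioned above.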

The Parameter class has one method: GetConfidenceInterval(Double). This method takes one parameter: a confidence level between 0 and 1. A value of 0.95 corresponds to a confidence level of 95%. The method returns the confidence interval for the parameter at the specified confidence level as an Interval structure.
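
A confidence interval for a regression parameter is conventionally built from the parameter's value, its standard error, and a quantile of the t distribution. The formula below is that textbook construction for simple linear regression with n observations; it is an assumption that GetConfidenceInterval follows it exactly.

```latex
\hat{\beta}_j \pm t_{1-\alpha/2,\;n-2}\,\operatorname{SE}(\hat{\beta}_j),
\qquad \alpha = 1 - \text{confidence level}
```

For a confidence level of 0.95, \alpha is 0.05 and the quantile used is the 97.5th percentile of the t distribution with n - 2 degrees of freedom.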

Verifying the Quality of the Regression Line

The ResidualSumOfSquares property gives the sum of the squares of the residuals. The regression line was found by minimizing this value. The StandardError property gives the standard error of the regression, which is an estimate of the standard deviation of the residuals.

The RSquared property returns the coefficient of determination. It is the ratio of the variation in the data that is explained by the model compared to the total variation in the data. Its value is always between 0 and 1, where 0 means the model explains nothing and 1 means the model explains the data perfectly.
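
As a rough illustration of this ratio, the sketch below computes the coefficient of determination from actual and predicted values in plain C#. RSquaredSketch and Compute are hypothetical names, not library members. The sketch uses the form 1 - RSS / TSS, which equals the explained-to-total ratio for a least-squares fit that includes an intercept.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the
// Extreme Optimization API.
public static class RSquaredSketch
{
    // Returns 1 - RSS / TSS, where RSS is the residual sum of squares and
    // TSS is the total sum of squares of y around its mean.
    public static double Compute(double[] y, double[] predicted)
    {
        if (y.Length != predicted.Length || y.Length == 0)
            throw new ArgumentException("Arrays must be non-empty and of equal length.");

        double meanY = 0.0;
        for (int i = 0; i < y.Length; i++)
            meanY += y[i];
        meanY /= y.Length;

        double rss = 0.0, tss = 0.0;
        for (int i = 0; i < y.Length; i++)
        {
            rss += (y[i] - predicted[i]) * (y[i] - predicted[i]);
            tss += (y[i] - meanY) * (y[i] - meanY);
        }
        if (tss == 0.0)
            throw new ArgumentException("y is constant; R-squared is undefined.");

        return 1.0 - rss / tss;
    }
}
```

A perfect fit gives 1, and a model that always predicts the mean of y gives 0.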

When the model contains many independent variables, the additional variables may be modeling the errors in the data rather than the data itself. This causes the full model to be less reliable for making predictions. The AdjustedRSquared property returns an adjusted R² value that attempts to compensate for this phenomenon.
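
The usual definition of the adjusted coefficient of determination, for n observations and p independent variables, is shown below. It is given as the common textbook form; whether the library uses exactly this expression is an assumption.

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}
```

In simple linear regression p = 1, so the correction factor reduces to (n - 1)/(n - 2) and the adjustment is small unless there are very few observations.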

An entirely different assessment is available through an analysis of variance. Here, the variation in the data is decomposed into a component explained by the model, and the variation in the residuals. The FStatistic property returns the F-statistic for the ratio of these two variances. The PValue property returns the corresponding p-value. A low p-value means that it is unlikely that the variation in the model is the same as the variation in the residuals. This means that the model is significant.
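
In the standard analysis of variance for regression, the decomposition and the resulting F-statistic take the following form, with n observations and p independent variables. This is the textbook formulation and is assumed, not confirmed, to be what the library reports.

```latex
\sum_{i=1}^{n}(y_i-\bar{y})^2
  = \underbrace{\sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2}_{\mathrm{SSR}}
  + \underbrace{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}_{\mathrm{RSS}},
\qquad
F = \frac{\mathrm{SSR}/p}{\mathrm{RSS}/(n-p-1)}
```

For simple linear regression p = 1, and F equals the square of the slope's t-statistic, so the two tests give the same p-value.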

The results of the analysis of variance are also summarized in the regression model's ANOVA table, returned by the AnovaTable property.

Copyright © 2003-2010, Extreme Optimization. All rights reserved.
Extreme Optimization, Complexity made simple, M#, and M Sharp are trademarks of ExoAnalytics Inc.
Microsoft, Visual C#, Visual Basic, Visual Studio, Visual Studio.NET, and the Optimized for Visual Studio logo
are registered trademarks of Microsoft Corporation.