Extreme Optimization™: Complexity made simple.

Numerical Components
for .NET

Generalized Linear Models

Generalized linear models are an extension of linear regression models to situations where the distribution of the dependent variable is not normal. The types of models that can be represented as generalized linear models include: classic linear regression, logistic regression, probit regression and Poisson regression.

Two properties define the nature of a specific generalized linear model. The ModelFamily specifies the distribution of the errors. The LinkFunction defines the relationship between the dependent variable and the linear combination of predictor variables.

Constructing Generalized Linear Models

The GeneralizedLinearModel class has four constructors.

The first constructor takes two parameters. The first is a NumericalVariable that represents the dependent variable. The second is an array of NumericalVariable objects that represent the independent variables.

C#
// yData, x1Data and x2Data hold the observations and are defined elsewhere.
NumericalVariable dependent = new NumericalVariable("y", yData);
NumericalVariable independent1 = new NumericalVariable("x1", x1Data);
NumericalVariable independent2 = new NumericalVariable("x2", x2Data);
// The independent variables are passed as an array.
GeneralizedLinearModel model1 =
    new GeneralizedLinearModel(dependent,
        new NumericalVariable[] { independent1, independent2 });
Visual Basic
' yData, x1Data and x2Data hold the observations and are defined elsewhere.
Dim dependent As NumericalVariable = New NumericalVariable("y", yData)
Dim independent1 As NumericalVariable = New NumericalVariable("x1", x1Data)
Dim independent2 As NumericalVariable = New NumericalVariable("x2", x2Data)
' The independent variables are passed as an array.
Dim model1 As GeneralizedLinearModel = _
    New GeneralizedLinearModel(dependent, _
        New NumericalVariable() {independent1, independent2})

The second constructor takes three parameters. The first parameter is a VariableCollection object that contains the variables to be used in the regression. The second parameter is a string containing the name of the dependent variable. The third parameter is an array of strings containing the names of the independent variables. All the names must exist in the collection specified by the first parameter. All variables must be of type NumericalVariable.

C#
VariableCollection variables = new VariableCollection();
variables.Add(dependent);
variables.Add(independent1);
variables.Add(independent2);
// Variables are selected from the collection by name.
GeneralizedLinearModel model2 =
    new GeneralizedLinearModel(variables, "y", new string[] {"x1", "x2"});
Visual Basic
Dim variables As VariableCollection = New VariableCollection()
variables.Add(dependent)
variables.Add(independent1)
variables.Add(independent2)
' Variables are selected from the collection by name.
Dim model2 As GeneralizedLinearModel = _
    New GeneralizedLinearModel(variables, "y", New String() {"x1", "x2"})

The third constructor also takes three parameters. The first parameter is a DataTable object that contains the data for the regression analysis. The second parameter is a string containing the name of the column that contains the data for the dependent variable. The third parameter is an array of strings containing the names of the columns that contain the data for the independent variables. All columns must be numerical or convertible to numerical values.

C#
DataTable table = new DataTable();
// Fill data table with data from some datasource.
GeneralizedLinearModel model3 =
    new GeneralizedLinearModel(table, "y", new string[] {"x1", "x2"});
Visual Basic
Dim table As DataTable = New DataTable()
' Fill data table with data from some datasource.
Dim model3 As GeneralizedLinearModel = _
    New GeneralizedLinearModel(table, "y", New String() {"x1", "x2"})

The fourth constructor takes two arguments. The first is a Vector containing the data of the dependent variable. The second is a Matrix whose columns contain the data for each independent variable. The length of the vector must equal the number of rows of the matrix.

Model Families

The model family specifies the distribution of the errors in the dependent variable. The model family of a generalized linear model can be accessed through the ModelFamily property. It is of type ModelFamily. All common model families can be accessed as static (Shared in Visual Basic) members of this type:

Member           Description
Normal           The normal distribution. This is the default.
Binomial         The binomial distribution.
Gamma            The gamma distribution.
InverseGaussian  The inverse Gaussian or inverse normal distribution.
Poisson          The Poisson distribution.

Link Functions

The link function specifies the relationship between the dependent variable and the linear combination of predictor variables. The link function of a generalized linear model can be accessed through the LinkFunction property. It is of type LinkFunction.

The link function and the model family together determine the exact form of the distribution of the dependent variable. Not all link functions are compatible with a given model family. To check for compatibility, use the model family's IsLinkFunctionCompatible(LinkFunction) method.

Every model family has a canonical link function, which can be thought of as the natural choice of link function for the family of distributions. When no link function is specified, the canonical link function of the model family is used. The canonical link function of a model family is available through the CanonicalLinkFunction property.

All common link functions can be accessed using static (Shared in Visual Basic) members of the LinkFunction class:

Member               Description
Identity             The identity function. This is the canonical link function for the normal family.
Log                  The log link is the canonical link function for the Poisson family and the negative binomial family.
Logit                The logit link is the canonical link function for the binomial family.
Probit               The probit function is often used in logistic regression.
ComplementaryLogLog  The complementary log-log link is used in logistic regression and is related to the extreme value distribution.
LogComplement        The log complement link function is sometimes used in logistic regression.
NegativeLogLog       The negative log-log link function is sometimes used in logistic regression.
Reciprocal           The reciprocal link function is the canonical link function for the gamma family.
ReciprocalSquared    The squared reciprocal link function is the canonical link function for the inverse Gaussian family.
Power(Double)        The power link function for a specified exponent. This is a generalization of several other link functions, like the Identity, Reciprocal, and ReciprocalSquared link functions.
OddsPower(Double)    The odds power link function for a specified exponent. If the exponent is zero, this function is equivalent to the Logit link function.
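
To make the relationships in the table concrete, the sketch below writes a few of these link functions as plain formulas. It does not use the library's LinkFunction class. The class and method names are hypothetical, and the formulas are the usual textbook definitions, which the library is assumed (not confirmed) to follow.

```csharp
using System;

// Stand-alone formulas for illustration only. This is NOT the library's
// LinkFunction class; all names here are hypothetical.
public static class LinkFunctionSketch
{
    // Logit: log of the odds mu / (1 - mu), defined for 0 < mu < 1.
    public static double Logit(double mu)
    {
        return Math.Log(mu / (1.0 - mu));
    }

    // Inverse of the logit link (the logistic function).
    public static double InverseLogit(double eta)
    {
        return 1.0 / (1.0 + Math.Exp(-eta));
    }

    // Complementary log-log: log(-log(1 - mu)), defined for 0 < mu < 1.
    public static double ComplementaryLogLog(double mu)
    {
        return Math.Log(-Math.Log(1.0 - mu));
    }

    // Power link: mu raised to the given exponent.
    // Exponent 1 gives the identity, -1 the reciprocal,
    // and -2 the squared reciprocal.
    public static double Power(double mu, double exponent)
    {
        return Math.Pow(mu, exponent);
    }

    // Odds power link (textbook form): (odds^a - 1) / a for a != 0,
    // and log(odds), that is the logit, for a == 0.
    public static double OddsPower(double mu, double exponent)
    {
        double odds = mu / (1.0 - mu);
        if (exponent == 0.0)
            return Math.Log(odds);
        return (Math.Pow(odds, exponent) - 1.0) / exponent;
    }
}
```

For example, LinkFunctionSketch.Power(2.0, -1.0) evaluates the reciprocal link at 2, and LinkFunctionSketch.OddsPower(mu, 0.0) returns the same value as LinkFunctionSketch.Logit(mu).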

Computing the Regression

Before the model can be computed, the model family and link function have to be set. The following example creates a probit regression model:

C#
DataTable table = new DataTable();
// Fill data table with data from some datasource.
GeneralizedLinearModel model4 =
    new GeneralizedLinearModel(table, "y", new string[] {"x1", "x2"});
// Probit regression: binomial family with a probit link.
model4.ModelFamily = ModelFamily.Binomial;
model4.LinkFunction = LinkFunction.Probit;
Visual Basic
Dim table As DataTable = New DataTable()
' Fill data table with data from some datasource.
Dim model4 As GeneralizedLinearModel = _
    New GeneralizedLinearModel(table, "y", New String() {"x1", "x2"})
' Probit regression: binomial family with a probit link.
model4.ModelFamily = ModelFamily.Binomial
model4.LinkFunction = LinkFunction.Probit

When the link function is the canonical link function of the model family, it does not have to be set explicitly. The example below creates a Poisson regression model with a log link, which is the canonical link:

C#
DataTable table = new DataTable();
// Fill data table with data from some datasource.
GeneralizedLinearModel model5 =
    new GeneralizedLinearModel(table, "y", new string[] {"x1", "x2"});
model5.ModelFamily = ModelFamily.Poisson;
// Log is the canonical link function for the Poisson family,
// so we don't need to set the link function explicitly.
// model5.LinkFunction = LinkFunction.Log;
Visual Basic
Dim table As DataTable = New DataTable()
' Fill data table with data from some datasource.
Dim model5 As GeneralizedLinearModel = _
    New GeneralizedLinearModel(table, "y", New String() {"x1", "x2"})
model5.ModelFamily = ModelFamily.Poisson
' Log is the canonical link function for the Poisson family,
' so we don't need to set the link function explicitly.
' model5.LinkFunction = LinkFunction.Log

Once the model family and link function have been set, the model can be computed. The Compute() method performs the actual analysis. Most properties and methods throw an exception when they are accessed before the Compute() method is called. You can verify that the model has been calculated by inspecting the Computed property.

C#
model1.Compute();
Visual Basic
model1.Compute()

The PredictedValues property returns a Vector that contains the values of the dependent variable as predicted by the model. The Residuals property returns a Vector containing the differences between the actual and the predicted values of the dependent variable. Both vectors contain one element for each observation.
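
The sketch below illustrates what these two vectors represent for a model with a log link. The linear combination of the predictors is mapped back through the inverse link to get the predicted mean, and each residual is the actual value minus the predicted value. It is a stand-alone illustration with hypothetical names and plain arrays, not the library's implementation.

```csharp
using System;

// Illustration only; all names are hypothetical and no library types are used.
public static class PredictionSketch
{
    // Linear predictor: eta = intercept + sum of coefficients[j] * x[j].
    public static double LinearPredictor(
        double intercept, double[] coefficients, double[] x)
    {
        if (coefficients.Length != x.Length)
            throw new ArgumentException("One coefficient per predictor is required.");
        double eta = intercept;
        for (int j = 0; j < coefficients.Length; j++)
            eta += coefficients[j] * x[j];
        return eta;
    }

    // Predicted mean under a log link: the inverse link is exp(eta).
    public static double PredictWithLogLink(
        double intercept, double[] coefficients, double[] x)
    {
        return Math.Exp(LinearPredictor(intercept, coefficients, x));
    }

    // Residuals as described in the text: actual minus predicted,
    // one element per observation.
    public static double[] Residuals(double[] actual, double[] predicted)
    {
        if (actual.Length != predicted.Length)
            throw new ArgumentException("Both arrays need one element per observation.");
        double[] residuals = new double[actual.Length];
        for (int i = 0; i < actual.Length; i++)
            residuals[i] = actual[i] - predicted[i];
        return residuals;
    }
}
```

With a different link function, only the inverse link step in PredictWithLogLink changes. The linear predictor and the residual calculation stay the same.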

Regression Parameters

The GeneralizedLinearModel class' Parameters property returns a ParameterCollection object that contains the parameters of the regression model. The members of this collection are of type Parameter. Regression parameters are created by the model. You cannot create them directly.

Parameters can be accessed by numerical index or by name. The name of a parameter is usually the name of the variable associated with it.

A generalized linear model has as many parameters as there are independent variables, plus one for the intercept (constant term) when it is included. The intercept, if present, is the first parameter in the collection, with index 0. The name of the intercept parameter can be retrieved or set through the InterceptParameterName property.

The Parameter class has four useful properties. The Value property returns the numerical value of the parameter, while the StandardError property returns the standard deviation of the parameter's distribution.

The Statistic property returns the value of the z-statistic corresponding to the hypothesis that the parameter equals zero. The PValue property returns the corresponding p-value. A high p-value indicates that the variable associated with the parameter does not make a significant contribution to explaining the data. The p-value always corresponds to a two-tailed test. The following example prints the properties of the parameter associated with x1 in our earlier example:

C#
// Look up the parameter by the name of its variable.
Parameter x1Parameter = model1.Parameters["x1"];
Console.WriteLine("Name:        {0}", x1Parameter.Name);
Console.WriteLine("Value:       {0}", x1Parameter.Value);
Console.WriteLine("St.Err.:     {0}", x1Parameter.StandardError);
Console.WriteLine("z-statistic: {0}", x1Parameter.Statistic);
Console.WriteLine("p-value:     {0}", x1Parameter.PValue);
Visual Basic
' Look up the parameter by the name of its variable.
Dim x1Parameter As Parameter = model1.Parameters("x1")
Console.WriteLine("Name:        {0}", x1Parameter.Name)
Console.WriteLine("Value:       {0}", x1Parameter.Value)
Console.WriteLine("St.Err.:     {0}", x1Parameter.StandardError)
Console.WriteLine("z-statistic: {0}", x1Parameter.Statistic)
Console.WriteLine("p-value:     {0}", x1Parameter.PValue)
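
As a rough illustration of how these quantities relate, the sketch below computes a z-statistic as the estimate divided by its standard error, and a two-tailed p-value from the standard normal distribution. It assumes the usual Wald test and is not taken from the library. The names are hypothetical, and the normal tail is evaluated with the Abramowitz and Stegun approximation 7.1.26 (absolute error around 1.5e-7) because System.Math has no error function.

```csharp
using System;

// Illustration only; all names are hypothetical and no library types are used.
public static class WaldTestSketch
{
    // z-statistic for the hypothesis that the parameter equals zero.
    public static double ZStatistic(double value, double standardError)
    {
        return value / standardError;
    }

    // Two-tailed p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)) for standard
    // normal Z, using the Abramowitz-Stegun 7.1.26 approximation of erfc.
    public static double TwoTailedPValue(double z)
    {
        double x = Math.Abs(z) / Math.Sqrt(2.0);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592
            + t * (-0.284496736
            + t * (1.421413741
            + t * (-1.453152027
            + t * 1.061405429))));
        return poly * Math.Exp(-x * x);
    }
}
```

A z-statistic near 1.96 in absolute value gives a two-tailed p-value close to 0.05, which is the conventional threshold for significance at the 5% level.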

The Parameter class has one method: GetConfidenceInterval(Double). This method takes one parameter: a confidence level between 0 and 1. A value of 0.95 corresponds to a confidence level of 95%. The method returns the confidence interval for the parameter at the specified confidence level as an Interval structure.
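
For intuition, a common way to build such an interval is the Wald interval: the estimate plus or minus a normal critical value times the standard error. The sketch below shows that calculation with hypothetical names. Whether GetConfidenceInterval uses exactly this construction is an assumption, not something the text states.

```csharp
using System;

// Illustration only; all names are hypothetical and no library types are used.
public static class WaldIntervalSketch
{
    // Returns { lower, upper } for value +/- criticalValue * standardError.
    // criticalValue is the standard normal quantile for the chosen level,
    // for example 1.644854 (90%), 1.959964 (95%) or 2.575829 (99%).
    public static double[] Interval(
        double value, double standardError, double criticalValue)
    {
        if (standardError < 0.0 || criticalValue < 0.0)
            throw new ArgumentOutOfRangeException(
                "standardError", "Standard error and critical value must be non-negative.");
        double halfWidth = criticalValue * standardError;
        return new double[] { value - halfWidth, value + halfWidth };
    }
}
```

A higher confidence level means a larger critical value and therefore a wider interval around the same estimate.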

Verifying the Quality of the Regression

Generalized linear models are fitted by maximizing the likelihood function. The logarithm of the likelihood function of the final result is available through the GetLogLikelihood() method. A related method, GetKernelLogLikelihood(), returns the part of the log likelihood that depends on the dependent variable. The GetChiSquare() method compares the log likelihood of the model to the log likelihood of the minimal model.

Other goodness-of-fit measures, suitable for comparing different models of the same data, are the Akaike Information Criterion or AIC (GetAkaikeInformationCriterion()), the corrected AIC (GetCorrectedAkaikeInformationCriterion()), and the Bayesian Information Criterion or BIC (GetBayesianInformationCriterion()).
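
The sketch below shows the standard textbook formulas behind these measures, computed from a log likelihood, a parameter count k, and a number of observations n. The names are hypothetical. The library's exact conventions, such as which parameters count toward k or whether the full or kernel log likelihood is used, are not stated here and may differ.

```csharp
using System;

// Illustration only; all names are hypothetical and no library types are used.
public static class FitCriteriaSketch
{
    // Likelihood ratio chi-square: twice the gain in log likelihood
    // of the fitted model over the minimal model.
    public static double LikelihoodRatioChiSquare(
        double logLikelihood, double minimalLogLikelihood)
    {
        return 2.0 * (logLikelihood - minimalLogLikelihood);
    }

    // AIC = 2k - 2 ln L.
    public static double Aic(double logLikelihood, int k)
    {
        return 2.0 * k - 2.0 * logLikelihood;
    }

    // Corrected AIC = AIC + 2k(k + 1) / (n - k - 1); requires n > k + 1.
    public static double CorrectedAic(double logLikelihood, int k, int n)
    {
        if (n <= k + 1)
            throw new ArgumentException("n must be greater than k + 1.");
        return Aic(logLikelihood, k) + 2.0 * k * (k + 1) / (n - k - 1);
    }

    // BIC = k ln(n) - 2 ln L.
    public static double Bic(double logLikelihood, int k, int n)
    {
        return k * Math.Log(n) - 2.0 * logLikelihood;
    }
}
```

For all three information criteria, lower values indicate a better trade-off between fit and model complexity, and the values are only comparable between models fitted to the same data.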

Copyright © 2003-2010, Extreme Optimization. All rights reserved.
Extreme Optimization, Complexity made simple, M#, and M Sharp are trademarks of ExoAnalytics Inc.
Microsoft, Visual C#, Visual Basic, Visual Studio, Visual Studio.NET, and the Optimized for Visual Studio logo
are registered trademarks of Microsoft Corporation.