Generalized Linear Models | Extreme Optimization Numerical Libraries for .NET Professional

Generalized linear models are an extension of linear regression models to situations where the distribution of the dependent variable is not normal. The types of models that can be represented as generalized linear models include: classic linear regression, logistic regression, probit regression and Poisson regression.

Two properties define the nature of a specific generalized linear model. The ModelFamily specifies the distribution of the errors. The LinkFunction defines the relationship between the dependent variable and the linear combination of predictor variables.
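In symbols, the two ingredients fit together as follows. This is a sketch in standard generalized linear model notation, not notation taken from the library: $y_i$ is the $i$-th observation of the dependent variable, $\mu_i$ is its mean, $x_{i1}, \dots, x_{ip}$ are the predictor values, $\beta_0, \dots, \beta_p$ are the regression parameters, and $g$ is the link function. Strictly speaking, the link function relates the mean of the dependent variable to the linear predictor.

```latex
g(\mu_i) = \eta_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip},
\qquad
\mu_i = \mathrm{E}[y_i] = g^{-1}(\eta_i)
```

Here each $y_i$ follows a distribution from the chosen model family with mean $\mu_i$. Classic linear regression is the special case of a normal family with $g(\mu) = \mu$.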

Generalized linear models are implemented by the GeneralizedLinearModel class.

The GeneralizedLinearModel class has four constructors.

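The first constructor takes three arguments. The first is a Vector that contains the dependent variable. The second is an array of vectors that contain the independent variables. The third is the model family: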

    var dependent = Vector.Create(yData);
    var independent1 = Vector.Create(x1Data);
    var independent2 = Vector.Create(x2Data);
    var model1 = new GeneralizedLinearModel(dependent,
        new[] { independent1, independent2 }, ModelFamily.Gamma);

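The second constructor takes four arguments. The first argument is an IDataFrame (for example, a DataFrame) that contains the variables. In the example below, the next arguments are the name of the dependent variable, an array with the names of the independent variables, and the model family. The example also passes a link function as a final argument: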

    var dataFrame = DataFrame.FromColumns(new Dictionary<string, object>()
        { { "y", dependent }, { "x1", independent1 }, { "x2", independent2 } });
    var model2 = new GeneralizedLinearModel(dataFrame, "y",
        new[] { "x1", "x2" }, ModelFamily.Gamma, LinkFunction.Log);

The third constructor takes two arguments. The first is a Vector

The model family specifies the distribution of the errors in the dependent variable. The model family of a generalized linear model can be accessed through the ModelFamily property. It is of type ModelFamily. All common model families can be accessed as static (Shared in Visual Basic) members of this type:

Member | Description |
---|---|
Normal | The normal distribution. This is the default. |
Binomial | The binomial distribution. |
Gamma | The gamma distribution. |
InverseGaussian | The inverse Gaussian or inverse normal distribution. |
Poisson | The Poisson distribution. |

The link function specifies the relationship between the dependent variable and the linear combination of predictor variables. The link function of a generalized linear model can be accessed through the LinkFunction property. It is of type LinkFunction.

The link function and the model family together determine the exact form of the distribution of the dependent variable. Not all link functions are compatible with a given model family. To check for compatibility, use the model family's IsLinkFunctionCompatible method.

Every model family has a canonical link function, which can be thought of as the natural choice of link function for the family of distributions. When no link function is specified, the canonical link function of the model family is used. The canonical link function of a model family is available through the CanonicalLinkFunction property.
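The compatibility check and the canonical link can be combined when choosing a link function. The fragment below is a minimal sketch that continues the earlier examples and reuses `dataFrame` from above. It assumes that `IsLinkFunctionCompatible` takes a `LinkFunction` argument and returns a `bool`, which the text above does not state, and the name `model5` is only illustrative.

```csharp
// Sketch only: continues the earlier examples (dataFrame is defined above).
// Assumption: IsLinkFunctionCompatible(LinkFunction) returns a bool.
ModelFamily family = ModelFamily.Gamma;
LinkFunction link = LinkFunction.Log;

if (!family.IsLinkFunctionCompatible(link))
{
    // Fall back to the natural choice for this family.
    link = family.CanonicalLinkFunction;
}

// Hypothetical model name; the constructor call mirrors the model2 example.
var model5 = new GeneralizedLinearModel(dataFrame, "y",
    new[] { "x1", "x2" }, family, link);
```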

All common link functions can be accessed using static (Shared in Visual Basic) members of the LinkFunction class:

Member | Description |
---|---|
Identity | The identity function. This is the canonical link function for the normal family. |
Log | The log link is the canonical link function for the Poisson family and the negative binomial family. |
Logit | The logit link is the canonical link function for the binomial family. |
Probit | The probit function is often used in logistic regression. |
ComplementaryLogLog | The complementary log-log link is used in logistic regression and is related to the extreme value distribution. |
LogComplement | The log complement link function is sometimes used in logistic regression. |
NegativeLogLog | The negative log log link function is sometimes used in logistic regression. |
Reciprocal | The reciprocal link function is the canonical link function for the gamma family. |
ReciprocalSquared | The squared reciprocal link function is the canonical link function for the inverse Gaussian family. |
Power | The power link function for a specified exponent. This is a generalization of several other link functions, like the Identity, Reciprocal, and ReciprocalSquared link functions. |
OddsPower | The odds power link function for a specified exponent. If the exponent is zero, this function is equivalent to the Logit link function. |
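For reference, the link functions in the table are usually defined as follows, where $\mu$ is the mean of the dependent variable, $\Phi$ is the standard normal cumulative distribution function, and $p$ is the exponent. These are the common textbook forms, given as a sketch. The library's exact sign and scaling conventions are not stated here and may differ.

```latex
\begin{aligned}
\text{identity:}              \quad & g(\mu) = \mu \\
\text{log:}                   \quad & g(\mu) = \ln \mu \\
\text{logit:}                 \quad & g(\mu) = \ln\frac{\mu}{1-\mu} \\
\text{probit:}                \quad & g(\mu) = \Phi^{-1}(\mu) \\
\text{complementary log-log:} \quad & g(\mu) = \ln\bigl(-\ln(1-\mu)\bigr) \\
\text{log complement:}        \quad & g(\mu) = \ln(1-\mu) \\
\text{negative log-log:}      \quad & g(\mu) = -\ln(-\ln \mu) \\
\text{reciprocal:}            \quad & g(\mu) = \frac{1}{\mu} \\
\text{reciprocal squared:}    \quad & g(\mu) = \frac{1}{\mu^{2}} \\
\text{power:}                 \quad & g(\mu) = \mu^{p} \\
\text{odds power:}            \quad & g(\mu) = \frac{\bigl(\mu/(1-\mu)\bigr)^{p} - 1}{p}
\end{aligned}
```

With these forms, the power link gives the identity for $p = 1$, the reciprocal for $p = -1$, and the squared reciprocal for $p = -2$, and the odds power link tends to the logit as $p \to 0$.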

The model family and link function have to be set before the model can be computed. The following example creates a probit regression model and passes the model family and link function to the constructor:

var model3 = new GeneralizedLinearModel(dataFrame, "y", new[] { "x1", "x2" }, ModelFamily.Binomial, LinkFunction.Probit);

When the link function is the canonical link function of the model family, it does not have to be set explicitly. The example below creates a Poisson regression model with a log link, which is the canonical link:

var model4 = new GeneralizedLinearModel(dataFrame, "y", new[] { "x1", "x2" }, ModelFamily.Poisson);

Once the model family and link function have been set, the model can be computed. The Compute method performs the actual analysis. Most properties and methods throw an exception when they are accessed before the Compute method is called. You can verify that the model has been calculated by inspecting the Computed property.

The Predictions property returns a Vector that contains the values predicted by the model.
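A typical sequence is to call Compute, confirm that the calculation has run, and only then read the results. The fragment below is a minimal sketch that continues the earlier examples and reuses `model2` from above. It relies only on the members named in the text and prints the predictions through their default string representation.

```csharp
// Sketch only: continues the earlier examples (model2 is defined above).
model2.Compute();

// Most results are only available after the model has been computed.
if (model2.Computed)
{
    var predictions = model2.Predictions;
    Console.WriteLine("Predictions: {0}", predictions);
}
```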

The Parameters property of the GeneralizedLinearModel class returns a ParameterVector object that contains the parameters of the regression model. The elements of this vector are of type Parameter. Regression parameters are created by the model. You cannot create them directly.

Parameters can be accessed by numerical index or by name. The name of a parameter is usually the name of the variable associated with it.

A generalized linear model has as many parameters as there are independent variables, plus one for the intercept (constant term) when it is included. The intercept, if present, is the first parameter in the collection, with index 0. The name of the intercept parameter can be retrieved or set through the InterceptParameterName property.

The Parameter class has four useful properties. The Value property returns the numerical value of the parameter, while the StandardError property returns the standard error of the estimate, that is, the estimated standard deviation of its sampling distribution.

The Statistic property returns the value of the z-statistic corresponding to the hypothesis that the parameter equals zero. The PValue property returns the corresponding p-value. A high p-value indicates that the variable associated with the parameter does not make a significant contribution to explaining the data. The p-value always corresponds to a two-tailed test. The following example prints the properties of the slope parameter of our earlier example:

    var x1Parameter = model1.Parameters.Get("x1");
    Console.WriteLine("Name: {0}", x1Parameter.Name);
    Console.WriteLine("Value: {0}", x1Parameter.Value);
    Console.WriteLine("St.Err.: {0}", x1Parameter.StandardError);
    Console.WriteLine("z-statistic: {0}", x1Parameter.Statistic);
    Console.WriteLine("p-value: {0}", x1Parameter.PValue);
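The relationship between these four properties can be written out using the usual Wald test. This is a sketch under the assumption that the library follows the standard definitions: $\hat\beta_j$ is the estimated value of the parameter, $\mathrm{SE}(\hat\beta_j)$ is its standard error, and $\Phi$ is the standard normal cumulative distribution function.

```latex
z_j = \frac{\hat\beta_j}{\mathrm{SE}(\hat\beta_j)},
\qquad
p_j = 2\,\bigl(1 - \Phi(|z_j|)\bigr)
```

The factor 2 reflects the two-tailed test: large deviations from zero in either direction count against the hypothesis that the parameter equals zero.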

The Parameter class has one method: GetConfidenceInterval. This method takes one argument: a confidence level between 0 and 1. A value of 0.95 corresponds to a confidence level of 95%. The method returns the confidence interval for the parameter at the specified confidence level as an Interval structure.
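As a minimal sketch, the fragment below requests a 95% confidence interval for the `x1Parameter` object from the earlier example. It prints the Interval through its default string representation, because the members of the Interval structure are not described here.

```csharp
// Sketch only: continues the earlier example (x1Parameter is defined above).
// 0.95 corresponds to a confidence level of 95%.
var interval = x1Parameter.GetConfidenceInterval(0.95);
Console.WriteLine("95% confidence interval: {0}", interval);
```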

Generalized linear models are fitted by maximizing the likelihood function. The logarithm of the likelihood function of the final result is available through the GetLogLikelihood method. A related method, GetKernelLogLikelihood, returns the kernel of the log likelihood, that is, the part that depends on the model parameters. The GetChiSquare method compares the log likelihood of the model to the log likelihood of the minimal model.
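A comparison of this kind is usually expressed as a likelihood ratio statistic. The formula below is the common textbook form, given as a sketch. That GetChiSquare computes exactly this quantity is an assumption. Here $\ell_{\text{model}}$ is the maximized log likelihood of the fitted model and $\ell_{\text{min}}$ is that of the minimal model.

```latex
\chi^2 = 2\,\bigl(\ell_{\text{model}} - \ell_{\text{min}}\bigr)
```

Under the hypothesis that the additional predictors have no effect, this statistic approximately follows a chi-square distribution with as many degrees of freedom as the two models differ in number of parameters.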

Other goodness-of-fit measures, suitable for comparing different models of the same data, are the Akaike Information Criterion or AIC (GetAkaikeInformationCriterion), the corrected AIC (GetCorrectedAkaikeInformationCriterion), and the Bayesian Information Criterion or BIC (GetBayesianInformationCriterion).
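The standard definitions of these criteria are shown below as a sketch. Here $\ell$ is the maximized log likelihood, $k$ is the number of estimated parameters, and $n$ is the number of observations. How the library counts parameters, for example whether a dispersion parameter is included in $k$, is not stated here.

```latex
\begin{aligned}
\mathrm{AIC}  &= 2k - 2\ell \\
\mathrm{AICc} &= \mathrm{AIC} + \frac{2k(k+1)}{n - k - 1} \\
\mathrm{BIC}  &= k \ln n - 2\ell
\end{aligned}
```

For all three criteria, a lower value indicates a better trade-off between fit and model complexity.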

Copyright © 2004-20116, Extreme Optimization. All rights reserved.
