Generalized linear models are an extension of linear regression models
to situations where the distribution of the dependent variable is not normal.
The types of models that can be represented as generalized linear models include:
classic linear regression, logistic regression, probit regression
and Poisson regression.
Two properties define a specific generalized linear model.
The ModelFamily property specifies the distribution of the errors.
The LinkFunction property defines the relationship between the mean of the dependent variable
and the linear combination of predictor variables.
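In symbols, the role of the link function can be sketched as follows. The notation is introduced here for illustration only and is not part of the library: g is the link function, y_i is the i-th observation of the dependent variable, x_ij is the i-th observation of the j-th predictor, and the beta coefficients are the regression parameters.

```latex
g(\mu_i) = \eta_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip},
\qquad \mu_i = \mathrm{E}[y_i]
```

The model family describes how y_i is distributed around its mean. With the normal family and the identity link, g(mu) = mu, this reduces to classic linear regression.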
Constructing Generalized Linear Models
The GeneralizedLinearModel class has four constructors.
The first constructor takes two parameters. The first is a NumericalVariable that represents the
dependent variable. The second is an array of NumericalVariable objects that represent the independent
variables.
| C# | Copy |
|---|
NumericalVariable dependent = new NumericalVariable("y", yData);
NumericalVariable independent1 = new NumericalVariable("x1", x1Data);
NumericalVariable independent2 = new NumericalVariable("x2", x2Data);
GeneralizedLinearModel model1 = new GeneralizedLinearModel(dependent,
    new NumericalVariable[] { independent1, independent2 });
|
| Visual Basic | Copy |
|---|
Dim dependent As NumericalVariable = New NumericalVariable("y", yData)
Dim independent1 As NumericalVariable = New NumericalVariable("x1", x1Data)
Dim independent2 As NumericalVariable = New NumericalVariable("x2", x2Data)
Dim model1 As GeneralizedLinearModel = _
New GeneralizedLinearModel(dependent, New NumericalVariable() {independent1, independent2})
|
The second constructor takes three parameters. The first parameter is a VariableCollection object that
contains the variables to be used in the regression. The second parameter is a string containing the name of the
dependent variable. The third parameter is an array of strings containing the names of the independent variables. All
the names must exist in the collection specified by the first parameter. All variables must be of type
NumericalVariable.
| C# | Copy |
|---|
VariableCollection variables = new VariableCollection();
variables.Add(dependent);
variables.Add(independent1);
variables.Add(independent2);
GeneralizedLinearModel model2 = new GeneralizedLinearModel(variables, "y", new string[] {"x1", "x2"});
|
| Visual Basic | Copy |
|---|
Dim variables As VariableCollection = New VariableCollection()
variables.Add(dependent)
variables.Add(independent1)
variables.Add(independent2)
Dim model2 As GeneralizedLinearModel = _
New GeneralizedLinearModel(variables, "y", New String() {"x1", "x2"})
|
The third constructor also takes three parameters. The first parameter is a DataTable object that
contains the data for the regression analysis. The second parameter is a string containing the name of the column
that contains the data for the dependent variable. The third parameter is an array of strings containing the names of the columns
that contain the data for the independent variables. All columns must be numerical or convertible to numerical
values.
| C# | Copy |
|---|
DataTable table = new DataTable();
GeneralizedLinearModel model3 = new GeneralizedLinearModel(table, "y", new string[] {"x1", "x2"});
|
| Visual Basic | Copy |
|---|
Dim table As DataTable = New DataTable()
Dim model3 As GeneralizedLinearModel = _
New GeneralizedLinearModel(table, "y", New String() {"x1", "x2"})
|
The fourth constructor takes two arguments. The first is a Vector containing the data of the dependent variable. The second is
a Matrix whose columns contain the data for each
independent variable. The length of the vector must equal the number of rows of the matrix.
Model Families
The model family specifies the distribution of the errors in the
dependent variable. The model family of a generalized linear model
can be accessed through the
ModelFamily
property. It is of type
ModelFamily.
All common model families can be accessed as static (Shared
in Visual Basic)
members of this type:
| Member | Description |
|---|---|
| Normal | The normal distribution. This is the default. |
| Binomial | The binomial distribution. |
| Gamma | The gamma distribution. |
| InverseGaussian | The inverse Gaussian or inverse normal distribution. |
| Poisson | The Poisson distribution. |
Link Functions
The link function specifies the relationship between the
dependent variable and the linear combination of predictor variables.
The link function of a generalized linear model
can be accessed through the
LinkFunction
property. It is of type
LinkFunction.
The link function and the model family together determine the exact form
of the distribution of the dependent variable. Not all link functions are
compatible with a given model family. To check for compatibility, use the
model family's
IsLinkFunctionCompatible(LinkFunction)
method.
Every model family has a canonical link function, which can be thought of
as the natural choice of link function for the family of distributions.
When no link function is specified, the canonical link function of the model family is used.
The canonical link function of a model family is available through the
CanonicalLinkFunction
property.
All common link functions can be accessed using static (Shared in Visual Basic)
members of the LinkFunction class:
| Member | Description |
|---|---|
| Identity | The identity function. This is the canonical link function for the normal family. |
| Log | The log link is the canonical link function for the Poisson family and the negative binomial family. |
| Logit | The logit link is the canonical link function for the binomial family. |
| Probit | The probit link is often used for binary responses as an alternative to the logit link (probit regression). |
| ComplementaryLogLog | The complementary log-log link is used for binary responses and is related to the extreme value distribution. |
| LogComplement | The log complement link function is sometimes used for binary responses. |
| NegativeLogLog | The negative log-log link function is sometimes used for binary responses. |
| Reciprocal | The reciprocal link function is the canonical link function for the gamma family. |
| ReciprocalSquared | The squared reciprocal link function is the canonical link function for the inverse Gaussian family. |
| Power(Double) | The power link function for a specified exponent. This is a generalization of several other link functions, like the Identity, Reciprocal, and ReciprocalSquared link functions. |
| OddsPower(Double) | The odds power link function for a specified exponent. If the exponent is zero, this function is equivalent to the Logit link function. |
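To make the relationships in this table concrete, the sketch below writes out the textbook formulas for some of these links using only System.Math. The LinkFormulas class is hypothetical and is not part of the library. The exact conventions the library uses, in particular for the odds power link, are an assumption here.

```csharp
using System;

// Hypothetical helper for illustration only; not the library's LinkFunction type.
// Each method maps a mean value mu to the linear predictor eta = g(mu).
public static class LinkFormulas
{
    // Identity link: g(mu) = mu.
    public static double Identity(double mu) { return mu; }

    // Log link: g(mu) = ln(mu), defined for mu > 0.
    public static double Log(double mu) { return Math.Log(mu); }

    // Logit link: g(mu) = ln(mu / (1 - mu)), defined for 0 < mu < 1.
    public static double Logit(double mu) { return Math.Log(mu / (1.0 - mu)); }

    // Complementary log-log link: g(mu) = ln(-ln(1 - mu)), defined for 0 < mu < 1.
    public static double ComplementaryLogLog(double mu)
    {
        return Math.Log(-Math.Log(1.0 - mu));
    }

    // Power link: g(mu) = mu^exponent.
    // Exponent 1 gives the identity, -1 the reciprocal, -2 the squared reciprocal.
    public static double Power(double mu, double exponent)
    {
        return Math.Pow(mu, exponent);
    }

    // Odds power link (assumed textbook form):
    // g(mu) = ((mu / (1 - mu))^exponent - 1) / exponent, with the logit as the limit at 0.
    public static double OddsPower(double mu, double exponent)
    {
        double odds = mu / (1.0 - mu);
        if (exponent == 0.0) return Math.Log(odds);
        return (Math.Pow(odds, exponent) - 1.0) / exponent;
    }
}
```

For example, LinkFormulas.Power(mu, -1.0) computes the same value as the reciprocal link, and LinkFormulas.OddsPower(mu, 0.0) computes the same value as LinkFormulas.Logit(mu).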
Computing the Regression
Before the model can be computed, the model family and link function have to be set.
The following example creates a probit regression model:
| C# | Copy |
|---|
DataTable table = new DataTable();
GeneralizedLinearModel model4 = new GeneralizedLinearModel(table, "y", new string[] {"x1", "x2"});
model4.ModelFamily = ModelFamily.Binomial;
model4.LinkFunction = LinkFunction.Probit;
|
| Visual Basic | Copy |
|---|
Dim table As DataTable = New DataTable()
Dim model4 As GeneralizedLinearModel = _
New GeneralizedLinearModel(table, "y", New String() {"x1", "x2"})
model4.ModelFamily = ModelFamily.Binomial
model4.LinkFunction = LinkFunction.Probit
|
When the link function is the canonical link function of the model family, it
does not have to be set explicitly. The example below creates a Poisson regression model
with a log link, which is the canonical link:
| C# | Copy |
|---|
DataTable table = new DataTable();
GeneralizedLinearModel model5 = new GeneralizedLinearModel(table, "y", new string[] {"x1", "x2"});
model5.ModelFamily = ModelFamily.Poisson;
|
| Visual Basic | Copy |
|---|
Dim table As DataTable = New DataTable()
Dim model5 As GeneralizedLinearModel = _
New GeneralizedLinearModel(table, "y", New String() {"x1", "x2"})
model5.ModelFamily = ModelFamily.Poisson
|
Once the model family and link function have been set, the model can be computed.
The Compute method performs the actual analysis.
Most properties and methods throw an exception when they are accessed before the Compute method is
called. You can verify that the model has been calculated by inspecting the Computed property.
| C# | Copy |
|---|
model1.Compute();
|
| Visual Basic | Copy |
|---|
model1.Compute()
|
The PredictedValues property
returns a Vector that contains the values of the dependent
variable as predicted by the model. The Residuals property returns a vector containing the
difference between the actual and the predicted values of the dependent variable. Both vectors contain one element
for each observation.
Regression Parameters
The GeneralizedLinearModel class' Parameters property returns a ParameterCollection object that contains the parameters of the
regression model. The members of this collection are of type Parameter. Regression parameters are created by the model. You cannot create
them directly.
Parameters can be accessed by numerical index or by name. The name of a parameter is usually the name of the
variable associated with it.
A generalized linear model has as many parameters as there are independent variables, plus one for the intercept
(constant term) when it is included. The intercept, if present, is the first parameter in the collection, with index
0. The name of the intercept parameter can be retrieved or set through the InterceptParameterName property.
The Parameter class has four useful properties. The Value property returns the numerical value of the parameter, while the
StandardError property returns the standard error of the parameter estimate, that is, the standard deviation
of the parameter's distribution.
The Statistic property returns the value of the
z-statistic corresponding to the hypothesis that the parameter equals zero. The PValue property returns the corresponding p-value. A high p-value
indicates that the variable associated with the parameter does not make a significant contribution to explaining the
data. The p-value always corresponds to a two-tailed test. The following example prints the properties of the
parameter for x1 from the earlier example:
| C# | Copy |
|---|
Parameter x1Parameter = model1.Parameters["x1"];
Console.WriteLine("Name: {0}", x1Parameter.Name);
Console.WriteLine("Value: {0}", x1Parameter.Value);
Console.WriteLine("St.Err.: {0}", x1Parameter.StandardError);
Console.WriteLine("z-statistic: {0}", x1Parameter.Statistic);
Console.WriteLine("p-value: {0}", x1Parameter.PValue);
|
| Visual Basic | Copy |
|---|
Dim x1Parameter As Parameter = model1.Parameters("x1")
Console.WriteLine("Name: {0}", x1Parameter.Name)
Console.WriteLine("Value: {0}", x1Parameter.Value)
Console.WriteLine("St.Err.: {0}", x1Parameter.StandardError)
Console.WriteLine("z-statistic: {0}", x1Parameter.Statistic)
Console.WriteLine("p-value: {0}", x1Parameter.PValue)
|
The Parameter class has one method: GetConfidenceInterval(Double). This method takes one parameter:
a confidence level between 0 and 1. A value of 0.95 corresponds to a confidence level of 95%. The method returns the
confidence interval for the parameter at the specified confidence level as an Interval structure.
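As a rough guide to how these quantities relate, the textbook Wald formulas are sketched below. The symbols are introduced here for illustration only, and the library may compute its statistics and intervals differently.

```latex
z = \frac{\hat\beta}{\mathrm{SE}(\hat\beta)}, \qquad
p = 2\,\bigl(1 - \Phi(|z|)\bigr), \qquad
\hat\beta \;\pm\; z_{1-\alpha/2}\,\mathrm{SE}(\hat\beta)
```

Here the estimate and its standard error correspond to the Value and StandardError properties, Phi is the standard normal cumulative distribution function, and 1 - alpha is the confidence level. For a confidence level of 0.95, the critical value is approximately 1.96.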
Verifying the Quality of the Regression
Generalized linear models are fitted by maximizing the likelihood function.
The logarithm of the likelihood function of the final result is available through the
GetLogLikelihood
method. A related method,
GetKernelLogLikelihood,
returns the part of the log likelihood that depends on the dependent variable.
The GetChiSquare
method compares the log likelihood of the model to the log likelihood of the minimal model.
Other measures of goodness of fit, suitable for comparing different models of the
same data, are the Akaike Information Criterion or AIC
(GetAkaikeInformationCriterion),
the corrected AIC
(GetCorrectedAkaikeInformationCriterion),
and the Bayesian Information Criterion or BIC
(GetBayesianInformationCriterion).
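The sketch below shows how the three information criteria are conventionally derived from a maximized log likelihood. The InformationCriteria class is hypothetical and is not part of the library. How the library counts parameters (for example, whether a scale parameter is included) is not stated here, so its values may differ from these textbook formulas.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the library.
// Textbook definitions in terms of the maximized log likelihood ln L,
// the number of estimated parameters k, and the number of observations n.
public static class InformationCriteria
{
    // AIC = 2k - 2 ln L.
    public static double Aic(double logLikelihood, int parameterCount)
    {
        return 2.0 * parameterCount - 2.0 * logLikelihood;
    }

    // AICc = AIC + 2k(k + 1) / (n - k - 1); requires n > k + 1.
    public static double CorrectedAic(double logLikelihood, int parameterCount, int observationCount)
    {
        double k = parameterCount;
        double n = observationCount;
        return Aic(logLikelihood, parameterCount) + 2.0 * k * (k + 1.0) / (n - k - 1.0);
    }

    // BIC = k ln(n) - 2 ln L.
    public static double Bic(double logLikelihood, int parameterCount, int observationCount)
    {
        return parameterCount * Math.Log(observationCount) - 2.0 * logLikelihood;
    }
}
```

For all three criteria, lower values indicate a better trade-off between fit and model complexity. The values are only comparable between models fitted to the same data.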