Generalized linear models are an extension of linear regression models
to situations where the distribution of the dependent variable is not normal.
The types of models that can be represented as generalized linear models include:
classic linear regression, logistic regression, probit regression
and Poisson regression.
Two properties define the nature of a specific generalized linear model.
The ModelFamily
specifies the distribution of the errors. The
LinkFunction
defines the relationship between the dependent variable and the linear combination
of predictor variables.
Generalized linear models are implemented by the
GeneralizedLinearModel
class.
Constructing Generalized Linear Models
The GeneralizedLinearModel
class has four constructors.
The first constructor takes three arguments. The first is a Vector<T>
that represents the dependent variable. The second is an array of
vectors that represent the independent variables. The third argument is the model family.
var dependent = Vector.Create(yData);
var independent1 = Vector.Create(x1Data);
var independent2 = Vector.Create(x2Data);
var model1 = new GeneralizedLinearModel(dependent,
new[] { independent1, independent2 }, ModelFamily.Gamma);
Dim dependent = Vector.Create(yData)
Dim independent1 = Vector.Create(x1Data)
Dim independent2 = Vector.Create(x2Data)
Dim model1 = New GeneralizedLinearModel(dependent,
{independent1, independent2}, ModelFamily.Gamma)
let dependent = Vector.Create(yData)
let independent1 = Vector.Create(x1Data)
let independent2 = Vector.Create(x2Data)
let columns : Vector<double> array = [| independent1; independent2 |]
let model1a = GeneralizedLinearModel(dependent,
columns, family=ModelFamily.Gamma)
The second constructor takes four required arguments and two optional ones. The first argument is an
IDataFrame
(a DataFrame<R, C> or
Matrix<T>) that
contains the variables to be used in the regression. The second argument is a string
containing the name of the dependent variable. The third argument is an array of strings
containing the names of the independent variables.
All names must exist in the column index of the data frame specified by the first argument.
The fourth argument is the model family.
The fifth, optional argument specifies the link function. If none is specified,
the canonical link function for the selected model family is used.
The sixth argument, also optional, is a vector containing weights for the observations.
var dataFrame = DataFrame.FromColumns(new Dictionary<string, object>()
{ { "y", dependent }, { "x1", independent1 }, { "x2", independent2 } });
var model2 = new GeneralizedLinearModel(dataFrame, "y", new[] { "x1", "x2" },
ModelFamily.Gamma, LinkFunction.Log);
Dim frame = DataFrame.FromColumns(New Dictionary(Of String, Object)() From
{{"y", dependent}, {"x1", independent1}, {"x2", independent2}})
Dim model2 = New GeneralizedLinearModel(frame, "y", {"x1", "x2"},
ModelFamily.Gamma, LinkFunction.Log)
let columns = Dictionary<string,obj>()
[ "y", dependent ; "x1", independent1 ; "x2", independent2 ] |> Seq.iter columns.Add
let dataFrame = DataFrame.FromColumns<string>(columns)
let model2 = GeneralizedLinearModel(dataFrame, "y", [| "x1"; "x2" |],
ModelFamily.Gamma, LinkFunction.Log)
The third constructor takes two arguments. The first is a Vector<T> containing the data of the dependent variable. The second is
a Matrix<T> whose columns contain the data for each
independent variable. The length of the vector must equal the number of rows of the matrix.
The model family specifies the distribution of the errors in the
dependent variable. The model family of a generalized linear model
can be accessed through the
ModelFamily
property. It is of type
ModelFamily.
All common model families can be accessed as static (Shared
in Visual Basic)
members of this type:
Member | Description |
---|---|
Normal | The normal distribution. This is the default. |
Binomial | The binomial distribution. |
Gamma | The gamma distribution. |
InverseGaussian | The inverse Gaussian or inverse normal distribution. |
Poisson | The Poisson distribution. |
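One way to see what a model family contributes is its variance function, which describes how the variance of the dependent variable changes with its mean. The sketch below lists the textbook variance functions for the five families in the table. It is plain C# that does not use the library; the `Variance` helper and its string keys are hypothetical names chosen for illustration only.

```csharp
using System;

// Textbook variance functions V(mu) for the five families listed above.
// The string keys mirror the ModelFamily member names but are not library calls.
static double Variance(string family, double mu) => family switch
{
    "Normal" => 1.0,
    "Binomial" => mu * (1.0 - mu),
    "Poisson" => mu,
    "Gamma" => mu * mu,
    "InverseGaussian" => mu * mu * mu,
    _ => throw new ArgumentException($"Unknown family: {family}")
};

foreach (var family in new[] { "Normal", "Binomial", "Poisson", "Gamma", "InverseGaussian" })
    Console.WriteLine($"{family}: V(0.5) = {Variance(family, 0.5)}");
```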
The link function specifies the relationship between the
dependent variable and the linear combination of predictor variables.
The link function of a generalized linear model
can be accessed through the
LinkFunction
property. It is of type
LinkFunction.
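The role of a link function can be sketched without the library: the linear combination of predictors gives a value on the link scale, and the inverse link maps it back to the mean of the dependent variable. The helper names, coefficients and predictor value below are hypothetical.

```csharp
using System;

// A link function g maps the mean mu to the linear predictor eta = g(mu);
// its inverse maps eta back to the mean.
static double Logit(double mu) => Math.Log(mu / (1.0 - mu));
static double InverseLogit(double eta) => 1.0 / (1.0 + Math.Exp(-eta));
static double InverseLog(double eta) => Math.Exp(eta);
static double InverseReciprocal(double eta) => 1.0 / eta;

// Hypothetical coefficients and predictor value.
double intercept = -1.0, slope = 0.5, x = 3.0;
double linearPredictor = intercept + slope * x;

Console.WriteLine($"linear predictor = {linearPredictor}");
Console.WriteLine($"logit link:      mean = {InverseLogit(linearPredictor):F4}");
Console.WriteLine($"log link:        mean = {InverseLog(linearPredictor):F4}");
Console.WriteLine($"reciprocal link: mean = {InverseReciprocal(linearPredictor):F4}");

// Applying the link to the mean recovers the linear predictor.
Console.WriteLine($"round trip = {Logit(InverseLogit(linearPredictor)):F4}");
```

The same linear predictor yields a different mean under each link, which is why the link must suit the range of the dependent variable: the logit link keeps the mean between 0 and 1, while the log link keeps it positive.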
The link function and the model family together determine the exact form
of the distribution of the dependent variable. Not all link functions are
compatible with a given model family. To check for compatibility, use the
model family's
IsLinkFunctionCompatible
method.
Every model family has a canonical link function, which can be thought of
as the natural choice of link function for the family of distributions.
When no link function is specified, the canonical link function of the model family is used.
The canonical link function of a model family is available through the
CanonicalLinkFunction
property.
All common link functions can be accessed using static (Shared in Visual Basic)
members of the LinkFunction class:
Member | Description |
---|---|
Identity | The identity function. This is the canonical link function for the normal family. |
Log | The log link is the canonical link function for the Poisson family and the negative binomial family. |
Logit | The logit link is the canonical link function for the binomial family. |
Probit | The probit link is used in probit regression, an alternative to logistic regression for binary responses. |
ComplementaryLogLog | The complementary log-log link is used in logistic regression and is related to the extreme value distribution. |
LogComplement | The log complement link function is sometimes used in logistic regression. |
NegativeLogLog | The negative log-log link function is sometimes used in logistic regression. |
Reciprocal | The reciprocal link function is the canonical link function for the gamma family. |
ReciprocalSquared | The squared reciprocal link function is the canonical link function for the inverse Gaussian family. |
Power | The power link function for a specified exponent. This is a generalization of several other link functions, like the Identity, Reciprocal, and ReciprocalSquared link functions. |
OddsPower | The odds power link function for a specified exponent. If the exponent is zero, this function is equivalent to the Logit link function. |
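The relationships described for the Power and OddsPower entries can be checked numerically. The sketch below uses the definitions common in the statistical literature, g(mu) = mu^p for the power link and g(mu) = ((mu / (1 - mu))^p - 1) / p for the odds power link. Whether the library uses exactly this parameterization is an assumption, and the function names are hypothetical.

```csharp
using System;

// Power link: g(mu) = mu^p.
static double PowerLink(double mu, double p) => Math.Pow(mu, p);

// Odds power link: g(mu) = (odds^p - 1) / p, with the logit as the p = 0 case.
static double OddsPowerLink(double mu, double p)
{
    double odds = mu / (1.0 - mu);
    return p == 0.0 ? Math.Log(odds) : (Math.Pow(odds, p) - 1.0) / p;
}

double mean = 4.0;  // hypothetical mean for the power link
Console.WriteLine($"p =  1 (identity):           {PowerLink(mean, 1)}");
Console.WriteLine($"p = -1 (reciprocal):         {PowerLink(mean, -1)}");
Console.WriteLine($"p = -2 (reciprocal squared): {PowerLink(mean, -2)}");

double probability = 0.7;  // hypothetical mean for the odds power link
Console.WriteLine($"odds power, p = 1e-6: {OddsPowerLink(probability, 1e-6):F6}");
Console.WriteLine($"odds power, p = 0:    {OddsPowerLink(probability, 0.0):F6}");
```

As the exponent approaches zero, the odds power link approaches the logit, which is why the zero exponent is treated as a special case.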
The model family and link function have to be set before the model can be computed.
The following example creates a probit regression model and sets the
model family and link function through constructor arguments:
var model3 = new GeneralizedLinearModel(dataFrame, "y", new[] { "x1", "x2" },
ModelFamily.Binomial, LinkFunction.Probit);
Dim model3 = New GeneralizedLinearModel(frame, "y", {"x1", "x2"},
ModelFamily.Binomial, LinkFunction.Probit)
let model3 = GeneralizedLinearModel(dataFrame, "y", [| "x1"; "x2" |],
ModelFamily.Binomial, LinkFunction.Probit)
When the link function is the canonical link function of the model family, it
does not have to be set explicitly. The example below creates a Poisson regression model
with a log link, which is the canonical link:
var model4 = new GeneralizedLinearModel(dataFrame, "y", new[] { "x1", "x2" },
ModelFamily.Poisson);
Dim model4 = New GeneralizedLinearModel(frame, "y", {"x1", "x2"},
ModelFamily.Poisson)
let model4 = GeneralizedLinearModel(dataFrame, "y", [| "x1"; "x2" |],
ModelFamily.Poisson)
Once the model family and link function have been set, the model can be computed.
The Compute method performs the actual analysis.
Most properties and methods throw an exception when they are accessed before the Compute method is
called. You can verify that the model has been calculated by inspecting the Computed property.
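To give a sense of what fitting a generalized linear model involves, here is a minimal sketch of iteratively reweighted least squares (IRLS), a standard way to maximize the likelihood of such models. It fits a Poisson model with a log link and a single predictor. This is not necessarily the algorithm the library uses, the data are hypothetical, and the code does not call the library.

```csharp
using System;

// Hypothetical data: counts y observed at predictor values x.
double[] x = { 0, 1, 2, 3, 4, 5 };
double[] y = { 1, 2, 4, 7, 12, 20 };
int n = x.Length;

// Start from the observed counts (shifted away from zero) on the link scale.
double[] eta = new double[n];
for (int i = 0; i < n; i++) eta[i] = Math.Log(y[i] + 0.5);

double b0 = 0.0, b1 = 0.0;
for (int iter = 0; iter < 50; iter++)
{
    // Accumulate the weighted least squares sums for the working response z.
    double sw = 0, swx = 0, swxx = 0, swz = 0, swxz = 0;
    for (int i = 0; i < n; i++)
    {
        double mu = Math.Exp(eta[i]);          // inverse of the log link
        double w = mu;                         // IRLS weight for Poisson with log link
        double z = eta[i] + (y[i] - mu) / mu;  // working response
        sw += w; swx += w * x[i]; swxx += w * x[i] * x[i];
        swz += w * z; swxz += w * x[i] * z;
    }

    // Solve the 2x2 normal equations for the intercept and slope.
    double det = sw * swxx - swx * swx;
    double newB0 = (swxx * swz - swx * swxz) / det;
    double newB1 = (sw * swxz - swx * swz) / det;
    double change = Math.Abs(newB0 - b0) + Math.Abs(newB1 - b1);
    b0 = newB0; b1 = newB1;

    // Update the linear predictor and stop once the coefficients settle.
    for (int i = 0; i < n; i++) eta[i] = b0 + b1 * x[i];
    if (change < 1e-12) break;
}

Console.WriteLine($"intercept = {b0:F4}, slope = {b1:F4}");
```

Each pass solves a weighted linear regression whose weights and working response depend on the current fit, which is why the model family and link function must be fixed before the computation starts.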
The Predictions property
returns a Vector<T> that contains the values of the dependent
variable as predicted by the model. The Residuals property returns a vector containing the
difference between the actual and the predicted values of the dependent variable. Both vectors contain one element
for each observation.
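The relationship between predictions and residuals can be sketched as follows for a model with a log link. The coefficients and data are hypothetical, and the code computes the quantities directly rather than through the library.

```csharp
using System;

// Hypothetical observations and fitted coefficients for a log-link model.
double[] x = { 0, 1, 2, 3 };
double[] y = { 1, 2, 4, 7 };
double intercept = 0.3, slope = 0.6;

double[] predictions = new double[x.Length];
double[] residuals = new double[x.Length];
for (int i = 0; i < x.Length; i++)
{
    // The prediction is the inverse link applied to the linear predictor.
    predictions[i] = Math.Exp(intercept + slope * x[i]);
    // The residual is the actual value minus the predicted value.
    residuals[i] = y[i] - predictions[i];
    Console.WriteLine($"x = {x[i]}: predicted = {predictions[i]:F3}, residual = {residuals[i]:F3}");
}
```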
The GeneralizedLinearModel
class' Parameters
property returns a ParameterVector<T>
object that contains the parameters of the regression model.
The elements of this vector are of type Parameter<T>.
Regression parameters are created by the model.
You cannot create them directly.
Parameters can be accessed by numerical index or by name.
The name of a parameter is usually the name of the variable
associated with it.
A generalized linear model has as many parameters as there are independent variables,
plus one for the intercept (constant term) when it is included. The intercept,
if present, is the first parameter in the collection, with index 0.
The Parameter<T> class
has four useful properties. The Value property returns the numerical value of the parameter, while the
StandardError property returns the standard error of the estimate, that is, the standard deviation
of the parameter's sampling distribution.
The Statistic property returns the value of the
z-statistic corresponding to the hypothesis that the parameter equals zero. The PValue property returns the corresponding p-value. A high p-value
indicates that the variable associated with the parameter does not make a significant contribution to explaining the
data. The p-value always corresponds to a two-tailed test. The following example prints the properties of the slope
parameter of our earlier example:
var x1Parameter = model1.Parameters.Get("x1");
Console.WriteLine("Name: {0}", x1Parameter.Name);
Console.WriteLine("Value: {0}", x1Parameter.Value);
Console.WriteLine("St.Err.: {0}", x1Parameter.StandardError);
Console.WriteLine("z-statistic: {0}", x1Parameter.Statistic);
Console.WriteLine("p-value: {0}", x1Parameter.PValue);
Dim x1Parameter = model1.Parameters.Get("x1")
Console.WriteLine("Name: {0}", x1Parameter.Name)
Console.WriteLine("Value: {0}", x1Parameter.Value)
Console.WriteLine("St.Err.: {0}", x1Parameter.StandardError)
Console.WriteLine("z-statistic: {0}", x1Parameter.Statistic)
Console.WriteLine("p-value: {0}", x1Parameter.PValue)
let x1Parameter = model2.Parameters.Get("x1")
Console.WriteLine("Name: {0}", x1Parameter.Name)
Console.WriteLine("Value: {0}", x1Parameter.Value)
Console.WriteLine("St.Err.: {0}", x1Parameter.StandardError)
Console.WriteLine("z-statistic: {0}", x1Parameter.Statistic)
Console.WriteLine("p-value: {0}", x1Parameter.PValue)
The Parameter<T> class has one method: GetConfidenceInterval. This method takes one argument:
a confidence level between 0 and 1. A value of 0.95 corresponds to a confidence level of 95%. The method returns the
confidence interval for the parameter at the specified confidence level as an Interval structure.
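The z-statistic, two-tailed p-value and confidence interval are related in a simple way, sketched below in plain C#. The sketch assumes the usual normal (Wald) approximation; the library may compute its intervals differently. The estimate and standard error are hypothetical, and the normal distribution helpers are written out because they are not part of the base class library.

```csharp
using System;

// Standard normal CDF via the Abramowitz-Stegun approximation of erf (error below 2e-7).
static double NormalCdf(double z)
{
    double u = Math.Abs(z) / Math.Sqrt(2.0);
    double t = 1.0 / (1.0 + 0.3275911 * u);
    double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
    double erf = 1.0 - poly * Math.Exp(-u * u);
    return z >= 0 ? 0.5 * (1.0 + erf) : 0.5 * (1.0 - erf);
}

// Standard normal quantile by bisection on the CDF.
static double NormalQuantile(double p)
{
    double lo = -10.0, hi = 10.0;
    for (int i = 0; i < 100; i++)
    {
        double mid = 0.5 * (lo + hi);
        if (NormalCdf(mid) < p) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}

// Hypothetical parameter estimate and standard error.
double estimate = 0.42, standardError = 0.15;
double confidenceLevel = 0.95;

double zStatistic = estimate / standardError;                     // tests estimate == 0
double pValue = 2.0 * (1.0 - NormalCdf(Math.Abs(zStatistic)));    // two-tailed
double zCritical = NormalQuantile(0.5 + confidenceLevel / 2.0);
double lower = estimate - zCritical * standardError;
double upper = estimate + zCritical * standardError;

Console.WriteLine($"z-statistic: {zStatistic:F4}");
Console.WriteLine($"p-value: {pValue:F4}");
Console.WriteLine($"{confidenceLevel:P0} interval: [{lower:F4}, {upper:F4}]");
```

Under this approximation, a 95% interval that excludes zero corresponds to a two-tailed p-value below 0.05.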
Verifying the Quality of the Regression
Generalized linear models are fitted by maximizing the likelihood function.
The logarithm of the likelihood function of the final result is available through the
LogLikelihood
method. A related method,
GetKernelLogLikelihood,
returns the part of the log likelihood that depends on the dependent variable.
The GetChiSquare
method compares the log likelihood of the model to the log likelihood of the minimal model.
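A comparison of this kind is commonly expressed as a likelihood ratio statistic: twice the difference between the log likelihood of the fitted model and that of the minimal (intercept-only) model. The sketch below computes it for a Poisson model, assuming this is the statistic meant here. The counts and fitted means are hypothetical, and the log(y!) term is omitted because it is the same for both models and cancels in the difference.

```csharp
using System;

// Poisson log likelihood without the log(y!) term, which does not depend on the fitted means.
static double PoissonLogLikelihood(double[] y, double[] mu)
{
    double sum = 0.0;
    for (int i = 0; i < y.Length; i++) sum += y[i] * Math.Log(mu[i]) - mu[i];
    return sum;
}

// Hypothetical counts and fitted means from some Poisson model.
double[] counts = { 1, 2, 4, 7, 12, 20 };
double[] fitted = { 1.3, 2.2, 3.9, 6.8, 11.9, 20.8 };

// The minimal model has only an intercept, so every fitted mean is the sample mean.
double mean = 0.0;
foreach (double c in counts) mean += c / counts.Length;
double[] minimal = new double[counts.Length];
Array.Fill(minimal, mean);

double chiSquare = 2.0 * (PoissonLogLikelihood(counts, fitted)
                        - PoissonLogLikelihood(counts, minimal));
Console.WriteLine($"chi-square = {chiSquare:F4}");
```

A large value relative to a chi-square distribution, with degrees of freedom equal to the number of added parameters, suggests that the predictors improve on the minimal model.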
Other measures of goodness of fit, suitable for comparing different models of the
same data, are the Akaike Information Criterion or AIC
(GetAkaikeInformationCriterion),
the corrected AIC
(GetCorrectedAkaikeInformationCriterion),
and the Bayesian Information Criterion or BIC
(GetBayesianInformationCriterion).
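The three criteria follow standard formulas based on the log likelihood, the number of estimated parameters k, and the number of observations n. The sketch below uses the textbook definitions. The library may count parameters differently (for example, by including a dispersion parameter), so the function names and inputs here are hypothetical.

```csharp
using System;

// Textbook definitions; lower values indicate a better trade-off between fit and complexity.
static double Aic(double logLikelihood, int k) => 2.0 * k - 2.0 * logLikelihood;
static double Aicc(double logLikelihood, int k, int n) =>
    Aic(logLikelihood, k) + 2.0 * k * (k + 1) / (n - k - 1);
static double Bic(double logLikelihood, int k, int n) =>
    k * Math.Log(n) - 2.0 * logLikelihood;

// Hypothetical model: log likelihood -12.5, 3 parameters, 50 observations.
double logL = -12.5;
int parameters = 3, observations = 50;

Console.WriteLine($"AIC  = {Aic(logL, parameters):F4}");
Console.WriteLine($"AICc = {Aicc(logL, parameters, observations):F4}");
Console.WriteLine($"BIC  = {Bic(logL, parameters, observations):F4}");
```

The corrected AIC adds a penalty that matters for small samples and vanishes as n grows, while the BIC penalizes extra parameters more heavily than the AIC once n exceeds about 7.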