Extreme Optimization User's Guide
General Linear Models
The General Linear Model is a generalization of linear
regression models that unifies regression models and analysis of
variance (ANOVA) models. This approach allows for a consistent
interface to these models. The Extreme Optimization Statistics
Library for .NET uses a general linear model as the basis for
its implementation of linear statistical models. This unified
approach offers great benefits in terms of the types of models that
can be built. The traditional ways of defining regression and
analysis of variance models are still available.
This chapter offers a very brief overview of General Linear
Models. The next two chapters will cover two specific types of
models: linear regression models and analysis of variance (ANOVA)
models.
Definition of a General Linear Model
A General Linear Model is defined by
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + e$$
where $Y$ is a vector that represents the dependent
variable, the $X_i$ are vectors representing the
independent variables, $e$ is a vector of residuals, and the
$\beta_i$ are the regression parameters. The
residuals are assumed to follow a normal distribution. To define a
general linear model, you therefore need to define one dependent
variable and one or more independent variables.
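To make the definition concrete, the following minimal sketch estimates the regression parameters by ordinary least squares, solving the normal equations in plain Python. It is an illustration of the model only, not the library's API: the function names `solve` and `fit` and the sample data are assumptions made for this example.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit(y, xs):
    """Return [b0, b1, ..., bn] minimizing the sum of squared residuals."""
    # Design matrix rows: a leading 1 for the intercept, then one value per variable.
    rows = [[1.0] + [x[i] for x in xs] for i in range(len(y))]
    k = len(xs) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data generated exactly from Y = 1 + 2*X1 - 3*X2 (no noise).
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 1.0, 0.0]
y = [1.0 + 2.0 * a - 3.0 * b for a, b in zip(x1, x2)]
beta = fit(y, [x1, x2])  # approximately [1.0, 2.0, -3.0]
```

Because the sample data contain no noise, the recovered parameters match the generating coefficients up to rounding. With real data the residuals are nonzero and the parameters are estimates.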
In a linear regression model, all variables are numerical
variables. They may be transformations of the same variable. For
example, in polynomial regression, the independent variables are
powers of a single variable.
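As a small illustration of this idea, the sketch below builds the independent variables of a polynomial regression as successive powers of one variable. The helper name `polynomial_columns` is hypothetical and is not part of the library.

```python
def polynomial_columns(x, degree):
    """Return one column per power of x, from x**1 up to x**degree."""
    return [[value ** power for value in x] for power in range(1, degree + 1)]


x = [1.0, 2.0, 3.0]
columns = polynomial_columns(x, 3)
# columns[0] holds x, columns[1] holds x squared, columns[2] holds x cubed.
```

The model remains linear in the parameters even though the columns are nonlinear functions of the original variable.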
In an ANOVA model, each categorical variable is replaced by one or
more indicator variables: one for each possible value of the
variable except one. This transformation is made
transparently when you define an ANOVA model.
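The indicator coding can be sketched in a few lines of plain Python. This is an illustration of the general technique, not the library's implementation: the helper name `indicator_columns` is hypothetical, and the choice of the first value in sorted order as the omitted (reference) value is an assumption, since the text does not say which value the library leaves out.

```python
def indicator_columns(values):
    """Map each value except the reference value to a 0/1 indicator column."""
    # Assumption: the first value in sorted order is the omitted reference value.
    coded_levels = sorted(set(values))[1:]
    return {
        level: [1.0 if v == level else 0.0 for v in values]
        for level in coded_levels
    }


groups = ["a", "b", "c", "a"]
indicators = indicator_columns(groups)
# Three distinct values yield two indicator columns, for "b" and "c".
```

An observation with the reference value has zeros in every indicator column, so its effect is absorbed by the constant term of the model.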
Mixed models, which use both numerical and categorical variables as
independent variables, are also possible. These are sometimes
referred to as ANCOVA models. ANCOVA stands for ANalysis of
COVAriance.
The GeneralLinearModel class
All classes that implement regression and ANOVA models inherit
from a common base class: GeneralLinearModel. This
class defines properties and methods common to all linear model
classes.
The DependentVariable
property represents the dependent variable of the model. This
variable must be of type NumericalVariable.
The IndependentVariables
property represents the collection of independent variables in the
model. The variables in this collection can be numerical or
categorical. The specific type of
model may put restrictions on the kinds of variables that are
allowed.
Results of the regression
The Compute
method calculates the regression parameters as well as a number of
global properties of the model.
The DegreesOfFreedom
property returns the total degrees of freedom in the regression
model.
The ResidualSumOfSquares,
StandardError,
RSquared,
and AdjustedRSquared
properties return the values indicated by their names.
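For reference, the sketch below computes these four quantities from observed and fitted values using their standard textbook definitions. It is not the library's code: the function name `fit_summary`, the dictionary keys, and the sample data are assumptions, and the library's exact conventions may differ.

```python
import math


def fit_summary(y, fitted, n_params):
    """Compute common goodness-of-fit measures for a fitted linear model.

    n_params counts all estimated parameters, including the constant term.
    """
    n = len(y)
    mean_y = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    residual_dof = n - n_params
    r_squared = 1.0 - rss / tss
    return {
        "residual_sum_of_squares": rss,
        # Estimated standard deviation of the residuals.
        "standard_error": math.sqrt(rss / residual_dof),
        "r_squared": r_squared,
        # Penalizes R squared for the number of parameters used.
        "adjusted_r_squared": 1.0 - (1.0 - r_squared) * (n - 1) / residual_dof,
    }


# Hypothetical data: y observed at x = 1..5, fitted by the line 2.2 + 0.6 x.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]
summary = fit_summary(y, fitted, n_params=2)
```

For this data the residual sum of squares is 2.4 out of a total sum of squares of 6, so R squared is 0.6 up to rounding.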
Regression Parameters
The Parameters
property returns a ParameterCollection
object that contains the parameters of a regression model.
Regression parameters are implemented by the Parameter class.
Regression parameters can't be constructed directly. They are
created by the model.
Every parameter is associated with a variable in the model. This
variable is returned by the Variable property.
The Name property
gives a descriptive name, which usually equals the name of the
corresponding variable.
The Value property
returns the numerical value of the parameter, while the
StandardError
property returns the standard deviation of the parameter's
distribution.
The Statistic
property returns the value of the t-statistic corresponding to the
hypothesis that the parameter equals zero. The PValue
property returns the corresponding p-value. The p-value always
corresponds to a two-tailed test.
The Parameter class has one method:
GetConfidenceInterval. This method takes one
parameter: a confidence level between 0 and 1. A value of 0.95
corresponds to a confidence level of 95%. The method returns the
confidence interval for the parameter at the specified confidence
level as an Interval structure.
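The relationship between a parameter's value, its standard error, the t-statistic, and the confidence interval can be sketched for the slope of a simple regression. This plain-Python example is an illustration under stated assumptions, not the library's implementation: the function name `slope_inference` is hypothetical, and the critical value of the t distribution is passed in from a statistical table instead of being computed. The p-value is omitted because it requires the cumulative distribution function of the t distribution.

```python
import math


def slope_inference(x, y, t_critical):
    """Return slope, standard error, t-statistic, and confidence interval."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters).
    std_err = math.sqrt(rss / (n - 2) / sxx)
    # t-statistic for the hypothesis that the slope equals zero.
    t_stat = slope / std_err
    half_width = t_critical * std_err
    return slope, std_err, t_stat, (slope - half_width, slope + half_width)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
# 3.182 is the tabulated two-sided 95% critical value for 3 degrees of freedom.
slope, std_err, t_stat, interval = slope_inference(x, y, t_critical=3.182)
```

Here the 95% interval contains zero, which is consistent with a t-statistic (about 2.12) that is smaller than the critical value.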
The ANOVA Table
The AnovaTable
property returns the AnovaTable object
that summarizes the results of the model. The ANOVA table breaks
down the variance in the data into two components: the variance
explained by the model, and the variance in the residuals. The
larger the share of the variance that is explained by the model,
the greater the significance of the model.
The number of rows depends on the specific design of the model.
Three rows are always available:
The TotalRow and ErrorRow properties
each return an object of type AnovaRow.
An AnovaRow has a Table property
that returns the AnovaTable object of which it is a
part. The RowType
property returns an AnovaRowType value
that specifies whether the data in the row refers to the complete
data (AnovaRowType.Total), the residuals
(AnovaRowType.Error) or the model or a component of
the model (AnovaRowType.Model). The
Caption property returns a descriptive name of the
row. This name reflects the terminology used in each kind of
analysis.
The AnovaRow class defines a number of properties
that describe the contribution of the component to the variation in
the data. The DegreesOfFreedom
property returns the degrees of freedom in the component.
SumOfSquares
gets the sum of squares of the specified component, while
MeanSquare
returns the sum of squares divided by the degrees of freedom.
Rows that correspond to the model or one of its components are
of type AnovaModelRow.
This type derives from AnovaRow, but has two
additional properties. The FStatistic
property returns the value of the F statistic that compares the
variance of the model component to the variance of the residual.
The PValue
property returns the corresponding p-value.
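The quantities in an ANOVA table follow directly from the observed and fitted values. The sketch below builds the model, error, and total rows for a simple regression in plain Python. It illustrates the standard decomposition only: the function name `anova_rows` and the dictionary keys are assumptions and do not reflect the library's classes.

```python
def anova_rows(y, fitted, n_params):
    """Build model, error, and total rows of a basic ANOVA table.

    n_params counts all estimated parameters, including the constant term.
    """
    n = len(y)
    mean_y = sum(y) / n
    model_ss = sum((fi - mean_y) ** 2 for fi in fitted)
    error_ss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    total_ss = sum((yi - mean_y) ** 2 for yi in y)
    model_df, error_df = n_params - 1, n - n_params
    model_ms = model_ss / model_df
    error_ms = error_ss / error_df
    return {
        "model": {"df": model_df, "ss": model_ss, "ms": model_ms,
                  # F compares the model mean square to the error mean square.
                  "f": model_ms / error_ms},
        "error": {"df": error_df, "ss": error_ss, "ms": error_ms},
        "total": {"df": n - 1, "ss": total_ss, "ms": total_ss / (n - 1)},
    }


# Hypothetical data: y observed at x = 1..5, fitted by the line 2.2 + 0.6 x.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]
table = anova_rows(y, fitted, n_params=2)
```

For a least squares fit that includes a constant term, the model and error sums of squares add up to the total sum of squares, and the degrees of freedom add up in the same way.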
The AnovaTable class has a ToDataTable
method, which returns the information in the ANOVA table in the
form of a DataTable. This DataTable can
be used directly as the data source of a control such as a
DataGrid.
General Linear Model Implementations
The GeneralLinearModel class is an abstract base
class and cannot be instantiated directly. The following table lists
the classes that inherit from GeneralLinearModel:
The next two chapters discuss these models in greater
detail.
Copyright 2004-2008,
Extreme Optimization. All rights reserved.
Extreme Optimization, Complexity made simple, M#, and M
Sharp are trademarks of ExoAnalytics Inc.
Microsoft, Visual C#, Visual Basic, Visual Studio, Visual
Studio.NET, and the Visual Studio Logo are registered trademarks of Microsoft Corporation