Statistical models come in two main flavors: univariate and multivariate. Univariate models include ANOVA
and various forms of regression, including linear, nonlinear, and logistic regression. Multivariate models include
MANOVA, principal component analysis, and factor analysis.
This chapter offers a very brief overview of General Linear Models. The next two chapters will cover two specific
types of models: linear regression models and analysis of variance (ANOVA) models.
A univariate model has a single dependent variable and may have one or more independent variables.
Most univariate models can be categorized as a General Linear Model (GLM). A univariate General Linear Model is defined by
Y = β0 + β1X1 + β2X2 + ... + βnXn + e
where Y is a vector that represents the dependent variable, the Xi are vectors
representing the independent variables, e is a vector of residuals, and the βi are
the regression parameters. The residuals are assumed to follow a normal distribution. To define a general linear
model, you therefore need to define one dependent variable and one or more independent variables.
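As a rough illustration of the definition above, the following sketch fits the simplest case, one independent variable, by ordinary least squares. It is plain Python and does not use the library; the function name fit_simple_regression and the sample data are made up for the example.

```python
def fit_simple_regression(x, y):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1*x + e."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical data that lies exactly on the line y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_simple_regression(x, y)

# The residual vector e is what remains after subtracting the fitted values.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
```

With more than one independent variable the same idea applies, but the parameters are found by solving a linear system rather than by a closed-form expression.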
In a linear regression model, all variables are numerical variables. They may be transformations of the same
variable. For example, in polynomial regression, the independent variables are powers of a single variable.
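The polynomial case can be sketched as follows. The helper polynomial_columns is a hypothetical name, not part of the library; it only shows how one observed variable gives rise to several independent variables.

```python
def polynomial_columns(x, degree):
    """Return one column per power of x, from x**1 up to x**degree."""
    return [[xi ** p for xi in x] for p in range(1, degree + 1)]


# A cubic polynomial regression uses x, x^2 and x^3 as independent variables.
columns = polynomial_columns([1.0, 2.0, 3.0], 3)
```

The model remains linear in the parameters even though the fitted curve is not a straight line.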
In an ANOVA model, each categorical variable is replaced by a set of indicator variables: one for every possible
value of the variable except one. This transformation is made transparently when you define an ANOVA
model.
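A minimal sketch of this encoding, in plain Python, is shown here. The function indicator_columns and the choice of the first level in sorted order as the omitted reference level are assumptions made for the example; the library may choose the omitted level differently.

```python
def indicator_columns(values, reference=None):
    """Encode a categorical variable as 0/1 columns, omitting one level."""
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]  # assumption: omit the first level in sorted order
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }


# Three levels produce two indicator variables.
encoded = indicator_columns(["a", "b", "c", "a"])
```

One level is left out because its indicator would be a linear combination of the constant term and the other indicators.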
Mixed models, which use both numerical and categorical variables as independent variables, are also possible. These are
sometimes referred to as ANCOVA models. ANCOVA stands for ANalysis of COVAriance.
In logistic and generalized linear models, the relationship between the dependent variable and independent variables is defined by
a link function. In nonlinear regression, the independent variables may interact in nonlinear ways.
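For logistic regression the usual link function is the logit. The sketch below, in plain Python, shows the link and its inverse; the function names are illustrative and are not taken from the library.

```python
import math


def logit(p):
    """Link function: maps a probability in (0, 1) to the whole real line."""
    return math.log(p / (1.0 - p))


def inverse_logit(eta):
    """Inverse link: maps a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# A linear predictor of 0 corresponds to a probability of one half.
p_half = inverse_logit(0.0)
```

The linear combination of the independent variables models logit(p) rather than p itself, which keeps the predicted probability between 0 and 1.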
The UnivariateModel class
All classes that implement regression and ANOVA models inherit from a common base class:
UnivariateModel. This class defines properties and methods common to all linear model classes.
The DependentVariable property represents the dependent
variable of the model. This variable must be of type NumericalVariable. The IndependentVariables property represents the collection of independent
variables in the model. The variables in this collection
can be numerical or categorical. The specific type of model may put restrictions on the kinds of variables
that are allowed.
Results of the regression
The Compute() method calculates the regression
parameters as well as a number of global properties of the model.
The DegreesOfFreedom property returns
the total degrees of freedom in the regression model.
The ResidualSumOfSquares,
StandardError, RSquared and AdjustedRSquared properties return the values
indicated by their names.
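The textbook definitions of these quantities can be sketched in plain Python as follows. This is not the library's implementation: the function fit_summary is hypothetical, and it assumes that the standard error is the square root of the residual mean square and that n_params counts the constant term.

```python
import math


def fit_summary(y, fitted, n_params):
    """Global fit statistics for a least-squares model with a constant term."""
    n = len(y)
    mean_y = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    df_residual = n - n_params
    r_squared = 1.0 - rss / tss
    return {
        "residual_sum_of_squares": rss,
        "standard_error": math.sqrt(rss / df_residual),
        "r_squared": r_squared,
        # Adjusted R squared penalizes additional parameters.
        "adjusted_r_squared": 1.0 - (1.0 - r_squared) * (n - 1) / df_residual,
    }


# Hypothetical data: y observed at x = 1..5, fitted by y = 2.2 + 0.6x.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]
summary = fit_summary(y, fitted, n_params=2)
```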
Regression Parameters
The Parameters property returns a ParameterCollection object that contains the parameters of a
regression model. Regression parameters are implemented by the Parameter class. Regression parameters can't be constructed directly. They are
created by the model.
Every parameter is associated with a variable in the model. This variable is returned by the Variable property. The Name property gives a descriptive name, which usually equals the name of
the corresponding variable.
The Value property returns the numerical value of the
parameter, while the StandardError property returns
the standard error of the parameter estimate, that is, the estimated standard deviation of its sampling distribution.
The Statistic property returns the value of the
t-statistic corresponding to the hypothesis that the parameter equals zero. The PValue property returns the corresponding p-value. The p-value always
corresponds to a two-tailed test.
The Parameter class has one method: GetConfidenceInterval(Double). This method takes one parameter:
a confidence level between 0 and 1. A value of 0.95 corresponds to a confidence level of 95%. The method returns the
confidence interval for the parameter at the specified confidence level as an Interval structure.
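The relationship between a parameter's value, its standard error, its t-statistic and its confidence interval can be sketched as below. This is plain Python, not the library's code. Because the Python standard library has no Student t quantile function, the sketch takes the critical value as an input; in practice it would be the two-tailed t quantile for the chosen confidence level and the residual degrees of freedom.

```python
def t_statistic(value, standard_error):
    """t-statistic for the hypothesis that the parameter equals zero."""
    return value / standard_error


def confidence_interval(value, standard_error, critical_value):
    """Symmetric interval: value plus or minus critical_value * standard_error."""
    half_width = critical_value * standard_error
    return (value - half_width, value + half_width)


# Hypothetical parameter estimate of 2.0 with standard error 0.5.
t = t_statistic(2.0, 0.5)
# critical_value=2.0 is a placeholder, not an actual t quantile.
interval = confidence_interval(2.0, 0.5, critical_value=2.0)
```

A higher confidence level means a larger critical value and therefore a wider interval.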
The ANOVA Table
The AnovaTable property returns the
AnovaTable object that summarizes the results of the model. The
ANOVA table breaks down the variance in the data into two components: the variance explained by the model, and the
variance in the residuals. The larger the share of the variance that is explained by the model, the greater the
significance of the model.
The number of rows depends on the specific design of the model. Three rows are always available:
- The TotalRow property returns the information for
the complete data.
- The ErrorRow property returns the information for
the residuals or error.
- The CompleteModelRow property returns the
information for the complete model.
The TotalRow and ErrorRow properties return objects of type AnovaRow. An AnovaRow has a Table
property that returns the AnovaTable object of which it is a part. The RowType property returns an AnovaRowType value that specifies whether the data in the row refers to the
complete data (AnovaRowType.Total), the residuals (AnovaRowType.Error), or the model or a
component of the model (AnovaRowType.Model). The Caption property returns a descriptive
name of the row. This name reflects the terminology used in each kind of analysis.
The AnovaRow class defines a number of properties that describe
the contribution of the component to the variation in the data. The DegreesOfFreedom property returns the degrees of freedom in
the component. SumOfSquares gets the sum of squares of
the specified component, while MeanSquare returns the sum
of squares divided by the degrees of freedom.
Rows that correspond to the model or one of its components are of type AnovaModelRow. This type inherits from AnovaRow, but has two additional properties. The FStatistic property returns the value of the F statistic that
compares the variance of the model component to the variance of the residual. The PValue property returns the corresponding p-value.
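The quantities in the three rows that are always present can be sketched in plain Python as follows. The function anova_rows and the dictionary layout are made up for the example and do not mirror the library's classes. The sketch assumes a least-squares fit with a constant term, so that the model sum of squares equals the total sum of squares minus the error sum of squares. The p-value is omitted because it requires the F distribution, which the Python standard library does not provide.

```python
def anova_rows(y, fitted, n_predictors):
    """Model, error and total rows of a one-model ANOVA table."""
    n = len(y)
    mean_y = sum(y) / n
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_model = ss_total - ss_error  # valid for least squares with a constant term
    df_model = n_predictors
    df_error = n - n_predictors - 1
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    return {
        "model": {"df": df_model, "ss": ss_model, "ms": ms_model,
                  "f": ms_model / ms_error},
        "error": {"df": df_error, "ss": ss_error, "ms": ms_error},
        "total": {"df": n - 1, "ss": ss_total, "ms": ss_total / (n - 1)},
    }


# Hypothetical data: y observed at x = 1..5, fitted by y = 2.2 + 0.6x.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]
table = anova_rows(y, fitted, n_predictors=1)
```

Only the model row carries an F statistic, which matches the distinction between AnovaRow and AnovaModelRow described above.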
The AnovaTable class has a ToDataTable() method, which returns the information in the ANOVA
table in the form of a DataTable. This DataTable can be used directly as the data source
of a control such as a DataGrid.
Univariate Model Implementations
The UnivariateModel class is an abstract base class
and cannot be instantiated directly. The following table lists the classes that inherit from
UnivariateModel:
The next four chapters discuss these models in greater detail.
A multivariate model has more than one dependent variable and may have zero or more independent variables.
The meaning of the variables is not always immediately apparent. For example, in Principal Component Analysis (PCA),
all variables in the model are dependent variables, and the independent variables are implicitly defined as the principal
components.
The classes that implement multivariate models all inherit from a common base class:
MultivariateModel. The main contribution of this base class is in the management
of the variables. The following table lists the classes that inherit from MultivariateModel:
The next four chapters discuss these models in greater detail.