Extreme Optimization User's Guide

General Linear Models

The General Linear Model is a generalization of linear regression that unifies regression models and analysis of variance (ANOVA) models, which allows for a consistent interface to both. The Extreme Optimization Statistics Library for .NET uses a general linear model as the basis for its implementation of linear statistical models. This unified approach widens the range of models that can be built. The traditional ways of defining regression and analysis of variance models are still available.

This chapter offers a very brief overview of General Linear Models. The next two chapters will cover two specific types of models: linear regression models and analysis of variance (ANOVA) models.

Definition of a General Linear Model

A General Linear Model is defined by

Y = β₀ + β₁X₁ + β₂X₂ + ... + βₙXₙ + e

where Y is a vector that represents the dependent variable, the Xᵢ are vectors representing the independent variables, e is a vector of residuals, and the βᵢ are the regression parameters. The residuals are assumed to follow a normal distribution. To define a general linear model, you therefore need to define one dependent variable and one or more independent variables.

In a linear regression model, all variables are numerical variables. They may be transformations of the same variable. For example, in polynomial regression, the independent variables are powers of a single variable.
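As an illustration of how powers of a single variable act as the independent variables of a linear model, the following sketch fits a quadratic by ordinary least squares. It is written in plain Python and does not use the library's API; the helper names solve_linear_system and fit_least_squares are hypothetical, and solving the normal equations directly is chosen for brevity rather than numerical robustness.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_least_squares(columns, y):
    """Least-squares parameters for y = sum(beta[j] * columns[j]) + e."""
    k = len(columns)
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(u * v for u, v in zip(columns[i], columns[j]))
            for j in range(k)] for i in range(k)]
    xty = [sum(u * v for u, v in zip(columns[i], y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Data generated from y = 1 + 2x + 3x^2 with no noise (illustrative only).
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [1 + 2 * v + 3 * v * v for v in x]

# The independent variables are the powers x^0 (constant), x^1 and x^2.
columns = [[v ** p for v in x] for p in range(3)]
beta = fit_least_squares(columns, y)
print(beta)
```

Because the data lie exactly on a quadratic, the fitted parameters agree with the generating coefficients up to rounding error.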

In an ANOVA model, each categorical variable is replaced by a set of indicator variables: one for each possible value of the variable except one. This transformation is made transparently when you define an ANOVA model.
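The indicator coding can be sketched in a few lines of plain Python. This is not the library's implementation: the function name indicator_columns is hypothetical, and dropping the first level in sorted order as the reference level is an assumption made for this example, since the text does not say which level is omitted.

```python
def indicator_columns(values, reference=None):
    """Return one 0/1 column per level of a categorical variable,
    skipping a single reference level."""
    levels = sorted(set(values))
    if reference is None:
        # Assumption for this sketch: drop the first level in sorted order.
        reference = levels[0]
    columns = {}
    for level in levels:
        if level == reference:
            continue
        columns[level] = [1.0 if v == level else 0.0 for v in values]
    return columns


treatment = ["A", "B", "C", "A", "C"]
coded = indicator_columns(treatment)
for level, column in coded.items():
    print(level, column)
```

A variable with three levels yields two indicator columns. An observation at the reference level has zeros in all of them, so its effect is absorbed by the constant term.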

Mixed models, which use both numerical and categorical variables as independent variables, are also possible. These are sometimes referred to as ANCOVA (ANalysis of COVAriance) models.

The GeneralLinearModel class

All classes that implement regression and ANOVA models inherit from a common base class: GeneralLinearModel. This class defines properties and methods common to all linear model classes.

The DependentVariable property represents the dependent variable of the model. This variable must be of type NumericalVariable. The IndependentVariables property represents the collection of independent variables in the model. The variables in this collection can be numerical or categorical. The specific type of model may put restrictions on the kinds of variables that are allowed.

Results of the regression

The Compute method calculates the regression parameters as well as a number of global properties of the model.

The DegreesOfFreedom property returns the total degrees of freedom in the regression model.

The ResidualSumOfSquares, StandardError, RSquared and AdjustedRSquared properties return the values indicated by their names.
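For reference, the quantities named above can be computed by hand for a simple regression in one variable. The sketch below uses the usual textbook definitions in plain Python; it assumes, without confirmation from the library's documentation, that StandardError refers to the square root of the residual mean square. The function name and the data are made up for illustration.

```python
from math import sqrt


def simple_regression_summary(x, y):
    """Fit y = b0 + b1*x by least squares and return global fit statistics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    rss = sum(r * r for r in residuals)          # residual sum of squares
    tss = sum((yi - mean_y) ** 2 for yi in y)    # total sum of squares
    residual_df = n - 2                          # two estimated parameters

    return {
        "residual_sum_of_squares": rss,
        "standard_error": sqrt(rss / residual_df),
        "r_squared": 1 - rss / tss,
        "adjusted_r_squared": 1 - (rss / residual_df) / (tss / (n - 1)),
    }


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
summary = simple_regression_summary(x, y)
for name, value in summary.items():
    print(name, round(value, 6))
```

The adjusted R-squared penalizes the number of estimated parameters, so it is never larger than the plain R-squared.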

Regression Parameters

The Parameters property returns a ParameterCollection object that contains the parameters of a regression model. Regression parameters are implemented by the Parameter class. Regression parameters can't be constructed directly. They are created by the model.

Every parameter is associated with a variable in the model. This variable is returned by the Variable property. The Name property gives a descriptive name, which usually equals the name of the corresponding variable.

The Value property returns the numerical value of the parameter, while the StandardError property returns the standard deviation of the parameter's distribution.

The Statistic property returns the value of the t-statistic corresponding to the hypothesis that the parameter equals zero. The PValue property returns the corresponding p-value. The p-value always corresponds to a two-tailed test.
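The relationship between a parameter's value, its standard error and its t-statistic can be sketched for the slope of a simple regression. This is plain Python using textbook formulas, not the library's code; the function name is hypothetical. The p-value is left out because it requires the Student t distribution, which the Python standard library does not provide.

```python
from math import sqrt


def slope_t_statistic(x, y):
    """Return (slope, standard error of the slope, t-statistic for slope = 0)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    residual_variance = rss / (n - 2)            # estimate of the error variance
    se_slope = sqrt(residual_variance / sxx)     # standard error of the slope
    return slope, se_slope, slope / se_slope


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, se_slope, t_stat = slope_t_statistic(x, y)
print(slope, se_slope, t_stat)
```

A t-statistic that is large in absolute value indicates that the estimate lies many standard errors away from zero, which corresponds to a small two-tailed p-value.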

The Parameter class has one method: GetConfidenceInterval. This method takes one argument: a confidence level between 0 and 1. A value of 0.95 corresponds to a confidence level of 95%. The method returns the confidence interval for the parameter at the specified confidence level as an Interval structure.

The ANOVA Table

The AnovaTable property returns the AnovaTable object that summarizes the results of the model. The ANOVA table breaks down the variance in the data into two components: the variance explained by the model, and the variance in the residuals. The larger the share of the variance explained by the model, the greater the significance of the model.

The number of rows depends on the specific design of the model. Three rows are always available: one for the model, one for the residuals (error), and one for the total.

The TotalRow and ErrorRow properties return an object of type AnovaRow. An AnovaRow has a Table property that returns the AnovaTable object of which it is a part. The RowType property returns an AnovaRowType value that specifies whether the data in the row refers to the complete data (AnovaRowType.Total), the residuals (AnovaRowType.Error) or the model or a component of the model (AnovaRowType.Model). The Caption property returns a descriptive name of the row. This name reflects the terminology used in each kind of analysis.

The AnovaRow class defines a number of properties that describe the contribution of the component to the variation in the data. The DegreesOfFreedom property returns the degrees of freedom in the component. SumOfSquares gets the sum of squares of the specified component, while MeanSquare returns the sum of squares divided by the degrees of freedom.

Rows that correspond to the model or one of its components are of type AnovaModelRow. This type derives from AnovaRow, but has two additional properties. The FStatistic property returns the value of the F statistic that compares the variance of the model component to the variance of the residual. The PValue property returns the corresponding p-value.
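To make the structure of the table concrete, the sketch below computes the model, error and total rows of a one-factor analysis of variance in plain Python. The dictionary keys mirror the concepts described above (degrees of freedom, sum of squares, mean square, F statistic) but are not the library's property names, and the function one_way_anova is hypothetical. The p-value is omitted because it requires the F distribution.

```python
def one_way_anova(groups):
    """Return the model, error and total rows of a one-way ANOVA table."""
    all_values = [v for g in groups for v in g]
    n = len(all_values)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Variation explained by the model: group means around the grand mean.
    model_ss = sum(len(g) * (m - grand_mean) ** 2
                   for g, m in zip(groups, group_means))
    # Variation in the residuals: observations around their group mean.
    error_ss = sum((v - m) ** 2
                   for g, m in zip(groups, group_means) for v in g)
    total_ss = sum((v - grand_mean) ** 2 for v in all_values)

    model_df = len(groups) - 1
    error_df = n - len(groups)
    total_df = n - 1
    model_ms = model_ss / model_df
    error_ms = error_ss / error_df

    return {
        "model": {"df": model_df, "ss": model_ss, "ms": model_ms,
                  "f": model_ms / error_ms},
        "error": {"df": error_df, "ss": error_ss, "ms": error_ms},
        "total": {"df": total_df, "ss": total_ss, "ms": total_ss / total_df},
    }


groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]
table = one_way_anova(groups)
for name, row in table.items():
    print(name, row)
```

The model and error sums of squares add up to the total sum of squares, and the same holds for the degrees of freedom.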

The AnovaTable class has a ToDataTable method, which returns the information in the ANOVA table in the form of a DataTable. This DataTable can serve directly as the data source for a control such as a DataGrid.

General Linear Model Implementations

The GeneralLinearModel class is an abstract base class and cannot be instantiated directly. The following table lists the classes that inherit from GeneralLinearModel:

Class name                 Description
LinearRegressionModel      Simple and multiple linear regression.
SimpleRegressionModel      Simple regression in one variable.
PolynomialRegressionModel  Polynomial regression in one variable.
NonlinearRegressionModel   Nonlinear regression in one variable.
LogisticRegressionModel    Logistic regression in one or more variables.
AnovaModel                 Abstract base class for Analysis of Variance models.
OneWayAnovaModel           One factor Analysis of Variance.
TwoWayAnovaModel           Two factor Analysis of Variance.
OneWayRAnovaModel          One factor Analysis of Variance with repeated measures.

The next two chapters discuss these models in greater detail.
