Extreme Optimization™: Complexity made simple.

Numerical Components
for .NET

General Linear Models

Statistical models come in two main flavors: univariate and multivariate. Univariate models include ANOVA and various forms of regression, such as linear, nonlinear, and logistic regression. Multivariate models include MANOVA, principal component analysis, and factor analysis.

This chapter offers a very brief overview of General Linear Models. The next two chapters will cover two specific types of models: linear regression models and analysis of variance (ANOVA) models.

Univariate Models

A univariate model has a single dependent variable and may have one or more independent variables. Most univariate models can be categorized as General Linear Models (GLM). A univariate General Linear Model is defined by

Y = β₀ + β₁X₁ + β₂X₂ + ... + βₙXₙ + e

where Y is a vector that represents the dependent variable, the Xᵢ are vectors representing the independent variables, e is a vector of residuals, and the βᵢ are the regression parameters. The residuals are assumed to follow a normal distribution. To define a general linear model, you therefore need to define one dependent variable and one or more independent variables.
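As an illustration of the definition above, the following sketch fits the simplest case, a model with one independent variable, by ordinary least squares. It is plain Python and does not use the library; the function name fit_simple_regression and the data are hypothetical.

```python
from statistics import mean

def fit_simple_regression(x, y):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1*x + e."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

b0, b1 = fit_simple_regression(x, y)
# The residual vector e is what remains after subtracting the fitted values.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
```

When the model includes an intercept, the least-squares residuals sum to zero up to rounding error.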

In a linear regression model, all variables are numerical variables. They may be transformations of the same variable. For example, in polynomial regression, the independent variables are powers of a single variable.
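To make the polynomial case concrete, this small sketch builds the independent variables of a cubic model as powers of a single variable. The helper name polynomial_terms is hypothetical and is not part of the library.

```python
def polynomial_terms(x, degree):
    """Return the columns x, x^2, ..., x^degree as separate variables."""
    return [[xi ** k for xi in x] for k in range(1, degree + 1)]

# Hypothetical observations of a single variable.
x = [1.0, 2.0, 3.0]

# columns[0] holds x, columns[1] holds x squared, columns[2] holds x cubed.
columns = polynomial_terms(x, 3)
```

The model stays linear in the parameters βᵢ even though the columns are nonlinear functions of x.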

In an ANOVA model, each categorical variable is replaced by a set of indicator variables: one for every possible value of the variable except one. This transformation is made transparently when you define an ANOVA model.
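The indicator coding can be sketched in a few lines of plain Python. This is only an illustration of the idea: the choice of the first level in sorted order as the omitted (reference) level is an assumption, and the library may choose differently. The function name indicator_variables is hypothetical.

```python
def indicator_variables(values):
    """Encode a categorical variable as 0/1 columns, omitting one level.

    Assumption: the first level in sorted order is the omitted reference level.
    """
    levels = sorted(set(values))
    reference, others = levels[0], levels[1:]
    columns = {
        level: [1 if v == level else 0 for v in values]
        for level in others
    }
    return reference, columns

# Hypothetical factor with three levels.
treatment = ["A", "B", "C", "A", "C"]
reference, indicators = indicator_variables(treatment)
# A factor with three levels yields two indicator variables.
```

One level is omitted because its indicator would be a linear combination of the intercept and the other indicators.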

Mixed models, which use both numerical and categorical variables as independent variables, are also possible. These are sometimes referred to as ANCOVA models. ANCOVA stands for ANalysis of COVAriance.

In logistic and generalized linear models, the relationship between the dependent variable and independent variables is defined by a link function. In nonlinear regression, the independent variables may interact in nonlinear ways.
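For example, logistic regression is commonly formulated with the logit link, which maps a probability to the scale of the linear predictor. The sketch below shows the link and its inverse in plain Python; the parameter values are hypothetical and the code does not use the library.

```python
import math

def logit(p):
    """Logit link: maps a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))

def inverse_logit(eta):
    """Inverse link: maps a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical parameters of a model with one independent variable.
b0, b1 = -1.5, 0.8
x = 2.0

eta = b0 + b1 * x        # linear predictor
p = inverse_logit(eta)   # predicted probability for this observation
```

The right-hand side of the model remains linear in the parameters; only the relationship between that linear predictor and the dependent variable changes.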

The UnivariateModel class

All classes that implement regression and ANOVA models inherit from a common base class: UnivariateModel. This class defines properties and methods common to all linear model classes.

The DependentVariable property represents the dependent variable of the model. This variable must be of type NumericalVariable. The IndependentVariables property represents the collection of independent variables in the model. The variables in this collection can be numerical or categorical, although the specific type of model may restrict the kinds of variables that are allowed.

Results of the regression

The Compute() method calculates the regression parameters as well as a number of global properties of the model.

The DegreesOfFreedom property returns the total degrees of freedom in the regression model.

The ResidualSumOfSquares, StandardError, RSquared and AdjustedRSquared properties return the values indicated by their names.
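For reference, the sketch below computes these four quantities from their usual textbook definitions in plain Python. It assumes the model includes an intercept and that the standard error is the square root of the residual mean square; the library's exact conventions are not stated here and may differ. The function name, the dictionary keys, and the data are hypothetical.

```python
import math

def regression_summary(y, fitted, n_params):
    """Summary statistics of a fitted model.

    n_params counts every estimated parameter, including the intercept.
    """
    n = len(y)
    y_bar = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    tss = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
    residual_df = n - n_params
    r_squared = 1.0 - rss / tss
    adjusted = 1.0 - (1.0 - r_squared) * (n - 1) / residual_df
    standard_error = math.sqrt(rss / residual_df)
    return {
        "rss": rss,
        "standard_error": standard_error,
        "r_squared": r_squared,
        "adjusted_r_squared": adjusted,
    }

# Hypothetical observations and fitted values from a straight-line fit.
y = [3.1, 4.9, 7.2, 8.8, 11.0]
fitted = [3.06, 5.03, 7.00, 8.97, 10.94]
summary = regression_summary(y, fitted, n_params=2)
```

The adjusted value penalizes additional parameters, so it is never larger than the unadjusted R squared.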

Regression Parameters

The Parameters property returns a ParameterCollection object that contains the parameters of a regression model. Regression parameters are implemented by the Parameter class. Regression parameters can't be constructed directly. They are created by the model.

Every parameter is associated with a variable in the model. This variable is returned by the Variable property. The Name property gives a descriptive name, which usually equals the name of the corresponding variable.

The Value property returns the numerical value of the parameter, while the StandardError property returns the standard error of the parameter estimate, that is, the standard deviation of its sampling distribution.

The Statistic property returns the value of the t-statistic corresponding to the hypothesis that the parameter equals zero. The PValue property returns the corresponding p-value. The p-value always corresponds to a two-tailed test.

The Parameter class has one method: GetConfidenceInterval(Double). This method takes one argument: a confidence level between 0 and 1. A value of 0.95 corresponds to a confidence level of 95%. The method returns the confidence interval for the parameter at the specified confidence level as an Interval structure.
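The quantities described above are related by the standard formulas for a linear model with normally distributed residuals. They are given here as the usual textbook definitions, on the assumption that the library follows them. The symbols are not part of the library: ν denotes the residual degrees of freedom, T_ν a random variable with a t distribution with ν degrees of freedom, and 1 − α the confidence level.

```latex
t_i = \frac{\hat{\beta}_i}{\mathrm{SE}(\hat{\beta}_i)},
\qquad
p_i = 2\,P\bigl(T_{\nu} \ge |t_i|\bigr),
\qquad
\hat{\beta}_i \;\pm\; t_{\nu,\,1-\alpha/2}\,\mathrm{SE}(\hat{\beta}_i)
```

The first expression is the t-statistic, the second is the two-tailed p-value, and the third is the confidence interval. A confidence level of 0.95 corresponds to α = 0.05, so the interval uses the 0.975 quantile of the t distribution.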

The ANOVA Table

The AnovaTable property returns the AnovaTable object that summarizes the results of the model. The ANOVA table breaks down the variance in the data into two components: the variance explained by the model, and the variance in the residuals. The larger the share of the variance that is explained by the model, the greater the significance of the model.

The number of rows depends on the specific design of the model. Three rows are always available:

  • The TotalRow property returns the information for the complete data.
  • The ErrorRow property returns the information for the residuals or error.
  • The CompleteModelRow property returns the information for the complete model.

The TotalRow and ErrorRow properties each return an object of type AnovaRow. An AnovaRow has a Table property that returns the AnovaTable object of which it is a part. The RowType property returns an AnovaRowType value that specifies whether the data in the row refers to the complete data (AnovaRowType.Total), the residuals (AnovaRowType.Error), or the model or a component of the model (AnovaRowType.Model). The Caption property returns a descriptive name of the row. This name reflects the terminology used in each kind of analysis.

The AnovaRow class defines a number of properties that describe the contribution of the component to the variation in the data. The DegreesOfFreedom property returns the degrees of freedom in the component. SumOfSquares gets the sum of squares of the specified component, while MeanSquare returns the sum of squares divided by the degrees of freedom.

Rows that correspond to the model or one of its components are of type AnovaModelRow. This type inherits from AnovaRow, but has two additional properties. The FStatistic property returns the value of the F statistic that compares the variance of the model component to the variance of the residual. The PValue property returns the corresponding p-value.
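The arithmetic behind the rows can be illustrated with a one-factor design. The sketch below is plain Python and does not use the library; the function name, the dictionary keys, and the data are hypothetical. It computes the degrees of freedom, sum of squares, and mean square for the model, error, and total rows, plus the F statistic for the model row. The p-value is omitted because it requires the F distribution, which the Python standard library does not provide.

```python
def one_way_anova_rows(groups):
    """Model, error, and total rows of a one-factor ANOVA table."""
    all_values = [v for g in groups for v in g]
    n = len(all_values)
    grand_mean = sum(all_values) / n

    # Total variation, and the part explained by differences between group means.
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_model = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_error = ss_total - ss_model

    df_model = len(groups) - 1
    df_error = n - len(groups)
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error

    return {
        "model": {"df": df_model, "ss": ss_model, "ms": ms_model,
                  "f": ms_model / ms_error},
        "error": {"df": df_error, "ss": ss_error, "ms": ms_error},
        "total": {"df": n - 1, "ss": ss_total, "ms": ss_total / (n - 1)},
    }

# Hypothetical data: three groups with three observations each.
groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]
rows = one_way_anova_rows(groups)
```

The model and error rows add up to the total row in both the sum of squares and the degrees of freedom.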

The AnovaTable class has a ToDataTable() method, which returns the information in the ANOVA table in the form of a DataTable. This DataTable can serve directly as the data source for a data-bound control such as a DataGrid.

Univariate Model Implementations

The UnivariateModel class is an abstract base class and cannot be instantiated directly. The following table lists the classes that inherit from UnivariateModel:

Class name                   Description
LinearRegressionModel        Simple and multiple linear regression.
SimpleRegressionModel        Simple regression in one variable.
PolynomialRegressionModel    Polynomial regression in one variable.
NonlinearRegressionModel     Nonlinear regression in one variable.
LogisticRegressionModel      Logistic regression in one or more variables.
AnovaModel                   Abstract base class for Analysis of Variance models.
OneWayAnovaModel             One factor Analysis of Variance.
TwoWayAnovaModel             Two factor Analysis of Variance.
OneWayRAnovaModel            One factor Analysis of Variance with repeated measures.

The next two chapters discuss these models in greater detail.

Multivariate Models

A multivariate model has more than one dependent variable and may have zero or more independent variables. The meaning of the variables is not always immediately apparent. For example, in Principal Component Analysis (PCA), all variables in the model are dependent variables, and the independent variables are implicitly defined as the principal components.

The classes that implement multivariate models all inherit from a common base class: MultivariateModel. The main contribution of this base class is in the management of the variables. The following table lists the classes that inherit from MultivariateModel:

Class name                     Description
HierarchicalClusterAnalysis    Hierarchical cluster analysis.
KMeansClusterAnalysis          K-means cluster analysis.
PrincipalComponentAnalysis     Principal component analysis (PCA).

Later chapters discuss these models in greater detail.


Copyright (c) 2004-2011 ExoAnalytics Inc.

Copyright © 2003-2013, Extreme Optimization. All rights reserved.
Extreme Optimization, Complexity made simple, M#, and M Sharp are trademarks of ExoAnalytics Inc.
Microsoft, Visual C#, Visual Basic, Visual Studio, Visual Studio.NET, and the Optimized for Visual Studio logo
are registered trademarks of Microsoft Corporation.