Extreme Optimization User's Guide
ANOVA Models
Analysis of variance (ANOVA) is a family of techniques for identifying and measuring the sources of variation in
data. Specifically, ANOVA procedures partition the total variation in a data set into its component parts.
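For the simplest design, a one-way ANOVA with k groups, this partition takes the familiar form below. The notation is standard textbook notation, not something defined by the library: y_{ij} is the j-th observation in group i, n_i is the number of observations in group i, \bar{y}_i is the mean of group i, \bar{y} is the grand mean, and N is the total number of observations.

```latex
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}\right)^2}_{SS_{\text{total}}}
= \underbrace{\sum_{i=1}^{k} n_i\left(\bar{y}_i-\bar{y}\right)^2}_{SS_{\text{model}}}
+ \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}_i\right)^2}_{SS_{\text{error}}},
\qquad N-1 = (k-1) + (N-k)
```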
ANOVA models come in many shapes and sizes, called designs. The Extreme Optimization Numerical Libraries for .NET
support the three most common designs: one-way, one-way with repeated measures, and two-way analysis of variance.
However, the infrastructure is in place to handle designs of any size and complexity.
Defining ANOVA models
All classes that implement ANOVA models inherit from a common base class, AnovaModel, which in turn inherits from GeneralLinearModel, the base class of all statistical model
classes.
In regression models, the dependent variable is a linear function of the independent variables. In an ANOVA
design, the independent variables are categorical, so the contribution of each individual combination of values of the
independent variables must be estimated separately. Because these contributions are not independent of one another,
the actual number of parameters is smaller than the number of combinations. Depending on the design, some combinations
may be excluded from the model, which further decreases the number of parameters.
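As a rough illustration of this parameter counting, the Python sketch below counts terms for a two-way design with a levels of the first factor and b levels of the second. It assumes the common textbook sum-to-zero parameterization, which may not be how the library parameterizes its models internally. The function name two_way_parameter_counts is hypothetical.

```python
def two_way_parameter_counts(a, b, interaction=True):
    """Return (raw_terms, free_parameters) for an a-by-b two-way design.

    Assumes the textbook sum-to-zero effects parameterization.
    """
    # Raw terms: overall mean, one effect per level of each factor and,
    # if interactions are modeled, one interaction effect per cell.
    raw_terms = 1 + a + b + (a * b if interaction else 0)
    # Sum-to-zero constraints leave a - 1 and b - 1 free main effects,
    # and (a - 1) * (b - 1) free interaction effects.
    free_parameters = 1 + (a - 1) + (b - 1)
    if interaction:
        free_parameters += (a - 1) * (b - 1)
    return raw_terms, free_parameters


print(two_way_parameter_counts(3, 4))                     # (20, 12)
print(two_way_parameter_counts(3, 4, interaction=False))  # (8, 6)
```

For a 3-by-4 design, excluding the interaction terms reduces the number of free parameters from 12 to 6.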
The set of all possible values of a categorical variable is called a factor, and the individual values are called
the levels of the factor. The purpose of ANOVA is to investigate how much each level of each factor, and each
combination of levels, contributes to the total variation in the data.
Although the model is initially defined in terms of the dependent and independent variables, the actual
calculations are performed on the factors associated with the independent variables rather than on the variables themselves.
The GetFactor method of the
AnovaModel class returns the Factor object at the specified index. An overload allows you to
retrieve the factor associated with an independent variable through the variable's name.
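Replacing a variable by its factor can be sketched in Python as a mapping from each observation to the index of its level. This is not the library's Factor class or GetFactor method: make_factor, the sorted ordering of levels, and the data are assumptions made for illustration. The dictionary lookup by variable name only loosely mirrors the GetFactor overload that takes a name.

```python
def make_factor(values):
    """Turn a categorical variable into (levels, codes).

    levels: the distinct values, sorted (an assumption for this sketch)
    codes:  for each observation, the index of its value in `levels`
    """
    levels = sorted(set(values))
    position = {level: i for i, level in enumerate(levels)}
    codes = [position[v] for v in values]
    return levels, codes


# Hypothetical data: one categorical independent variable.
data = {"fertilizer": ["B", "A", "C", "A", "B", "C"]}

# Look up a factor through the name of its variable.
factors = {name: make_factor(values) for name, values in data.items()}
levels, codes = factors["fertilizer"]
print(levels)  # ['A', 'B', 'C']
print(codes)   # [1, 0, 2, 0, 1, 2]
```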
Cells and Cell Arrays
The first step in performing an analysis of variance is to divide the data set into groups of rows with the same
values for the factors. The data that is associated with a particular combination of factor levels is called a
cell.
Cells are implemented by the Cell class. This class has a number of
properties that return summary statistics for the data in the cell. The most important are Count, which returns the number of observations in the cell; Mean, which returns the cell mean; and Variance, which returns the variance of the data in that cell only.
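Grouping rows into cells and computing per-cell statistics can be sketched in Python as follows. The helper names build_cells and cell_summary and the data are hypothetical. The sample variance (denominator n - 1) is an assumption; the library's Variance property may use a different convention.

```python
from collections import defaultdict
from statistics import mean, variance


def build_cells(factor_a, factor_b, y):
    """Group observations into cells keyed by (level of A, level of B)."""
    cells = defaultdict(list)
    for a, b, value in zip(factor_a, factor_b, y):
        cells[(a, b)].append(value)
    return dict(cells)


def cell_summary(values):
    """Count, mean and sample variance of the observations in one cell.

    statistics.variance needs at least two observations per cell.
    """
    return {"count": len(values), "mean": mean(values), "variance": variance(values)}


# Hypothetical balanced two-way data set with two observations per cell.
dose = ["low", "low", "low", "low", "high", "high", "high", "high"]
diet = ["x", "x", "y", "y", "x", "x", "y", "y"]
y = [4.0, 6.0, 5.0, 7.0, 8.0, 10.0, 9.0, 13.0]

cells = build_cells(dose, diet, y)
print(cell_summary(cells[("high", "y")]))  # {'count': 2, 'mean': 11.0, 'variance': 8.0}
```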
Cell objects can't be created directly. Instead, they are accessed through the model's cell array. Each ANOVA
model has an associated CellArray, a multi-dimensional array of
Cell objects. This array is accessible through the Cells property of the AnovaModel object. The cell array has as many dimensions as there are factors
in the model.
To access a specific cell, use the factor levels as indices. Using the special index Cell.All in place of a level indicates that the cell contains the totals over all
levels of that factor. Setting all indices to Cell.All gives the cell that holds summary data
for the entire data set.
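A cell array with a Cell.All-style index can be mimicked in Python with a dictionary keyed by index tuples, where a sentinel value stands for "all levels of this factor". The sentinel ALL, the function build_cell_array, and the data are hypothetical; they only illustrate the indexing scheme. Each observation is added to every cell whose indices either match its own levels or are ALL.

```python
from itertools import product

# Hypothetical sentinel playing the role of Cell.All.
ALL = object()


def build_cell_array(factors, y):
    """Map index tuples to observations, including marginal cells.

    factors: one list of levels per factor, each with one entry per observation.
    An index position holds either a level of that factor or ALL.
    """
    cell_array = {}
    for levels, value in zip(zip(*factors), y):
        # The observation belongs to every cell whose indices are either
        # its own level or ALL, e.g. ("low", "x"), ("low", ALL), (ALL, "x"), (ALL, ALL).
        for index in product(*[(level, ALL) for level in levels]):
            cell_array.setdefault(index, []).append(value)
    return cell_array


dose = ["low", "low", "low", "low", "high", "high", "high", "high"]
diet = ["x", "x", "y", "y", "x", "x", "y", "y"]
y = [4.0, 6.0, 5.0, 7.0, 8.0, 10.0, 9.0, 13.0]

cell_array = build_cell_array([dose, diet], y)
print(cell_array[("low", "y")])     # [5.0, 7.0]
print(cell_array[("low", ALL)])     # [4.0, 6.0, 5.0, 7.0]
print(len(cell_array[(ALL, ALL)]))  # 8
```

In this sketch the 2-by-2 design yields 9 entries: 4 ordinary cells, 2 + 2 marginal cells, and 1 cell for the whole data set.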
A CellArray has a number of useful properties. The IsBalanced property indicates whether all the cells in the model
have the same number of observations. Most ANOVA models in the Extreme Optimization Numerical Libraries for
.NET require that the data be balanced in this way. The ObservationsPerCell property returns the number of
observations in each cell for a balanced design. If the design is unbalanced, the value -1 is returned. Finally, the
Length property returns the total number of cells in the
array.
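The logic behind IsBalanced and ObservationsPerCell can be sketched in Python by counting observations per cell. The function names and data are hypothetical. As in the description above, the sketch returns -1 when the cells do not all have the same number of observations. Empty cells are counted as having zero observations.

```python
from collections import Counter
from itertools import product


def observations_per_cell(factors, levels):
    """Return the common number of observations per cell, or -1 if unbalanced.

    factors: one list of levels per factor, each with one entry per observation.
    levels:  the possible levels of each factor, so that empty cells are counted too.
    """
    counts = Counter(zip(*factors))
    sizes = {counts[cell] for cell in product(*levels)}
    return sizes.pop() if len(sizes) == 1 else -1


def is_balanced(factors, levels):
    """True when every cell has the same number of observations."""
    return observations_per_cell(factors, levels) != -1


dose = ["low", "low", "low", "low", "high", "high", "high", "high"]
diet = ["x", "x", "y", "y", "x", "x", "y", "y"]
levels = [["low", "high"], ["x", "y"]]

print(observations_per_cell([dose, diet], levels))            # 2
print(observations_per_cell([dose[:-1], diet[:-1]], levels))  # -1
```

Dropping the last observation leaves the ("high", "y") cell with a single observation, so the design becomes unbalanced.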
Results of the Analysis
The results of an analysis of variance are presented in the same format as those of other statistical models.
The AnovaTable property returns the
AnovaTable object that summarizes the results. The number of
rows in the table depends on the design. The TotalRow property always returns the AnovaRow for the complete data. The ErrorRow property returns the row for the residuals. The
CompleteModelRow property returns the row for
all the factors and interactions in the model combined. Rows corresponding to the individual factors and
interactions in the model can be retrieved through the GetModelRow method.
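For a one-way design, the main rows of the table can be computed directly, as in the Python sketch below. The function one_way_anova_table, the data, and the row names "model", "error" and "total" are hypothetical. The rows only loosely correspond to CompleteModelRow, ErrorRow and TotalRow. An ANOVA table usually also reports a p-value for the F statistic, which this sketch omits.

```python
from statistics import fmean


def one_way_anova_table(groups):
    """Compute the rows of a one-way ANOVA table.

    groups: one list of observations per level of the factor.
    Returns ({row name: (sum of squares, degrees of freedom, mean square)}, F).
    """
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = fmean(values)
    group_means = [fmean(g) for g in groups]

    # Between-group (model), within-group (error) and total sums of squares.
    ss_model = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_error = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    ss_total = sum((v - grand_mean) ** 2 for v in values)

    rows = {
        "model": (ss_model, k - 1, ss_model / (k - 1)),
        "error": (ss_error, n - k, ss_error / (n - k)),
        "total": (ss_total, n - 1, ss_total / (n - 1)),
    }
    f_statistic = rows["model"][2] / rows["error"][2]
    return rows, f_statistic


groups = [[4.0, 6.0, 5.0], [8.0, 10.0, 9.0], [5.0, 7.0, 9.0]]
rows, f = one_way_anova_table(groups)
for name, (ss, df, ms) in rows.items():
    print(f"{name:>5}: SS={ss:.1f} df={df} MS={ms:.2f}")
print(f"F = {f:.2f}")
```

For this sample data, the model and error sums of squares (24 and 12) add up to the total sum of squares (36), and F = 12 / 2 = 6.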
Copyright 2004-2008,
Extreme Optimization. All rights reserved.
Extreme Optimization, Complexity made simple, M#, and M
Sharp are trademarks of ExoAnalytics Inc.
Microsoft, Visual C#, Visual Basic, Visual Studio, Visual
Studio.NET, and the Visual Studio Logo are registered trademarks of Microsoft Corporation