Discriminant Analysis | Extreme Optimization Numerical Libraries for .NET Professional |

Linear Discriminant Analysis (LDA) is a technique for multi-class classification and dimensionality reduction. It is based on the assumption that the observations in each class or group follow a multivariate Gaussian distribution, and that all groups share the same covariance matrix. (A variation in which groups may have different covariance matrices is called quadratic discriminant analysis (QDA).)

When used for dimensionality reduction, the features are projected onto the directions that most separate the classes. When used for classification, it is sufficient to consider the distance to the group centroids in the projected space.

Discriminant analysis models are implemented by the LinearDiscriminantAnalysis class.

The LinearDiscriminantAnalysis class has four constructors.

The first constructor takes two arguments. The first is an ICategoricalVector that represents the dependent variable. The second is a parameter array of Vector objects that represent the independent variables:

var dependent = Vector.CreateCategorical(yData);
var independent1 = Vector.Create(x1Data);
var independent2 = Vector.Create(x2Data);
var model1 = new LinearDiscriminantAnalysis(dependent, independent1, independent2);

The second constructor takes three arguments. The first argument is an IDataFrame (for example, a DataFrame) that contains the variables. The second argument is the name of the dependent variable. The third is a parameter array of the names of the independent variables:

var dataFrame = DataFrame.FromColumns(new Dictionary<string, object>() {
    { "y", dependent }, { "x1", independent1 }, { "x2", independent2 } });
var model2 = new LinearDiscriminantAnalysis(dataFrame, "y", "x1", "x2");

The third constructor takes two arguments. The first argument is once again an IDataFrame (for example, a DataFrame) that contains the variables.

The Compute method performs the actual analysis. Most properties and methods throw an exception when they are accessed before the Compute method is called. You can verify that the model has been calculated by inspecting the Computed property.
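As a minimal sketch (reusing the model1 instance constructed in the earlier example):

```csharp
// Fit the model. Most properties and methods are unavailable
// until Compute has been called.
model1.Compute();
// Computed is true once the analysis has been performed.
bool isComputed = model1.Computed;
```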

The Predictions property returns a CategoricalVector that contains the class predicted by the model for each observation.
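For example, a sketch that retrieves the in-sample predictions (assuming model1 has been computed):

```csharp
// The predicted class for each observation in the training data.
var predicted = model1.Predictions;
Console.WriteLine(predicted);
```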

The result of a discriminant analysis is a set of discriminant functions. These are linear functions of the variables in the directions that most separate the observations in each group from those in other groups. The LinearDiscriminantAnalysis class's DiscriminantFunctions property returns a collection of LinearDiscriminantFunction objects that represent the discriminant functions. The discriminant functions are constructed by the model; you cannot create them directly.

Discriminant functions perform a role similar to factors in Factor Analysis or components in Principal Component Analysis (PCA). Discriminant functions are based on a generalized eigenvalue decomposition. The corresponding eigenvalue and eigenvector can be accessed through the Eigenvalue and Eigenvector properties. The EigenvalueDifference property returns the difference between the eigenvalues of the discriminant function and the next most significant discriminant function. The ProportionOfVariance and CumulativeProportionOfVariance properties give the contribution of the discriminant function to the variation in the data in relative terms.

The CanonicalCorrelation property returns the canonical correlation between the discriminant function and the groups. A higher coefficient indicates a higher relevance. Similarly, WilksLambda returns Wilks' lambda, which is a statistic that is based on the canonical correlations.

The significance of a discriminant function can be quantified further using a hypothesis test based on Wilks' lambda. This is an F test that is exact when the number of groups is less than 3 or when the number of included functions is 1 or 2. The GetFTest method returns the object that represents this test; its Statistic, NumeratorDegreesOfFreedom, and DenominatorDegreesOfFreedom properties give the test statistic and its degrees of freedom.

In the example below, these properties are printed for the discriminant functions from the earlier example:

Console.WriteLine(" # Eigenvalue Difference Contribution Contrib. % Can.Corr Wilks' L  F stat. df1 df2");
for (int i = 0; i < model1.DiscriminantFunctions.Count; i++)
{
    var fn = model1.DiscriminantFunctions[i];
    var f = fn.GetFTest();
    // Note: each value gets its own placeholder index.
    Console.WriteLine("{0,2}{1,12:F4}{2,11:F4}{3,13:F3}%{4,10:F3}%{5,9:F4}{6,9:F3}{7,9:F3}{8,4}{9,4}",
        i, fn.Eigenvalue, fn.EigenvalueDifference,
        100 * fn.ProportionOfVariance, 100 * fn.CumulativeProportionOfVariance,
        fn.CanonicalCorrelation, fn.WilksLambda,
        f.Statistic, f.NumeratorDegreesOfFreedom, f.DenominatorDegreesOfFreedom);
}

When using a linear discriminant analysis for classification, the probability that an observation belongs to each class is computed. The class with the highest probability is selected.

The Predict method takes a vector or data frame and produces the model's prediction for the supplied data. When a single observation is supplied (as a vector), the method returns an integer that is the level index of the predicted class. When multiple observations are supplied, the method returns a vector of level indexes.

Similarly, the PredictProbabilities method returns the predicted probabilities for each class. When a single observation is supplied (as a vector), the method returns a vector that contains the probabilities that the observation belongs to each of the classes. When multiple observations are supplied, the method returns a matrix, where each row contains the probabilities for the corresponding observation.

var index = model1.Predict(Vector.Create(1.2, 3.0));
var input = Matrix.CreateRandom(10, 2);
var predictions = model1.Predict(input);
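The probability predictions can be sketched similarly (reusing model1 and the input matrix from above):

```csharp
// One probability per class for a single observation:
var probabilities = model1.PredictProbabilities(Vector.Create(1.2, 3.0));
// One row of class probabilities per observation:
var allProbabilities = model1.PredictProbabilities(input);
```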

When used for dimensionality reduction, the observations are projected onto the directions that most separate the classes. The LinearDiscriminantAnalysis class implements the ITransformationModel interface to support this operation. The Transform method takes a matrix of observations and returns their projections onto the discriminant directions.
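For example, a sketch of the projection (reusing model1 and the input matrix from earlier, and assuming Transform returns the projected observations as a matrix):

```csharp
// Project the observations onto the discriminant directions.
var projected = model1.Transform(input);
```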

Copyright © 2004-2016, Extreme Optimization. All rights reserved.

*Extreme Optimization*, *Complexity made simple*, *M#*, and *M Sharp* are trademarks of ExoAnalytics Inc.

*Microsoft*, *Visual C#*, *Visual Basic*, *Visual Studio*, *Visual Studio.NET*, and the *Optimized for Visual Studio* logo are registered trademarks of Microsoft Corporation.