Linear Discriminant Analysis (LDA) is a technique for multi-class classification
and dimensionality reduction.
It is based on the assumption that the observations in each class or group
are distributed as a multivariate Gaussian distribution, and that all groups
have the same covariance matrix. (A variation that allows groups to have different
covariance matrices is called quadratic discriminant analysis (QDA).)
When used for dimensionality reduction, the features are projected onto
the directions that most separate the classes.
When used for classification, it is sufficient to consider the distance
to the group centroids in the projected space.
Discriminant analysis models are implemented by the
LinearDiscriminantAnalysis
class.
Constructing Linear Discriminant Analysis Models
The LinearDiscriminantAnalysis
class has four constructors.
The first constructor takes two arguments. The first is an
ICategoricalVector
that represents the dependent variable. The second is a parameter array of
Vector&lt;T&gt;
objects that represent the independent variables.
var dependent = Vector.CreateCategorical(yData);
var independent1 = Vector.Create(x1Data);
var independent2 = Vector.Create(x2Data);
var model1 = new LinearDiscriminantAnalysis(dependent, independent1, independent2);
Dim dependent = Vector.CreateCategorical(yData)
Dim independent1 = Vector.Create(x1Data)
Dim independent2 = Vector.Create(x2Data)
Dim model1 = New LinearDiscriminantAnalysis(dependent, independent1, independent2)
let dependent = Vector.CreateCategorical(yData)
let independent1 = Vector.Create(x1Data)
let independent2 = Vector.Create(x2Data)
let model1 = new LinearDiscriminantAnalysis(dependent, independent1, independent2)
The second constructor takes three arguments. The first argument is an
IDataFrame
(a DataFrame&lt;R, C&gt; or
Matrix&lt;T&gt;) that
contains the variables to be used in the analysis. The second argument
is a string containing the name of the dependent variable.
The third argument is a parameter array of strings containing the names
of the independent variables. All the names must exist in the column index
of the data frame specified by the first parameter.
var dataFrame = DataFrame.FromColumns(
( "y", dependent ),
( "x1", independent1 ),
( "x2", independent2 ));
var model2 = new LinearDiscriminantAnalysis(dataFrame, "y", "x1", "x2");
Dim frame = DataFrame.FromColumns(New Dictionary(Of String, Object)() From
{{"y", dependent}, {"x1", independent1}, {"x2", independent2}})
Dim model2 = New LinearDiscriminantAnalysis(frame, "y", "x1", "x2")
let columns = Dictionary<string,obj>()
columns.Add("y", dependent)
columns.Add("x1", independent1)
columns.Add("x2", independent2)
let dataFrame = DataFrame.FromColumns<string>(columns)
let model2 = new LinearDiscriminantAnalysis(dataFrame, "y", "x1", "x2")
The third constructor takes two arguments. The first argument is once again an
IDataFrame
(a DataFrame&lt;R, C&gt; or
Matrix&lt;T&gt;) that
contains the variables to be used in the analysis. The second argument
is a string containing a formula that specifies the dependent and independent
variables. See the section on Defining models using formulas
for details of formula syntax.
var model3 = new LinearDiscriminantAnalysis(dataFrame, "y ~ x1 + x2");
Dim model3 = New LinearDiscriminantAnalysis(frame, "y ~ x1 + x2")
let model3 = new LinearDiscriminantAnalysis(dataFrame, "y ~ x1 + x2")
The Compute method performs the actual analysis.
Most properties and methods throw an exception when they are accessed before the Compute method is
called. You can verify that the model has been calculated by inspecting the Computed property.
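For example, continuing with the model constructed above (this sketch assumes Compute takes no arguments; check the API reference for overloads):

```csharp
// Perform the actual analysis. Most properties and methods
// throw an exception if accessed before this call.
model1.Compute();

// The Computed property indicates whether the model has been calculated.
Console.WriteLine("Computed: {0}", model1.Computed);
```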
The Predictions
property returns a CategoricalVector&lt;T&gt;
that contains the values of the dependent variable as predicted by the model.
The PredictedProbabilities
property returns a Matrix&lt;T&gt;
that gives the probability of each outcome for each observation.
A related property,
PredictedLogProbabilities,
returns the natural logarithm of the predicted probabilities.
The ProbabilityResiduals
property returns a matrix containing the difference between the actual (0 or 1) and
the predicted probabilities.
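As a sketch, the fitted values and residuals of the model computed earlier can be retrieved as follows (property names as documented above; the exact return types are described in the API reference):

```csharp
// Predicted class for each observation used to fit the model:
var predictions = model1.Predictions;

// One row per observation, one column per class, containing
// the predicted probability of each outcome:
var probabilities = model1.PredictedProbabilities;

// Differences between the actual (0 or 1) and predicted probabilities:
var residuals = model1.ProbabilityResiduals;
```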
The result of a discriminant analysis is a set of discriminant functions.
These are linear functions of the variables in the direction that most separates
the observations in each group from those in other groups.
The LinearDiscriminantAnalysis
class's DiscriminantFunctions
property returns a collection of
LinearDiscriminantFunction
objects that represent the discriminant functions.
The discriminant functions are constructed by the model. You cannot create them directly.
Discriminant functions perform a role similar to factors in
Factor Analysis or components
in Principal Component Analysis (PCA).
Discriminant functions are based on a generalized eigenvalue decomposition.
The corresponding eigenvalue and eigenvector can be accessed through the
Eigenvalue
and Eigenvector
properties.
The EigenvalueDifference
property returns the difference between the eigenvalues of the discriminant function
and the next most significant discriminant function.
The ProportionOfVariance and
CumulativeProportionOfVariance properties
give the contribution of the discriminant function to the variation in the data in relative terms.
The CanonicalCorrelation
property returns the canonical correlation between the discriminant function and the groups.
A higher correlation indicates a more relevant discriminant function.
Similarly, WilksLambda
returns Wilks' lambda, a statistic based on the canonical correlations:
the value for a discriminant function is the product of (1 − r²) over that
function and all less significant functions, where r is each function's
canonical correlation.
The significance of a discriminant function can be quantified further using
a hypothesis test based on Wilks' lambda. This is an F test that is exact
when the number of groups is less than 3 or when the number of included functions
is 1 or 2.
The GetFTest
method returns this test.
In the example below, these properties are printed for the discriminant functions
from the earlier example:
Console.WriteLine(" # Eigenvalue Difference Contribution Contrib. % Can.Corr F stat. df1 df2");
for (int i = 0; i < model1.DiscriminantFunctions.Count; i++)
{
var fn = model1.DiscriminantFunctions[i];
var f = fn.GetFTest();
Console.WriteLine("{0,2}{1,12:F4}{2,11:F4}{3,14:F3}%{4,10:F3}%{5,9:F4}{6,9:F3}{7,9:F3}{8,4}{9,4}",
i, fn.Eigenvalue, fn.EigenvalueDifference,
100 * fn.ProportionOfVariance,
100 * fn.CumulativeProportionOfVariance,
fn.CanonicalCorrelation,
fn.WilksLambda,
f.Statistic,
f.NumeratorDegreesOfFreedom,
f.DenominatorDegreesOfFreedom);
}
Console.WriteLine(" # Eigenvalue Difference Contribution Contrib. % Can.Corr F stat. df1 df2")
For i As Integer = 0 To model1.DiscriminantFunctions.Count - 1
Dim fn = model1.DiscriminantFunctions(i)
Dim f = fn.GetFTest()
Console.WriteLine("{0,2}{1,12:F4}{2,11:F4}{3,14:F3}%{4,10:F3}%{5,9:F4}{6,9:F3}{7,9:F3}{8,4}{9,4}",
i, fn.Eigenvalue, fn.EigenvalueDifference,
100 * fn.ProportionOfVariance,
100 * fn.CumulativeProportionOfVariance,
fn.CanonicalCorrelation,
fn.WilksLambda,
f.Statistic,
f.NumeratorDegreesOfFreedom,
f.DenominatorDegreesOfFreedom)
Next
printfn " # Eigenvalue Difference Contribution Contrib. % Can.Corr F stat. df1 df2"
for i in 0..model1.DiscriminantFunctions.Count-1 do
let fn = model1.DiscriminantFunctions.[i]
let f = fn.GetFTest()
printfn "%2d%12.4f%11.4f%14.3f%10.3f%9.4f%9.3f%9.3f%4.0f%4.0f"
i fn.Eigenvalue fn.EigenvalueDifference
(100.0 * fn.ProportionOfVariance)
(100.0 * fn.CumulativeProportionOfVariance)
fn.CanonicalCorrelation
fn.WilksLambda
f.Statistic
f.NumeratorDegreesOfFreedom
f.DenominatorDegreesOfFreedom
When using a linear discriminant analysis for classification,
the probability that an observation belongs to each class is computed.
The class with the highest probability is selected.
The
Predict
method takes a vector, matrix, or data frame and produces the model's
prediction for the supplied data. When a single observation is supplied (as a vector),
the method returns an integer that is the level index of the predicted class.
When multiple observations are supplied, the method returns a vector of level indexes.
Similarly, the
PredictProbabilities
method returns the predicted probabilities for each class.
When a single observation is supplied (as a vector),
the method returns a vector that contains the probabilities that the observation
belongs to each of the classes.
When multiple observations are supplied, the method returns a matrix, where each row
contains the probabilities for the corresponding observation.
var index = model1.Predict(Vector.Create(1.2, 3.0));
var input = Matrix.CreateRandom(10, 2);
var predictions = model1.Predict(input);
Dim index = model1.Predict(Vector.Create(1.2, 3.0))
Dim inputs = Matrix.CreateRandom(10, 2)
Dim predictions = model1.Predict(inputs)
let index = model1.Predict(Vector.Create(1.2, 3.0))
let input = Matrix.CreateRandom(10, 2)
let predictions = model1.Predict(input)
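The PredictProbabilities method can be called in the same two ways. A sketch, using the method name as documented above:

```csharp
// A single observation returns a vector with one probability per class:
var classProbabilities = model1.PredictProbabilities(Vector.Create(1.2, 3.0));

// Multiple observations return a matrix with one row per observation
// and one column per class:
var allProbabilities = model1.PredictProbabilities(Matrix.CreateRandom(10, 2));
```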
When used for dimensionality reduction, the observations are projected onto
the directions that most separate the classes.
The LinearDiscriminantAnalysis
class implements the
ITransformationModel
interface to support this operation. The
Transform(Matrix&lt;Double&gt;)
method performs this operation.
It takes one argument: a matrix whose rows contain the observations.
It returns a matrix whose columns contain the projections of the observations
onto the discriminant directions.
var transformed = model1.Transform(dataFrame.ToMatrix<double>());
Dim transformed = model1.Transform(frame.ToMatrix(Of Double))
let transformed = model1.Transform(dataFrame.ToMatrix<double>())