Factor analysis is a method of grouping a set of variables into
related subsets. Different methods exist for extracting
the factors. After extraction, the factors can be rotated
in order to further bring out the relationship between variables.
Factor analysis is implemented by the
FactorAnalysis
class and related types in the
Extreme.Statistics.Multivariate
namespace.
A factor analysis can operate on either the correlation matrix or the covariance matrix
of a set of variables. The first step in a factor analysis is the computation of this matrix.
The options are enumerated in the
FactorMethod
enumeration type. It can have the following values:
Value | Description |
---|---|
Correlation | Use the correlation matrix of the variables to perform the calculations. This is the default. |
Covariance | Use the covariance matrix of the variables to perform the calculations. |
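To make the difference between the two options concrete, here is a minimal, library-independent sketch of how both matrices can be computed from raw data whose columns are the variables. The class and method names (`MomentMatrices`, `Covariance`, `Correlation`) are hypothetical and are not part of the Extreme.Statistics API.

```csharp
using System;

public static class MomentMatrices
{
    // Sample covariance matrix of the columns of data (rows are observations).
    public static double[,] Covariance(double[,] data)
    {
        int n = data.GetLength(0), p = data.GetLength(1);
        var mean = new double[p];
        for (int j = 0; j < p; j++)
        {
            for (int i = 0; i < n; i++) mean[j] += data[i, j];
            mean[j] /= n;
        }
        var cov = new double[p, p];
        for (int a = 0; a < p; a++)
            for (int b = a; b < p; b++)
            {
                double s = 0;
                for (int i = 0; i < n; i++)
                    s += (data[i, a] - mean[a]) * (data[i, b] - mean[b]);
                cov[a, b] = cov[b, a] = s / (n - 1);
            }
        return cov;
    }

    // Correlation matrix: the covariance matrix rescaled so that every
    // diagonal element equals 1. Assumes no column is constant.
    public static double[,] Correlation(double[,] data)
    {
        var cov = Covariance(data);
        int p = cov.GetLength(0);
        var corr = new double[p, p];
        for (int a = 0; a < p; a++)
            for (int b = 0; b < p; b++)
                corr[a, b] = cov[a, b] / Math.Sqrt(cov[a, a] * cov[b, b]);
        return corr;
    }
}
```

Because the correlation matrix removes the scale of each variable, it is usually the safer choice when the variables are measured in different units.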
The next step in the factor analysis is to extract the factors from
the correlation or covariance matrix. There are several ways to perform
this step. A major difference between Principal Component Analysis and
Factor Analysis is that Factor Analysis tries to analyze only the variance
that is shared between variables, and tries to exclude variance that
is unique to each variable. This is reflected in the fact that,
whereas the correlation matrix used in PCA has all 1's on the diagonal,
the matrix used in Factor Analysis typically does not. The
elements on the diagonal are called the communalities.
Much of the difference between factor extraction methods lies in the
way the communalities are estimated. The complement of the communality
of a variable is its uniqueness.
The objective is always to obtain factors that reproduce the
observed correlations or covariances between the variables as closely
as possible. The matrix of these correlations is the
reconstructed correlation matrix.
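In the usual textbook notation (the symbols are introduced here for illustration and do not correspond to library members), with $\Lambda$ the $p \times m$ matrix of loadings $\lambda_{ij}$ of $p$ variables on $m$ uncorrelated factors, and the analysis based on the correlation matrix $R$, these quantities can be written as:

$$
\hat{R} = \Lambda \Lambda^{T}, \qquad
h_i^2 = \hat{R}_{ii} = \sum_{j=1}^{m} \lambda_{ij}^2, \qquad
u_i = 1 - h_i^2 .
$$

Here $\hat{R}$ is the reconstructed correlation matrix, $h_i^2$ is the communality of variable $i$, and $u_i$ is its uniqueness. The off-diagonal elements of $\hat{R}$ are the values that should approximate the observed correlations $r_{ik}$.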
The extraction options are enumerated by the
FactorExtractionMethod
enumeration type:
Value | Description |
---|---|
PrincipalComponents | Compute the principal components. This is equivalent to Principal Component Analysis. This method can be used with a covariance matrix. |
IterativePrincipalAxis | An iterative process that estimates the communalities using the Squared Multiple Correlation (SMC) as the starting point. This method can be used with a covariance matrix. |
UnweightedLeastSquares | A method that tries to minimize the sum of the squared differences between the correlation matrix and the reconstructed correlation matrix. |
GeneralizedLeastSquares | A method that tries to minimize the sum of the squared differences between the correlation matrix and the reconstructed correlation matrix weighted by the inverse of the variable's uniqueness. |
MaximumLikelihood | Maximum likelihood estimation. |
AlphaFactoring | A method that maximizes the alpha-reliability of the factors. |
ImageFactoring | A method where the common part of a variable is defined as its linear regression with respect to the other variables. This method can be used with a covariance matrix. |
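The quantity that the least squares methods work with can be sketched in a few lines. The code below is an illustration only, not the library's implementation: it rebuilds the correlations implied by a loadings matrix (assuming uncorrelated factors) and sums the squared differences with the observed correlations. Restricting the sum to off-diagonal elements is a common convention and is an assumption here. The names `Reconstruction`, `Reconstruct`, and `OffDiagonalSquaredError` are hypothetical.

```csharp
using System;

public static class Reconstruction
{
    // Correlations implied by the loadings of uncorrelated factors: L * L^T.
    // The diagonal of the result holds the communalities.
    public static double[,] Reconstruct(double[,] loadings)
    {
        int p = loadings.GetLength(0), m = loadings.GetLength(1);
        var r = new double[p, p];
        for (int a = 0; a < p; a++)
            for (int b = 0; b < p; b++)
                for (int j = 0; j < m; j++)
                    r[a, b] += loadings[a, j] * loadings[b, j];
        return r;
    }

    // Sum of squared differences between observed and reconstructed
    // correlations, taken over the off-diagonal elements only.
    public static double OffDiagonalSquaredError(double[,] observed, double[,] loadings)
    {
        var fit = Reconstruct(loadings);
        int p = observed.GetLength(0);
        double sum = 0;
        for (int a = 0; a < p; a++)
            for (int b = 0; b < p; b++)
                if (a != b)
                {
                    double d = observed[a, b] - fit[a, b];
                    sum += d * d;
                }
        return sum;
    }
}
```

For two variables with an observed correlation of 0.5 and single-factor loadings 0.8 and 0.6, the reconstructed correlation is 0.48, so each off-diagonal difference is 0.02.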
After extraction, rotation is commonly used to maximize high correlations
and minimize low correlations. This can be done in two ways. An orthogonal rotation
maintains the property that factors are uncorrelated. An oblique rotation
does not have this limitation, and can achieve more extreme correlations, but at a cost of increased
complexity.
There are many rotation methods, and only the most common ones have been implemented.
The available rotation methods are enumerated by the
FactorRotationMethod enumeration
type. Its members are listed below:
Value | Description |
---|---|
None | Don't rotate the factors. This is considered orthogonal. |
Varimax | The most common orthogonal rotation method. It maximizes the variance of the squared loadings of each factor, which pushes high loadings higher and low loadings lower. |
Equamax | An orthogonal rotation method that maximizes the variance of the loadings of both factors and variables. It is a compromise between Varimax and Quartimax. |
Quartimax | An orthogonal rotation method that maximizes the variance of the loadings of each variable. |
Promax | A popular oblique rotation method that first performs an orthogonal rotation and then uses powers of loadings to emphasize high and low values. |
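As an illustration of what Varimax maximizes, the sketch below evaluates the raw Varimax criterion: for each factor, the variance of its squared loadings, summed over all factors. It only scores a given loadings matrix and does not search for the rotation. It also omits the row normalization that many implementations apply first. The name `VarimaxCriterion` is hypothetical and not part of the library.

```csharp
using System;

public static class VarimaxCriterion
{
    // Sum over factors (columns) of the variance of the squared loadings.
    // Rows are variables, columns are factors.
    public static double Evaluate(double[,] loadings)
    {
        int p = loadings.GetLength(0), m = loadings.GetLength(1);
        double total = 0;
        for (int j = 0; j < m; j++)
        {
            double sumSq = 0, sumFourth = 0;
            for (int i = 0; i < p; i++)
            {
                double sq = loadings[i, j] * loadings[i, j];
                sumSq += sq;
                sumFourth += sq * sq;
            }
            double meanSq = sumSq / p;
            // Variance of the squared loadings: E[x^2] - (E[x])^2.
            total += sumFourth / p - meanSq * meanSq;
        }
        return total;
    }
}
```

A loadings matrix in which each variable loads on exactly one factor, such as `{ {1, 0}, {0, 1} }`, scores 0.5, while the same solution rotated by 45 degrees, where all loadings have magnitude 1/√2, scores 0.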
The Promax rotation method requires one argument, the power that the loadings are to be raised to
in the oblique phase of the method. This value is set through the
PromaxPower
property.
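The effect of the power can be seen in a small sketch of the target matrix used in the oblique phase. Each loading is raised to the given power while keeping its sign, so that moderate loadings shrink toward zero much faster than high ones. This shows only the power step. Promax implementations typically also normalize the loadings and then fit an oblique transformation to this target, which is not shown. The name `PromaxTarget` is hypothetical.

```csharp
using System;

public static class PromaxTarget
{
    // Raises each loading to the given power, preserving its sign.
    public static double[,] Build(double[,] loadings, double power)
    {
        int p = loadings.GetLength(0), m = loadings.GetLength(1);
        var target = new double[p, m];
        for (int i = 0; i < p; i++)
            for (int j = 0; j < m; j++)
            {
                double x = loadings[i, j];
                target[i, j] = Math.Sign(x) * Math.Pow(Math.Abs(x), power);
            }
        return target;
    }
}
```

With a power of 4, a loading of 0.8 becomes 0.4096 while a loading of -0.3 becomes -0.0081, which is why higher powers produce a sharper contrast and, in general, more strongly correlated factors.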
Running a factor analysis
Factor analysis is implemented by the
FactorAnalysis class.
This class has four constructors. The first constructor takes one argument: a
Matrix<T> whose columns
contain the data to be analyzed.
The second constructor also takes one argument: an array of
Vector<T> objects.
// C#: create the analysis from a matrix, then from an array of its columns.
var matrix = Matrix.CreateRandom(100, 10);
var fa1 = new FactorAnalysis(matrix);
var vectors = matrix.Columns.ToArray();
var fa2 = new FactorAnalysis(vectors);
' Visual Basic: create the analysis from a matrix, then from an array of its columns.
Dim mat = Matrix.CreateRandom(100, 10)
Dim fa1 = New FactorAnalysis(mat)
Dim vectors = mat.Columns.ToArray()
Dim fa2 = New FactorAnalysis(vectors)
// F#: create the analysis from a matrix, then from an array of its columns.
let matrix = Matrix.CreateRandom(100, 10)
let fa1 = new FactorAnalysis(matrix)
let vectors = matrix.Columns.ToArray()
let fa2 = new FactorAnalysis(vectors)
The third constructor takes two arguments. The first is an
IDataFrame (a
DataFrame<R, C> or
Matrix<T>)
that contains the variables that may be used in the analysis.
The second argument is an array of strings
that contains the names of the variables from the collection that should be included
in the analysis.
The following example loads some data from a Stata
file, filters out missing values, and creates a factor analysis object:
// C#: load the data, drop rows with missing values in the selected
// variables, and create the factor analysis object.
var dataFrame = StataFile.ReadDataFrame(@"..\..\..\..\Data\m255.dta");
string[] names = { "item13", "item14", "item15", "item16",
    "item17", "item18", "item19", "item20", "item21",
    "item22", "item23", "item24" };
dataFrame = dataFrame.RemoveRowsWithMissingValues(names);
FactorAnalysis fa = new FactorAnalysis(dataFrame, names);
' Visual Basic: load the data, drop rows with missing values in the selected
' variables, and create the factor analysis object.
Dim frame = StataFile.ReadDataFrame("..\..\..\..\Data\m255.dta")
Dim names = {"item13", "item14", "item15", "item16",
    "item17", "item18", "item19", "item20", "item21",
    "item22", "item23", "item24"}
frame = frame.RemoveRowsWithMissingValues(names)
Dim fa = New FactorAnalysis(frame, names)
// F#: load the data, drop rows with missing values in the selected
// variables, and create the factor analysis object.
let dataFrame = StataFile.ReadDataFrame(@"..\..\..\..\Data\m255.dta")
let names = [| "item13"; "item14"; "item15"; "item16";
               "item17"; "item18"; "item19"; "item20"; "item21";
               "item22"; "item23"; "item24" |]
let dataFrame = dataFrame.RemoveRowsWithMissingValues(names)
let fa = FactorAnalysis(dataFrame, names)
The fourth constructor is used to perform the factor analysis
directly on a correlation or covariance matrix. It takes two arguments.
The first is a
SymmetricMatrix<T>.
The second argument is a FactorMethod
value that specifies whether the first argument is a correlation matrix
or a covariance matrix.
Once the factor analysis object is created, the factor extraction
and rotation methods should be specified. This is done through two properties.
The
ExtractionMethod
property is a
FactorExtractionMethod.
The default is to use iterated principal axis extraction.
The
RotationMethod
property is a
FactorRotationMethod.
The default is to use Varimax rotation.
The number of factors can be specified in several ways. The
NumberOfFactors
property can be set to the desired number. Alternatively, the
FactorCountMethod
property can be set, usually in combination with the
FactorThreshold
property. The
FactorCountMethod
property is of type FactorCountMethod and can have the following values:
Value | Description |
---|---|
Fixed | The number of factors is determined by the value of the NumberOfFactors property. |
Automatic | The number of factors equals the number of eigenvalues greater than the value of the FactorThreshold property. |
AutomaticRelativeToMean | The number of factors equals the number of eigenvalues greater than the value of the FactorThreshold property times the mean of the eigenvalues. |
All | All factors are retained. |
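The two eigenvalue-based rules can be sketched as follows. This is an illustration of the counting rules as described in the table, not the library's code, and the names `FactorCount`, `Absolute`, and `RelativeToMean` are hypothetical.

```csharp
using System;
using System.Linq;

public static class FactorCount
{
    // Number of eigenvalues strictly greater than the threshold.
    public static int Absolute(double[] eigenvalues, double threshold)
        => eigenvalues.Count(ev => ev > threshold);

    // Number of eigenvalues strictly greater than the threshold
    // times the mean of all eigenvalues.
    public static int RelativeToMean(double[] eigenvalues, double threshold)
        => Absolute(eigenvalues, threshold * eigenvalues.Average());
}
```

For the eigenvalues 3.2, 1.4, 0.9 and 0.5 and a threshold of 1, the absolute rule keeps two factors. The mean eigenvalue is 1.5, so the relative rule keeps only one. For a correlation matrix the mean eigenvalue is always 1, so both rules give the same count.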
The Fit
method performs the actual calculations.
The example below starts from the factor analysis object created earlier,
selects the iterated principal axis method to extract 3 factors
and Varimax rotation, and runs the analysis:
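// C#: extract 3 factors by iterated principal axis, rotate with Varimax, and run.
fa.NumberOfFactors = 3;
fa.ExtractionMethod = FactorExtractionMethod.IterativePrincipalAxis;
fa.RotationMethod = FactorRotationMethod.Varimax;
fa.Fit();
' Visual Basic: extract 3 factors by iterated principal axis, rotate with Varimax, and run.
fa.NumberOfFactors = 3
fa.ExtractionMethod = FactorExtractionMethod.IterativePrincipalAxis
fa.RotationMethod = FactorRotationMethod.Varimax
fa.Fit()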
// F#: extract 3 factors by iterated principal axis, rotate with Varimax, and run.
fa.NumberOfFactors <- 3
fa.ExtractionMethod <- FactorExtractionMethod.IterativePrincipalAxis
fa.RotationMethod <- FactorRotationMethod.Varimax
fa.Fit()
Once the computations are complete,
a number of properties and methods give access to the results in detail.
Which properties are available depends to some degree on whether the
factor rotation is orthogonal or oblique.
The
GetUnrotatedFactors
and
GetRotatedFactors
methods return read-only collections of
Factor
objects that represent the factors before and after rotation, respectively.
The properties of these factors can be inspected to get the details of each factor.
In addition, the
FactorAnalysis
object itself has a large number of vector and matrix properties. They are listed
in the table below.
Property | Description |
---|---|
InitialCommunalities | A vector containing the initial values of the communalities of the variables. |
Communalities | A vector containing the communalities of the extracted factors. |
LoadingsMatrix | A matrix whose columns contain the factor loadings before rotation. |
RotatedLoadingsMatrix | A matrix whose columns contain the factor loadings after rotation. For oblique rotations, this is the same as the |
PatternMatrix | For oblique rotations, a matrix whose columns contain the factor pattern (the contribution of each factor to the variance of each variable). For orthogonal rotations this is the same as the RotatedLoadingsMatrix. |
StructureMatrix | For oblique rotations, a matrix whose columns contain the factor structure (the correlation between each factor and each variable). For orthogonal rotations this is the same as the RotatedLoadingsMatrix. |
FactorTransformationMatrix | The matrix that transforms the initial factors to the rotated factors. |
FactorCorrelationMatrix | A symmetric matrix of correlations between the factors. This is only meaningful for oblique rotations. For orthogonal rotations, this equals the identity matrix. |
FactorScoreCoefficientMatrix | A matrix containing the coefficients of the factor scores. |
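The relationship between the pattern, structure and factor correlation matrices follows from the standard factor model. Writing $P$ for the pattern matrix, $S$ for the structure matrix and $\Phi$ for the factor correlation matrix (symbols chosen here for illustration), it can be stated as:

$$
S = P \, \Phi .
$$

For an orthogonal rotation the factors are uncorrelated, so $\Phi = I$ and $S = P$, which is why the pattern and structure matrices then coincide with the rotated loadings.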