Partial Least Squares

Partial least squares (PLS) is a technique that fits linear combinations of the independent variables, called factors, to one or more dependent variables. The factors are chosen to maximize the covariance between the factors and the dependent variables.
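For a single dependent variable, the covariance criterion can be written out explicitly. The following is a sketch in standard PLS notation rather than anything specific to the library: the symbols X, y and w are introduced here for illustration, and both X and y are assumed to be centered. The first weight vector solves

$$
w_1 = \arg\max_{\|w\| = 1} \operatorname{Cov}(Xw, y)^2,
\qquad
w_1 = \frac{X^{T} y}{\|X^{T} y\|}
$$

The first factor is then the combination $t_1 = X w_1$ of the independent variables.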

Partial least squares is useful when the number of independent variables is large compared to the number of observations, or when variables are highly correlated.

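The PartialLeastSquaresModel class has four constructors. The first constructor takes three arguments. The first is a Vector that contains the dependent variable. The second is a Matrix whose columns contain the independent variables. The third is the number of factors to compute. The second constructor takes the same arguments, except that the first is a Matrix whose columns contain the dependent variables.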

In the example below, we create two Partial Least Squares models using random data. The first has one dependent variable, 10 independent variables and 20 observations. The second has 3 dependent variables. In both cases, we're asking for 5 factors:

```
var dependent = Vector.CreateRandom(20);
var independents = Matrix.CreateRandom(20, 10);
var model1 = new PartialLeastSquaresModel(dependent, independents, 5);

var dependents = Matrix.CreateRandom(20, 3);
var model2 = new PartialLeastSquaresModel(dependents, independents, 5);
```

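The third constructor takes four arguments. The first argument is an IDataFrame (for example a DataFrame, or a Matrix acting as a data frame) that contains the data. The second and third arguments are arrays of strings that contain the names of the dependent and independent variables, respectively. The fourth is the number of factors.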

In the code that follows, we give the two matrices of dependent and independent variables a column index. We join these matrices to get a matrix that can act as a data frame. We then use this matrix, along with the arrays of column names, to construct the same PLS model:

```
var xNames = new string[] { "x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9", "x10" };
independents.ColumnIndex = Index.Create(xNames);
var yNames = new string[] { "y1", "y2", "y3" };
dependents.ColumnIndex = Index.Create(yNames);

// A matrix can act as a data frame:
var all = Matrix.JoinHorizontal(independents, dependents);
var model3 = new PartialLeastSquaresModel(all, yNames, xNames, 5);
```

The fourth constructor takes three arguments. The first argument once again contains the data. The second is a string that contains a formula that describes the model. See the section on formulas for details. The same model as above can be defined using a formula as:
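A minimal sketch of such a call is shown here. It assumes that the formula lists the dependent variables to the left of a `~` separator, and that the third argument is the number of factors, as in the other constructors. Neither assumption is confirmed by the text above, and the name model4 is only illustrative:

```csharp
// Sketch only: the "~" formula syntax and the meaning of the third
// argument (number of factors) are assumptions. 'all' is the joined
// matrix with named columns created in the previous example.
var model4 = new PartialLeastSquaresModel(all, "y1 + y2 + y3 ~ .", 5);
```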

The special . term on the right-hand side of a formula captures all remaining columns as independent variables.

The Compute method performs the actual analysis. Most properties and methods throw an exception when they are accessed before the Compute method is called. You can verify that the model has been calculated by inspecting the Computed property.

Fitting the model is done with one of two standard algorithms: NIPALS (Nonlinear Iterative PArtial Least Squares) or SIMPLS (Statistically Inspired Modification of Partial Least Squares). The two algorithms give identical results when there is only one dependent variable.
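To give an idea of what fitting involves, one step of NIPALS for a single dependent variable can be sketched as follows. This is the textbook form, not necessarily the library's exact implementation: the symbols are introduced here for illustration, X and y are assumed to be centered, and scaling conventions may differ. Starting from $X_1 = X$ and $y_1 = y$, factor $a$ is computed as:

$$
\begin{aligned}
w_a &= \frac{X_a^{T} y_a}{\|X_a^{T} y_a\|} \\
t_a &= X_a w_a \\
p_a &= \frac{X_a^{T} t_a}{t_a^{T} t_a}, \qquad q_a = \frac{y_a^{T} t_a}{t_a^{T} t_a} \\
X_{a+1} &= X_a - t_a p_a^{T}, \qquad y_{a+1} = y_a - q_a t_a
\end{aligned}
$$

Each pass extracts one factor $t_a$ and then removes (deflates) its contribution before the next factor is computed.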

By default, the NIPALS algorithm is used. You can change this by setting the Method property. This property is of type PartialLeastSquaresMethod and can take on the following values:

Method | Description |
---|---|
Nipals | Use the original Nonlinear Iterative PArtial Least Squares method (NIPALS). |
Simpls | Use the Statistically Inspired Modification of Partial Least Squares method (SIMPLS) of de Jong. |

The number of components to compute can be changed by setting the NumberOfComponents property. In the next example, we compute the first model we created earlier using default settings. For the second model, we change the number of requested components to 7 and compute the model using the SIMPLS algorithm:

```
model1.Compute();
model2.NumberOfComponents = 7;
model2.Method = PartialLeastSquaresMethod.Simpls;
model2.Compute();
```

The PredictedValues property returns a Matrix that contains the values of the dependent variables predicted by the model for each observation.

The Coefficients property returns the matrix of regression coefficients of the model. The Intercepts property returns the vector of corresponding intercepts. The StandardizedCoefficients property returns a matrix of the standardized coefficients, based on centered and normalized variables.

Several properties give information about the factors and how they relate to the dependent and independent variables. In PLS, the matrices of independent and dependent variables are both decomposed into scores and loadings, and similar terminology is used for each.

The XScores property returns a matrix that contains the scores of the independent variables, and XLoadings returns a matrix that contains the corresponding loadings. These are the factors T and P in the decomposition of X into TP^{T}. Likewise, the YScores property returns a matrix that contains the scores of the dependent variables, and YLoadings returns a matrix that contains the corresponding loadings. These are the factors U and Q in the decomposition of Y into UQ^{T}. In addition, the WeightMatrix property returns a matrix containing the projection weights for the independent variables.
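The two decompositions can be written with explicit dimensions. The sketch below assumes n observations, m independent variables, l dependent variables and k factors. The residual matrices E and F are symbols introduced here for illustration:

$$
\underbrace{X}_{n \times m} = \underbrace{T}_{n \times k}\,\underbrace{P^{T}}_{k \times m} + E,
\qquad
\underbrace{Y}_{n \times l} = \underbrace{U}_{n \times k}\,\underbrace{Q^{T}}_{k \times l} + F
$$

When fewer factors are used than the rank of X, the residuals E and F are generally nonzero, so the products TP^{T} and UQ^{T} are approximations of X and Y.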

The Predict method can be used to predict the values of the dependent variables for new data. The method has three overloads, which all take two arguments. The first overload takes a vector as its first argument. The vector contains the values of the independent variables for which a prediction should be made. The second argument, which is always optional, specifies how the values in the vector relate to the variables in the model. This overload returns a vector that contains the predictions for each of the dependent variables.

The second and third overloads take a matrix and a data frame, respectively, as their first argument. Each row in the matrix or data frame corresponds to an observation. The methods return a matrix whose rows contain the corresponding predictions for the dependent variables.
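In terms of the fitted regression, each prediction is a linear function of the new observation. As a sketch, let B denote the coefficient matrix with one column per dependent variable, and b the vector of intercepts. These symbols and the orientation of B are assumptions made here, not taken from the library:

$$
\hat{y} = B^{T} x + b,
\qquad
\hat{Y} = X_{\text{new}} B + \mathbf{1}\, b^{T}
$$

The first form corresponds to predicting from a single vector x, the second to a matrix $X_{\text{new}}$ with one observation per row.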

One of the objectives of Partial Least Squares is to capture as much as possible of the variance in both the dependent and the independent variables. The XVarianceExplained and YVarianceExplained properties return vectors that contain the proportion of variance explained by each factor. The corresponding XCumulativeVarianceExplained and YCumulativeVarianceExplained properties return the cumulative proportions.
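The relation between the per-factor and cumulative proportions is a running sum. Writing $v_a$ for the proportion explained by factor $a$ and $c_k$ for the cumulative proportion after $k$ factors (symbols introduced here for illustration):

$$
c_k = \sum_{a=1}^{k} v_a
$$

For example, with hypothetical proportions $v = (0.5, 0.3, 0.1)$, the cumulative proportions are $c = (0.5, 0.8, 0.9)$.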

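The quality of a PLS model is often assessed using a validation test set. The Press and RootMeanPress methods compute the predicted residual sum of squares (PRESS) and its root mean, respectively, for such a set.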

These methods can be used to determine the ideal number of components using cross validation. In the example below, we split the input into a training and a test dataset. We print out the PRESS value for the test set for a model based on a varying number of components, from 0 to 10:

```
// Create subsets (sets of indices) for train and test data:
var trainingSet = new Subset(all.RowCount, 0, 9);
var testSet = new Subset(all.RowCount, 10, 20);
// Generate the train and test data sets:
var XTrain = independents.GetRows(trainingSet);
var YTrain = dependents.GetRows(trainingSet);
// Set up the model:
var model = new PartialLeastSquaresModel(YTrain, XTrain, 0);
for (int k = 0; k <= 10; k++)
{
    model.NumberOfComponents = k;
    model.Compute();
    var XTest = independents.GetRows(testSet);
    var YTest = dependents.GetRows(testSet);
    double rmPress = model.RootMeanPress(YTest, XTest);
    Console.WriteLine("{0}: {1:F6}", k, rmPress);
}
```
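The statistic evaluated in the loop above is based on the squared prediction errors on the test set. As a sketch of the usual definition, with $y_{ij}$ the observed value of dependent variable $j$ for test observation $i$ and $\hat{y}_{ij}$ the corresponding prediction:

$$
\mathrm{PRESS} = \sum_{i=1}^{n_{\text{test}}} \sum_{j=1}^{l} \left( y_{ij} - \hat{y}_{ij} \right)^2,
\qquad
\mathrm{RMPRESS} = \sqrt{\frac{\mathrm{PRESS}}{n_{\text{test}}}}
$$

The normalization in the second expression is an assumption. The exact scaling used by the RootMeanPress method is not specified here. In either case, the number of components with the smallest value on the test set is the natural candidate.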

Copyright © 2004-20116, Extreme Optimization. All rights reserved.
