Performs orthogonal partial least squares regression (V. M. Cave).

### Options

| `PRINT` = string tokens | Printed output required (`data`, `xloadings`, `yloadings`, `ploadings`, `scores`, `leverages`, `xerrors`, `yerrors`, `scree`, `xpercent`, `ypercent`, `predictions`, `groups`, `estimates`, `fittedvalues`, `summary`); default `esti`, `xper`, `yper`, `scor`, `xloa`, `yloa`, `ploa`, `summ` |
|---|---|
| `PCPRINT` = string tokens | Controls printed output from principal components analysis of orthogonal `X` matrix (`loadings`, `roots`, `scores`, `tests`); default `root` |
| `PLOT` = string token | What graphs to plot (`pcplot`); default `*` (i.e. none) |
| `NORTHOGONALROOTS` = scalar | Number of orthogonal components to extract; default 1 |
| `NROOTS` = scalar | Number of predictive (i.e. PLS) components to extract; default 1 |
| `STANDARDIZE` = string tokens | Whether to standardize the `Y`, `X` and filtered `X` variables to unit variance and zero mean (`Y`, `X`, `filteredX`); default `*` (i.e. no standardizing) |
| `NGROUPS` = scalar | Number of cross-validation groups used by PLS; default 1 (i.e. no cross-validation performed) |
| `SEED` = scalar or factor | A scalar indicating the seed value used for dividing the data randomly into `NGROUPS` groups for cross-validation by PLS, or a factor indicating a specific set of groupings to use for cross-validation by PLS; default 0 |
| `LABELS` = text | Sample labels for `X` and `Y` to use in output; default uses the integers 1…n where n is the length of the variates in `X` and `Y` |
| `PLABELS` = text | Labels for `XPREDICTIONS`; default uses `P1`, `P2` etc. |
| `PCMETHOD` = string tokens | Method used by `PCP` to perform principal components analysis on the orthogonal `X` matrix (`ssp`, `correlation`, `vcovariance`, `variancecovariance`); default `*` (i.e. principal components analysis not performed) |
| `WINDOW` = scalar | Window to use for graph (available only when `NORTHOGONALROOTS` = 1); default 3 |

### Parameters

| `Y` = pointers | Pointer to variates containing the dependent variable(s) for each analysis |
|---|---|
| `X` = pointers | Pointer to variates containing the independent variables for each analysis |
| `YLOADINGS` = pointers | Pointer to variates containing the `Y` component loadings, for the predictive (i.e. PLS) dimensions, extracted from the filtered `X` matrix |
| `XLOADINGS` = pointers | Pointer to variates containing the component loading weights for the predictive dimensions, extracted from the filtered `X` matrix |
| `PLOADINGS` = pointers | Pointer to variates containing the bilinear model loadings for the predictive dimensions, extracted from the filtered `X` matrix |
| `YSCORES` = pointers | Pointer to variates containing the `Y` component scores, for each predictive dimension extracted from the filtered `X` matrix |
| `XSCORES` = pointers | Pointer to variates containing the component scores for each predictive dimension, extracted from the filtered `X` matrix |
| `B` = diagonal matrices | Saves the regression coefficients of `YSCORES` on `XSCORES`, for the predictive dimensions, extracted from the filtered `X` matrix |
| `YPREDICTIONS` = pointers | Pointer to variates used to store predicted y-values for samples in the prediction set |
| `XPREDICTIONS` = pointers | Pointer to variates containing data for the independent variables in the prediction set |
| `ESTIMATES` = matrices | An (*n*_{X}+1) by *n*_{Y} matrix (where *n*_{X} and *n*_{Y} are the number of variates contained in `X` and `Y`, respectively) to store the PLS regression coefficients |
| `FITTEDVALUES` = pointers | Pointer to variates used to store the fitted values for the `Y` variates |
| `LEVERAGES` = variates | Variate to store the leverage that each sample has on the PLS model |
| `PRESS` = variates | Variate used to store the Predictive Residual Error Sum of Squares for each dimension in the PLS model, available only if cross-validation has been selected |
| `RSS` = variates | Variate to save residual sums of squares |
| `YRESIDUALS` = pointers | Pointer to variates containing the residuals from the `Y` block after `NROOTS` predictive dimensions have been extracted, uncorrected for any scaling applied using `STANDARDIZE` |
| `XRESIDUALS` = pointers | Pointer to variates containing the residuals from the `X` block after `NROOTS` predictive dimensions have been extracted, uncorrected for any scaling applied using `STANDARDIZE` |
| `PCSCORES` = matrices | Matrix to save principal component scores |
| `PCSAVE` = pointers | Pointer to save structures from the principal component analysis (by `PCP`) of the orthogonal `X` matrix |
| `SAVE` = pointers | Pointer to save structures from the orthogonal projection |

### Description

`OPLS` performs orthogonal partial least squares (O-PLS) regression.

Variation in *X* that is orthogonal (i.e. uncorrelated) to *Y* may disturb PLS modelling, complicating the model interpretation. O-PLS combines PLS with a pre-processing step that filters out systematic variation in *X*, orthogonal to *Y*, that disturbs the PLS model. To improve model interpretation, the variation explained by each regular PLS component is partitioned into two parts:

1) variation linearly related to *Y* (i.e. predictive) and

2) variation orthogonal to *Y*.

The resulting O-PLS model takes the form:

*X* = *TP*^{T} + *T*_{ortho}*P*_{ortho}^{T} + *E*

*Y* = *TC*^{T} + *F*

where *T* = *XW* and *T*_{ortho} = *XW*_{ortho}. The predictive variation in *X* is modelled by the matrices *T*, *W* and *P*, whose columns contain the predictive component scores, loading weights and loadings, respectively. The orthogonal variation is modelled by analogous matrices *T*_{ortho}, *W*_{ortho} and *P*_{ortho}, whose columns contain the orthogonal component scores, loading weights and loadings, respectively. The columns of matrix *C* contain *Y*-loadings, and *E* and *F* are the residual matrices.

The number of predictive components used to model the predictive variation is specified by the `NROOTS` option; default 1. The number of orthogonal components used to model the orthogonal variation is specified by the `NORTHOGONALROOTS` option; default 1. The `OPLS` procedure also enables the orthogonal variation to be further explored, through principal components analysis.

In practice, the `OPLS` procedure removes *Y*-orthogonal variation from *X* to form a filtered *X* matrix (*Xfiltered*). A PLS model is then fitted to *Xfiltered*, using the `PLS` procedure.
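
Read against the model equations above, the filtering step can be sketched as the following identity. This is an illustrative restatement of the decomposition, not a description of the procedure's internal code; $X_{\text{ortho}}$ and $X_{\text{filtered}}$ are assumed here to correspond to the matrix of orthogonal variation and to *Xfiltered*, respectively.

```latex
\begin{aligned}
X_{\text{ortho}}    &= T_{\text{ortho}} P_{\text{ortho}}^{\mathsf T}, \\
X_{\text{filtered}} &= X - X_{\text{ortho}} = T P^{\mathsf T} + E .
\end{aligned}
```

The second line follows directly from *X* = *TP*^{T} + *T*_{ortho}*P*_{ortho}^{T} + *E*: subtracting the orthogonal part leaves only the predictive part and the residuals for the PLS model to describe.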

The dependent and independent variates are supplied using the `Y` and `X` parameters, respectively, as pointers containing a variate for each dimension. The `Y` and `X` variates must not contain missing values. A pointer of variates containing new *X* data, for which predictions are desired, can be specified by the `XPREDICTIONS` parameter. Sample labels for `X` and `XPREDICTIONS` can be provided by using the `LABELS` and `PLABELS` options, respectively.
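
As a minimal sketch of supplying a prediction set, the commands below reuse the wheat data (`L` and `%Protein`) from the Example section. The identifiers `Lnew`, `NewLabels` and `Pred` are hypothetical, and the two rows of readings are made-up values for illustration only, not real measurements. It is also assumed that the procedure defines the `YPREDICTIONS` pointer itself.

```genstat
" Hypothetical prediction set: two new samples measured at the same
  six wavelengths (illustrative values, not real data) "
VARIATE [NVALUES=2] Lnew[1...6]
READ Lnew[1...6]
470 125 248 375 385 -10
500 150 275 405 430   6 :
TEXT [VALUES='New1','New2'] NewLabels

" Fit one orthogonal and one predictive component, then predict
  protein content for the new samples "
OPLS [PRINT=predictions; NORTHOGONALROOTS=1; NROOTS=1; PLABELS=NewLabels]\
  Y=!p(%Protein); X=L; XPREDICTIONS=Lnew; YPREDICTIONS=Pred
PRINT Pred[]
```

The prediction pointer must contain one variate for each variate in `X`, in the same order, since the new samples are filtered and predicted with the loadings estimated from the calibration data.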

The `STANDARDIZE` option controls whether the *Y*, *X* and the filtered *X* variables are standardized to mean zero and unit variance prior to analysis. The *Y* variables are standardized prior to orthogonal projection and PLS analysis, the *X* variables are standardized prior to orthogonal projection, and the filtered *X* variables are standardized prior to modelling by PLS. By default, none of these are standardized. Note, however, that all variables are automatically centred prior to the PLS analysis, even if no standardization is requested.

The `SAVE` parameter can supply a pointer to store structures from orthogonal projection. The labels of the pointer, and their corresponding information, are as follows:

| `w_ortho` | orthogonal component loading weights, |
|---|---|
| `t_ortho` | orthogonal component scores, |
| `p_ortho` | orthogonal loadings, |
| `X_filtered` | filtered X matrix, with the orthogonal variation removed, |
| `X_ortho` | matrix containing the orthogonal variation, |
| `Xpred_filtered` | filtered prediction X matrix, with the orthogonal variation removed, |
| `Xpred_ortho` | matrix containing the orthogonal variation of the prediction X matrix. |
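
The following sketch shows one way the saved structures might be retrieved, again reusing `L` and `%Protein` from the Example section. The identifier `OrthSave` is hypothetical, and it is an assumption that the pointer elements can be referenced by the labels listed above and printed directly.

```genstat
" Save the structures from the orthogonal projection "
OPLS [PRINT=summary; NORTHOGONALROOTS=2; NROOTS=1]\
  Y=!p(%Protein); X=L; SAVE=OrthSave

" Inspect the orthogonal scores and loadings by label "
PRINT OrthSave['t_ortho']
PRINT OrthSave['p_ortho']
```

Referring to elements by label rather than by position keeps the commands readable and avoids relying on the order in which the structures are stored.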

The `NGROUPS` and `SEED` options control cross-validation by the `PLS` procedure. The parameters `YLOADINGS`, `XLOADINGS`, `PLOADINGS`, `YSCORES`, `XSCORES`, `B`, `YPREDICTIONS`, `ESTIMATES`, `FITTEDVALUES`, `LEVERAGES`, `PRESS`, `RSS`, `YRESIDUALS` and `XRESIDUALS` allow output from the `PLS` procedure to be saved (i.e. from modelling the predictive variation).
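
As an illustration of cross-validation with a user-defined grouping, the sketch below supplies a factor to `SEED` and saves the `PRESS` and `RSS` variates. It reuses `L` and `%Protein` from the Example section; the identifiers `CVGroup`, `Press` and `Rss` are hypothetical, the rotation of samples through four groups is an arbitrary choice, and setting `NGROUPS` to match the number of factor levels is an assumption rather than a documented requirement.

```genstat
" Assign the 24 samples to four cross-validation groups in rotation
  (1,2,3,4,1,2,...); an arbitrary grouping for illustration "
FACTOR [LEVELS=4; VALUES=(1...4)6] CVGroup

OPLS [PRINT=groups,summary; NORTHOGONALROOTS=1; NROOTS=2;\
  NGROUPS=4; SEED=CVGroup]\
  Y=!p(%Protein); X=L; PRESS=Press; RSS=Rss

" Compare predictive and residual sums of squares by dimension "
PRINT Press
PRINT Rss
```

Supplying a factor makes the grouping reproducible and lets related samples (for example replicates) be kept in the same group, which a random split cannot guarantee.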

Printed output is controlled by the `PRINT` option. Almost all of the settings are the same as those of the `PLS` procedure, and are used in exactly the same way. However, there is an additional setting, `summary`, which summarizes the percentage of variation in `X` explained by each orthogonal and predictive (i.e. PLS) component.

You can set the `PCMETHOD` option to request a principal component analysis to decompose the matrix of orthogonal variation (see `X_ortho` above), and to specify the method to use. Its settings are the same as those of the `METHOD` option of the `PCP` directive. Printed output is controlled by the `PCPRINT` option, which operates exactly as the `PRINT` option of the `PCP` directive. The `PCSAVE` parameter can supply a pointer to store details from the analysis. You can set option `PLOT` = `pcplot` to produce a score plot; by default, no plot is produced. When `NORTHOGONALROOTS` = 1, the `WINDOW` option can be used to control the window to use for the plot; default 3.
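
A minimal sketch of exploring the orthogonal variation is given below, reusing `L` and `%Protein` from the Example section. The identifier `OrthScores` is hypothetical, and it is assumed that the procedure declares the `PCSCORES` matrix itself.

```genstat
" Decompose the orthogonal variation by a principal components
  analysis of sums of squares and products, saving the scores "
OPLS [PRINT=summary; PCPRINT=roots,loadings; PLOT=pcplot;\
  NORTHOGONALROOTS=2; NROOTS=1; STANDARDIZE=X,Y; PCMETHOD=ssp]\
  Y=!p(%Protein); X=L; PCSCORES=OrthScores
PRINT OrthScores
```

Because two orthogonal components are extracted here, the `WINDOW` option is not set; it applies only when `NORTHOGONALROOTS` = 1.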

Options: `PRINT`, `PCPRINT`, `PLOT`, `NORTHOGONALROOTS`, `NROOTS`, `STANDARDIZE`, `NGROUPS`, `SEED`, `LABELS`, `PLABELS`, `PCMETHOD`, `WINDOW`.

Parameters: `Y`, `X`, `YLOADINGS`, `XLOADINGS`, `PLOADINGS`, `YSCORES`, `XSCORES`, `B`, `YPREDICTIONS`, `XPREDICTIONS`, `ESTIMATES`, `FITTEDVALUES`, `LEVERAGES`, `PRESS`, `RSS`, `YRESIDUALS`, `XRESIDUALS`, `PCSCORES`, `PCSAVE`, `SAVE`.

### Method

`OPLS` uses the methodology of Trygg & Wold (2002), applying the algorithm described in Biagioni *et al.* (2011), to remove variation from *X* that is not correlated to *Y*. `OPLS` then calls the `PLS` procedure to fit a PLS model to the filtered (i.e. pre-treated) matrix with the orthogonal variation removed.

To perform the principal components analysis on the matrix of orthogonal variation, `OPLS` uses the `PCP` directive, taking the setting for its `METHOD` option from the `PCMETHOD` option, and the setting for its `NROOTS` option from the `NORTHOGONALROOTS` option. When there is only one root, the score plot, which can be requested by setting option `PLOT` = `pcplot`, is produced by the `DOTHISTOGRAM` procedure. When there are several roots, it is produced by the `DMSCATTER` procedure. If the `XPREDICTIONS` parameter is set, principal component scores for the samples in the prediction set are estimated as described by Trygg & Wold (2002), and plotted in red.

### Action with `RESTRICT`

`OPLS` will work with restricted variates, fitting an O-PLS model to the subset of objects formed by the restriction. The subset can be defined by restricting any of the `X` or `Y` variates. However, if more than one variate is restricted, they must be restricted in the same way. Note that the unrestricted length of all of the data variates must be the same, and the number of samples in the restricted subset must be at least three. Any restrictions on a text supplied for the `LABELS` option, or on a factor for the `SEED` option, are ignored.

When restricted data are supplied, and `LABELS` are also given, the appropriate subset of labels will appear in the output; if `LABELS` are not defined, then default labels reflecting the position in the restricted data are used.

No restrictions are allowed on the variates supplied by the `XPREDICTIONS` parameter, or on the text supplied by the `PLABELS` option.
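
The sketch below illustrates fitting the model to a restricted subset, reusing `L` and `%Protein` from the Example section. The threshold of 8.5 is an arbitrary value chosen for illustration; restricting only the `Y` variate relies on the rule above that any one of the `X` or `Y` variates may define the subset.

```genstat
" Fit the O-PLS model only to samples with protein content above 8.5 "
RESTRICT %Protein; CONDITION=%Protein.GT.8.5
OPLS [PRINT=summary; NORTHOGONALROOTS=1; NROOTS=1]\
  Y=!p(%Protein); X=L

" Remove the restriction so that later analyses use all samples "
RESTRICT %Protein
```

Cancelling the restriction afterwards avoids it silently carrying over into subsequent commands that use the same variate.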

### References

Biagioni, D.J., Astling, D.P., Graf, P. & Davis, M.F. (2011). Orthogonal projection to latent structures solution properties for chemometrics and systems biology data. *Journal of Chemometrics*, 25, 514-525.

Trygg, J. & Wold, S. (2002). Orthogonal projections to latent structures (O-PLS). *Journal of Chemometrics*, 16, 119-128.

### See also

Procedure: `PLS`.

Commands for: Multivariate and cluster analysis, Regression analysis.

### Example

    CAPTION 'OPLS example',!t('The data are 24 calibration samples used to',\
      'determine the protein content of wheat from spectroscopic readings',\
      'at six different wavelengths.'),\
      !t('Fearn, T. (1983), Applied Statistics, 32, 73-79.');\
      STYLE=meta,plain,plain
    VARIATE [NVALUES=24] L[1...6],%Protein
    READ L[1...6],%Protein
    468 123 246 374 386 -11  9.23
    458 112 236 368 383 -15  8.01
    457 118 240 359 353 -16 10.95
    450 115 236 352 340 -15 11.67
    464 119 243 366 371 -16 10.41
    499 147 273 404 433   5  9.51
    463 119 242 370 377 -12  8.67
    462 115 238 370 353 -13  7.75
    488 134 258 393 377  -5  8.05
    483 141 264 384 398  -2 11.39
    463 120 243 367 378 -13  9.95
    456 111 233 365 365 -15  8.25
    512 161 288 415 443  12 10.57
    518 167 293 421 450  19 10.23
    552 197 324 448 467  32 11.87
    497 146 271 407 451  11  8.09
    592 229 360 484 524  51 12.55
    501 150 274 406 407  11  8.38
    483 137 260 385 374  -3  9.64
    491 147 269 389 391   1 11.35
    463 121 242 366 353 -13  9.70
    507 159 285 410 445  13 10.75
    474 132 255 376 383  -7 10.75
    496 152 276 396 404   6 11.47 :
    " Extract two orthogonal components before fitting a one dimensional PLS
      model to the standardized data with leave-one-out cross-validation.
      Principal components analysis is performed on the orthogonal variation."
    OPLS [PRINT=summary,estimate,xpercent,ypercent,xloadings,yloadings,ploadings;\
      PCPRINT=loadings,roots,scores,tests; PLOT=pcplot; NORTHOGONALROOTS=2;\
      NROOTS=1; STANDARDIZE=X,Y; NGROUPS=24; SEED=38639; PCMETHOD=correlation]\
      Y=!p(%Protein); X=L