OPLS procedure

Performs orthogonal partial least squares regression (V. M. Cave).

Options

PRINT = string tokens Printed output required (data, xloadings, yloadings, ploadings, scores, leverages, xerrors, yerrors, scree, xpercent, ypercent, predictions, groups, estimates, fittedvalues, summary); default estimates, xpercent, ypercent, scores, xloadings, yloadings, ploadings, summary
PCPRINT = string tokens Controls printed output from principal components analysis of orthogonal X matrix (loadings, roots, scores, tests); default roots
PLOT = string token What graphs to plot (pcplot); default * (i.e. none)
NORTHOGONALROOTS = scalar Number of orthogonal components to extract; default 1
NROOTS = scalar Number of predictive (i.e. PLS) components to extract; default 1
STANDARDIZE = string tokens Whether to standardize the Y, X and filtered X variables to unit variance and zero mean (Y, X, filteredX); default * (i.e. no standardizing)
NGROUPS = scalar Number of cross-validation groups used by PLS; default 1 (i.e. no cross-validation performed)
SEED = scalar or factor A scalar indicating the seed value used for dividing the data randomly into NGROUPS groups for cross-validation by PLS, or a factor indicating a specific set of groupings to use for cross-validation by PLS; default 0
LABELS = text Sample labels for X and Y to use in output; default uses the integers 1…n where n is the length of the variates in X and Y
PLABELS = text Labels for XPREDICTIONS; default uses P1, P2 etc.
PCMETHOD = string tokens Method used by PCP to perform principal components analysis on the orthogonal X matrix (ssp, correlation, vcovariance, variancecovariance); default * (i.e. principal components analysis not performed)
WINDOW = scalar Window to use for graph (available only when NORTHOGONALROOTS = 1); default 3

Parameters

Y = pointers Pointer to variates containing the dependent variable(s) for each analysis
X = pointers Pointer to variates containing the independent variables for each analysis
YLOADINGS = pointers Pointer to variates containing the Y component loadings, for the predictive (i.e. PLS) dimensions, extracted from the filtered X matrix
XLOADINGS = pointers Pointer to variates containing the component loading weights for the predictive dimensions, extracted from the filtered X matrix
PLOADINGS = pointers Pointer to variates containing the bilinear model loadings for the predictive dimensions, extracted from the filtered X matrix
YSCORES = pointers Pointer to variates containing the Y component scores, for each predictive dimension extracted from the filtered X matrix
XSCORES = pointers Pointer to variates containing the component scores for each predictive dimension, extracted from the filtered X matrix
B = diagonal matrices Saves the regression coefficients of YSCORES on XSCORES, for the predictive dimensions, extracted from the filtered X matrix
YPREDICTIONS = pointers Pointer to variates used to store predicted y-values for samples in the prediction set
XPREDICTIONS = pointers Pointer to variates containing data for the independent variables in the prediction set
ESTIMATES = matrices An nX+1 by nY matrix (where nX and nY are the number of variates contained in X and Y, respectively) to store the PLS regression coefficients
FITTEDVALUES = pointers Pointer to variates used to store the fitted values for the Y variates
LEVERAGES = variates Variate to store the leverage that each sample has on the PLS model
PRESS = variates Variate used to store the Predictive Residual Error Sum of Squares for each dimension in the PLS model, available only if cross-validation has been selected
RSS = variates Variate to save residual sums of squares
YRESIDUALS = pointers Pointer to variates containing the residuals from the Y block after NROOTS predictive dimensions have been extracted, uncorrected for any scaling applied using STANDARDIZE
XRESIDUALS = pointers Pointer to variates containing the residuals from the X block after NROOTS predictive dimensions have been extracted, uncorrected for any scaling applied using STANDARDIZE
PCSCORES = matrices Matrix to save principal component scores
PCSAVE = pointers Pointer to save structures from the principal component analysis (by PCP) of the orthogonal X matrix
SAVE = pointers Pointer to save structures from the orthogonal projection

Description

OPLS performs orthogonal partial least squares (O-PLS) regression.

Variation in X that is orthogonal (i.e. uncorrelated) to Y may disturb PLS modelling, complicating the model interpretation. O-PLS combines PLS with a pre-processing step that filters out systematic variation in X, orthogonal to Y, that disturbs the PLS model. To improve model interpretation, the variation explained by each regular PLS component is partitioned into two parts:

1)   variation linearly related to Y (i.e. predictive) and

2)   variation orthogonal to Y.

The resulting O-PLS model takes the form:

$$X = T P^{\mathsf{T}} + T_{\mathrm{ortho}} P_{\mathrm{ortho}}^{\mathsf{T}} + E$$

$$Y = T C^{\mathsf{T}} + F$$

where $T = XW$ and $T_{\mathrm{ortho}} = X W_{\mathrm{ortho}}$. The predictive variation in X is modelled by the matrices $T$, $W$ and $P$, whose columns contain the predictive component scores, loading weights and loadings, respectively. The orthogonal variation is modelled by the analogous matrices $T_{\mathrm{ortho}}$, $W_{\mathrm{ortho}}$ and $P_{\mathrm{ortho}}$, whose columns contain the orthogonal component scores, loading weights and loadings, respectively. The columns of the matrix $C$ contain the Y-loadings, and $E$ and $F$ are the residual matrices.

The number of predictive components used to model the predictive variation is specified by the NROOTS option; default 1. The number of orthogonal components used to model the orthogonal variation is specified by the NORTHOGONALROOTS option; default 1. The OPLS procedure also enables the orthogonal variation to be further explored, through principal components analysis.

In practice, the OPLS procedure removes Y-orthogonal variation from X to form a filtered X matrix ($X_{\mathrm{filtered}}$). A PLS model is then fitted to $X_{\mathrm{filtered}}$, using the PLS procedure.
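
Under the model above, this filtering step can be written out as follows. This is a sketch derived from the model equations rather than a statement of the procedure's internals. The symbols $X_{\mathrm{ortho}}$ and $X_{\mathrm{filtered}}$ are chosen here to match the X_ortho and X_filtered labels of the SAVE pointer.

```latex
\begin{aligned}
X_{\mathrm{ortho}}    &= T_{\mathrm{ortho}} P_{\mathrm{ortho}}^{\mathsf{T}}, \\
X_{\mathrm{filtered}} &= X - X_{\mathrm{ortho}} = T P^{\mathsf{T}} + E .
\end{aligned}
```

On this reading, the PLS model fitted afterwards sees only the predictive part of X plus the residual.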

The dependent and independent variates are supplied using the Y and X parameters, respectively, as pointers containing a variate for each dimension. The Y and X variates must not contain missing values. A pointer of variates containing new X data, for which predictions are desired, can be specified by the XPREDICTIONS parameter. Sample labels for X and XPREDICTIONS can be provided by using the LABELS and PLABELS options, respectively.
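
As a sketch of how these parameters fit together, the fragment below assumes that the variates L[1...6] and %Protein from the Example section have already been read. The prediction data in Lnew, the labels in Plab and the identifier Ypred are hypothetical and for illustration only. The sketch also assumes that OPLS declares the YPREDICTIONS pointer itself, which is not stated above.

```genstat
" Hypothetical readings at the six wavelengths for two new samples "
VARIATE [NVALUES=2] Lnew[1...6]
READ    Lnew[1...6]
470 125 248 375 385 -10   500 148 272 405 430   6 :
" Hypothetical labels for the two prediction samples "
TEXT    [VALUES=new1,new2] Plab
" One orthogonal and one predictive component; print and save predictions "
OPLS    [PRINT=predictions; NORTHOGONALROOTS=1; NROOTS=1; PLABELS=Plab]\
        Y=!p(%Protein); X=L; XPREDICTIONS=Lnew; YPREDICTIONS=Ypred
PRINT   Ypred[]
```

Because Y here contains a single variate, Ypred would be expected to contain a single variate of predicted protein values, one per prediction sample.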

The STANDARDIZE option controls whether the Y, X and the filtered X variables are standardized to mean zero and unit variance prior to analysis. The Y variables are standardized prior to orthogonal projection and PLS analysis, the X variables are standardized prior to orthogonal projection, and the filtered X variables are standardized prior to modelling by PLS. By default, none of these are standardized. Note, however, that all variables are automatically centred prior to the PLS analysis, even if no standardization is requested.

The SAVE parameter can supply a pointer to store structures from orthogonal projection. The labels of the pointer, and their corresponding information, are as follows:

    w_ortho orthogonal component loading weights,
    t_ortho orthogonal component scores,
    p_ortho orthogonal loadings,
    X_filtered filtered X matrix, with the orthogonal variation removed,
    X_ortho matrix containing the orthogonal variation,
    Xpred_filtered filtered prediction X matrix, with the orthogonal variation removed,
    Xpred_ortho matrix containing the orthogonal variation of the prediction X matrix.
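
A minimal sketch of saving and inspecting these structures is given below. It assumes the data from the Example section have been read. Osave is a hypothetical identifier, and the type of each saved element (matrix or pointer to variates) is not stated here, so the sketch only prints them.

```genstat
" Osave is a hypothetical identifier for the saved pointer "
OPLS  [PRINT=summary; NORTHOGONALROOTS=2; NROOTS=1]\
      Y=!p(%Protein); X=L; SAVE=Osave
" Elements are accessed by the labels listed above "
PRINT Osave['t_ortho']
PRINT Osave['p_ortho']
```

The Xpred_filtered and Xpred_ortho elements would presumably be available only when XPREDICTIONS is also set.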

The NGROUPS and SEED options control cross-validation by the PLS procedure. The parameters YLOADINGS, XLOADINGS, PLOADINGS, YSCORES, XSCORES, B, YPREDICTIONS, ESTIMATES, FITTEDVALUES, LEVERAGES, PRESS, RSS, YRESIDUALS and XRESIDUALS allow output from the PLS procedure to be saved (i.e. from modelling the predictive variation).
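
The two forms of SEED can be sketched as follows, again assuming the Example data have been read. The seed value, the identifiers Press and CVgroup, and the choice of group sizes are illustrative. Setting NGROUPS alongside a factor SEED is an assumption made here for clarity, not something the text above requires.

```genstat
" Random allocation of the 24 samples to 6 groups, reproducible via SEED "
OPLS   [PRINT=summary; NROOTS=2; NGROUPS=6; SEED=12345]\
       Y=!p(%Protein); X=L; PRESS=Press
PRINT  Press
" Alternatively, a factor fixes the allocation: samples 1, 5, 9 ... in group 1 etc. "
FACTOR [LEVELS=4; VALUES=(1...4)6] CVgroup
OPLS   [PRINT=summary; NROOTS=2; NGROUPS=4; SEED=CVgroup]\
       Y=!p(%Protein); X=L
```

Supplying a factor is useful when the grouping should respect the design of the study, for example keeping replicate samples in the same group.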

Printed output is controlled by the PRINT option. Almost all of the settings are the same as those of the PLS procedure, and are used in exactly the same way. However, there is an additional setting, summary, which summarizes the percentage of variation in X explained by each orthogonal and predictive (i.e. PLS) component.

You can set the PCMETHOD option to request a principal components analysis to decompose the matrix of orthogonal variation (see X_ortho above), and to specify the method to use. Its settings are the same as those of the METHOD option of the PCP directive. Printed output is controlled by the PCPRINT option, which operates exactly as the PRINT option of the PCP directive. The PCSAVE parameter can supply a pointer to store details from the analysis. You can set option PLOT = pcplot to produce a score plot; by default, no plot is produced. When NORTHOGONALROOTS = 1, the WINDOW option can be used to control the window to be used for the plot; default 3.

Options: PRINT, PCPRINT, PLOT, NORTHOGONALROOTS, NROOTS, STANDARDIZE, NGROUPS, SEED, LABELS, PLABELS, PCMETHOD, WINDOW.

Parameters: Y, X, YLOADINGS, XLOADINGS, PLOADINGS, YSCORES, XSCORES, B, YPREDICTIONS, XPREDICTIONS, ESTIMATES, FITTEDVALUES, LEVERAGES, PRESS, RSS, YRESIDUALS, XRESIDUALS, PCSCORES, PCSAVE, SAVE.

Method

OPLS uses the methodology of Trygg & Wold (2002), applying the algorithm described in Biagioni et al. (2011), to remove variation from X that is not correlated to Y. OPLS then calls the PLS procedure to fit a PLS model to the filtered (i.e. pre-treated) matrix with the orthogonal variation removed.

To perform the principal components analysis on the matrix of orthogonal variation, OPLS uses the PCP directive, taking the setting for its METHOD option from the PCMETHOD option, and the setting for its NROOTS option from the NORTHOGONALROOTS option. When there is only one root, the score plot, which can be requested by setting option PLOT = pcplot, is produced by the DOTHISTOGRAM procedure. When there are several roots, it is produced by the DMSCATTER procedure. If the XPREDICTIONS parameter is set, principal component scores for the samples in the prediction set are estimated as described by Trygg & Wold (2002), and plotted in red.

Action with RESTRICT

OPLS will work with restricted variates, fitting an O-PLS model to the subset of objects formed by the restriction. The subset can be defined by restricting any of the X or Y variates. However, if more than one variate is restricted, they must be restricted in the same way. Note that the unrestricted length of all of the data variates must be the same, and the number of samples in the restricted subset must be at least three. Any restrictions on a text supplied for the LABELS option, or on a factor for the SEED option, are ignored.

When restricted data are supplied, and LABELS are also given, the appropriate subset of labels will appear in the output; if LABELS are not defined, then default labels reflecting the position in the restricted data are used.

No restrictions are allowed on the variates supplied by the XPREDICTIONS parameter, or on the text supplied by the PLABELS option.
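
As an illustration, the sketch below restricts the single Y variate from the Example section and fits the model to the resulting subset. The threshold of 9 is arbitrary and chosen only for illustration.

```genstat
" Fit only to samples with protein content above 9 "
RESTRICT %Protein; CONDITION=%Protein > 9
OPLS     [PRINT=summary; NORTHOGONALROOTS=1; NROOTS=1] Y=!p(%Protein); X=L
" Remove the restriction afterwards "
RESTRICT %Protein
```

Restricting one variate is enough here, since the subset can be defined by any of the X or Y variates.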

References

Biagioni, D.J., Astling, D.P., Graf, P. & Davis, M.F. (2011). Orthogonal projection to latent structures solution properties for chemometrics and systems biology data. Journal of Chemometrics, 25, 514-525.

Trygg, J. & Wold, S. (2002). Orthogonal projections to latent structures (O-PLS). Journal of Chemometrics, 16, 119-128.

See also

Directives: PCP, SVD.

Procedure: PLS.

Commands for: Multivariate and cluster analysis, Regression analysis.

Example

CAPTION 'OPLS example',!t('The data are 24 calibration samples used to',\
        'determine the protein content of wheat from spectroscopic readings',\
        'at six different wavelengths.'),\
        !t('Fearn, T. (1983), Applied Statistics, 32, 73-79.');\
        STYLE=meta,plain,plain
VARIATE [NVALUES=24] L[1...6],%Protein
READ    L[1...6],%Protein
468 123 246 374 386 -11  9.23   458 112 236 368 383 -15  8.01
457 118 240 359 353 -16 10.95   450 115 236 352 340 -15 11.67
464 119 243 366 371 -16 10.41   499 147 273 404 433   5  9.51
463 119 242 370 377 -12  8.67   462 115 238 370 353 -13  7.75
488 134 258 393 377  -5  8.05   483 141 264 384 398  -2 11.39
463 120 243 367 378 -13  9.95   456 111 233 365 365 -15  8.25
512 161 288 415 443  12 10.57   518 167 293 421 450  19 10.23
552 197 324 448 467  32 11.87   497 146 271 407 451  11  8.09
592 229 360 484 524  51 12.55   501 150 274 406 407  11  8.38
483 137 260 385 374  -3  9.64   491 147 269 389 391   1 11.35
463 121 242 366 353 -13  9.70   507 159 285 410 445  13 10.75
474 132 255 376 383  -7 10.75   496 152 276 396 404   6 11.47 :
" Extract two orthogonal components before fitting a one-dimensional PLS model
  to the standardized data with leave-one-out cross-validation.
  Principal components analysis is performed on the orthogonal variation."
OPLS [PRINT=summary,estimate,xpercent,ypercent,xloadings,yloadings,ploadings;\
     PCPRINT=loadings,roots,scores,tests; PLOT=pcplot; NORTHOGONALROOTS=2;\
     NROOTS=1; STANDARDIZE=X,Y; NGROUPS=24; SEED=38639; PCMETHOD=correlation]\
     Y=!p(%Protein); X=L

Updated on March 7, 2019