GENPROCRUSTES procedure

Performs a generalized Procrustes analysis (G.M. Arnold & R.W. Payne).

Options

`PRINT` = string tokens	Printed output required (`analysis`, `centroid`, `column`, `individual`, `monitoring`); default `anal`, `cent`
`SCALING` = string token	Type of scaling to use (`none`, `isotropic`, `separate`); default `none`
`METHOD` = string token	Method to be used (`Gower`, `TenBerge`); default `Gowe`
`NROOTS` = scalar	Number of roots (i.e. dimensions) to print for the output configurations, consensus and rotation matrices, and number of dimensions to save with the `XOUTPUT`, `CONSENSUS` and `ROTATIONS` parameters if their matrices have not already been defined; default is to print and save all the dimensions
`PLOT` = string tokens	Controls which graphs to display (`consensus`, `individuals`, `projections`); default `*` i.e. none
`NDROOTS` = scalar	Number of dimensions to display in the consensus and individuals plots; default 3
`TOLERANCE` = scalar	The algorithm is assumed to have converged when (last residual sum of squares) – (current residual sum of squares) < `TOLERANCE` × (number of configurations); default 0.00001
`MAXCYCLE` = scalar	Limit on number of iterations; default 50

Parameters

`XINPUT` = pointers	Each pointer points to a set of matrices holding the original input configurations
`XOUTPUT` = pointers	Each pointer points to a set of matrices to store a set of final (output) configurations
`CONSENSUS` = matrices	Stores the final consensus configuration from each analysis
`ROTATIONS` = pointers	Each pointer points to a set of matrices to store the rotations required to transform each set of `XINPUT` configurations to their final (scaled) `XOUTPUT` configurations
`RESIDUALS` = pointers	Each pointer points to a set of matrices to store the distances of a set of scaled `XINPUT` configurations from its consensus
`RSS` = scalars	Stores the residual sum of squares from each analysis
`ROOTS` = diagonal matrices	Stores the latent roots from referring the centroid configuration to its principal axis form (consensus) for each analysis
`WSS` = scalars	Stores the initial within-configuration sum of squares from each analysis
`SCALINGFACTOR` = variates	Stores the isotropic scaling factors for configurations from each analysis
`PROJECTIONS` = pointers	Each pointer points to a set of matrices to store a set of projection matrices

Description

An N × V matrix represents a configuration of N points in V dimensions. Given a set of M such matrices (XINPUT), a generalized Procrustes analysis iteratively matches them to a common centroid configuration by the operations of translation to a common origin, rotation/reflection of axes and possibly also scale changes. This matching seeks to minimise the sum of the squared distances between the centroid and each individual configuration summed over all points (the Procrustes statistic for each configuration and the centroid, summed over all configurations). The final centroid is referred to principal axes to give a unique consensus configuration. Two methods of scaling are available (controlled by the SCALING option). Isotropic scaling, which scales all the dimensions of each configuration by an equal amount, takes place during the Procrustes analysis. The alternative is to scale each configuration prior to the analysis so that the trace of each matrix is one (see Arnold 1992). If this latter method is used, the subsequent residuals represent pure lack-of-fit and the scaling factors given in the results represent differences in relative size/spread of the original (centred) configurations, whereas for overall isotropic scaling the scaling factor contains components of both size and lack-of-fit.

Procedure GENPROCRUSTES carries out a generalized Procrustes analysis and has parameters for saving various results for future use (XOUTPUT, CONSENSUS, ROTATIONS, RESIDUALS, RSS, ROOTS, WSS, SCALINGFACTOR, PROJECTIONS). There are options for different methods to use for the matching (SCALING, METHOD), control of convergence (TOLERANCE, MAXCYCLE) and printing and plotting of results (PRINT, PLOT, NROOTS and NDROOTS).

Note that the special case of M=2 corresponds to the classical pairwise Procrustes matching (ROTATE directive), except that fitting both configurations to a common centroid removes the need to regard one of the initial configurations as fixed.

Options: PRINT, SCALING, METHOD, NROOTS, PLOT, NDROOTS, TOLERANCE, MAXCYCLE.

Parameters: XINPUT, XOUTPUT, CONSENSUS, ROTATIONS, RESIDUALS, RSS, ROOTS, WSS, SCALINGFACTOR, PROJECTIONS.

Method

The default method used for generalized Procrustes analysis in GENPROCRUSTES is that described by Gower (1975). Each input configuration (XINPUT – referred to henceforth as X_i, i=1…M) is initially column-centred, with the individual column means for each configuration optionally printed (by including the column setting with the PRINT option). If separate scaling is requested (option SCALING=separate), the matrices are also scaled to have trace one (see Arnold 1992). A constraint is required on the overall sum of squares to prevent the trivial solution of matching by all configurations collapsing to the origin. In this procedure the constraint used is

∑ ( trace ( X_i′ X_i ) ) = M.
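The centring and scaling described above can be sketched outside Genstat. The following NumPy fragment is only an illustration under stated assumptions, not the procedure's own code: the helper name `centre_and_scale` and the two small data sets are hypothetical.

```python
import numpy as np

def centre_and_scale(configs, separate=False):
    """Column-centre each N x V configuration, then rescale the whole set
    so that sum_i trace(X_i' X_i) = M (hypothetical helper for illustration)."""
    X = [np.asarray(x, dtype=float) for x in configs]
    X = [x - x.mean(axis=0) for x in X]            # remove column means
    if separate:
        # Analogue of SCALING=separate: each configuration gets trace one
        X = [x / np.sqrt(np.trace(x.T @ x)) for x in X]
    total = sum(np.trace(x.T @ x) for x in X)      # overall sum of squares
    factor = np.sqrt(len(X) / total)               # common rescaling factor
    return [x * factor for x in X]

# Two made-up configurations of 3 points in 2 dimensions
A = [[1.0, 2.0], [3.0, 5.0], [4.0, 1.0]]
B = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
scaled = centre_and_scale([A, B])
```

With `separate=True` each configuration already has trace one, so the traces sum to M and the common factor is 1.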

An initial estimate of the centroid is found from these centred and scaled configurations; firstly X₂ is rotated to X₁, with the rotated X₂ saved as the new X₂ and the centroid computed as the mean of X₁ and the new X₂; X₃ is rotated to this centroid which is then recalculated as the mean of the three current configurations; and so on until all configurations X_i (i=1…M) have been included. The centroid thus found is taken as the initial centroid estimate Y, with the rotated values as the new X_i. The initial residual sum of squares S_r is calculated as
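The sequential construction of the initial centroid can be sketched as follows. This is an illustrative NumPy version, assuming that "rotated to" means the usual orthogonal Procrustes fit obtained from a singular value decomposition; the names `rotate_to`, `initial_centroid` and `rot` and the test data are hypothetical.

```python
import numpy as np

def rotate_to(X, Y):
    """Orthogonal Procrustes fit: return X @ H, where the orthogonal matrix H
    (rotation or reflection) minimises the sum of squares of X @ H - Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return X @ (U @ Vt)

def initial_centroid(configs):
    """Rotate X_2 to X_1, X_3 to the mean of the first two, and so on."""
    X = [np.array(x, dtype=float) for x in configs]
    Y = X[0]
    for i in range(1, len(X)):
        X[i] = rotate_to(X[i], Y)          # rotated X_i replaces the old X_i
        Y = sum(X[: i + 1]) / (i + 1)      # mean of configurations included so far
    return Y, X

def rot(theta):
    """2 x 2 rotation matrix (used only to build the example data)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# A column-centred base configuration and two exactly rotated copies of it
base = np.array([[1.0, 0.0], [-1.0, 1.0], [0.0, -1.0]])
Y, X = initial_centroid([base, base @ rot(0.7), base @ rot(-1.2)])
```

Because the copies differ from `base` only by a rotation, each is mapped back onto `base` and the initial centroid coincides with it.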

Sr = M × ( 1 – trace ( Y′ Y )).
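The form of this expression can be checked with a short derivation. It assumes that S_r denotes the sum of squared distances between the configurations and their centroid, that Y is the mean of the current X_i (so that the X_i sum to M × Y), and that the constraint ∑ trace(X_i′ X_i) = M holds:

```latex
\begin{aligned}
S_r &= \sum_{i=1}^{M} \operatorname{trace}\bigl((X_i - Y)'(X_i - Y)\bigr) \\
    &= \sum_{i=1}^{M} \operatorname{trace}(X_i' X_i)
       - 2 \sum_{i=1}^{M} \operatorname{trace}(X_i' Y)
       + M \operatorname{trace}(Y' Y) \\
    &= M - 2M \operatorname{trace}(Y' Y) + M \operatorname{trace}(Y' Y) \\
    &= M \bigl(1 - \operatorname{trace}(Y' Y)\bigr).
\end{aligned}
```

Rotations and reflections leave trace(X_i′ X_i) unchanged, so the constraint continues to hold after each rotation step.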

Each of the current configurations X_i is then rotated to Y and the rotated position saved as the new X_i. The updated estimate of the centroid Y_n is calculated as the mean of the new X_i (i=1…M) and the new residual sum of squares calculated as

Sr_n = M × ( 1 – trace ( Y_n′ Y_n )).
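One rotation cycle of this kind can be sketched in NumPy. The fragment below is illustrative only: it assumes the rotation is the SVD-based orthogonal Procrustes fit, and the function names and random test data are hypothetical.

```python
import numpy as np

def rotate_to(X, Y):
    """Return X @ H with H orthogonal, minimising the sum of squares of X @ H - Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return X @ (U @ Vt)

def rotation_cycle(X, Y):
    """Rotate every X_i to the centroid Y, then return the rotated
    configurations, the updated centroid Y_n and the new residual S_n."""
    M = len(X)
    X_new = [rotate_to(x, Y) for x in X]
    Y_n = sum(X_new) / M
    S_n = M * (1.0 - np.trace(Y_n.T @ Y_n))
    return X_new, Y_n, S_n

# Made-up data: 3 configurations of 5 points in 2 dimensions,
# centred and scaled so that the traces sum to M = 3
rng = np.random.default_rng(0)
raw = [rng.normal(size=(5, 2)) for _ in range(3)]
X = [x - x.mean(axis=0) for x in raw]
total = sum(np.trace(x.T @ x) for x in X)
X = [x * np.sqrt(len(X) / total) for x in X]

Y = sum(X) / len(X)
S_before = len(X) * (1.0 - np.trace(Y.T @ Y))
X_new, Y_n, S_n = rotation_cycle(X, Y)
```

In this sketch the residual cannot increase over a cycle: each rotation minimises the distance to Y, and the mean of the rotated configurations minimises the total squared distance to them.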

If isotropic scaling has been requested (option SCALING=isotropic) new estimates ro_i′ of the individual scaling factors ro_i (originally set to 1) are now found by

ro_i′/ro_i= √( trace( X_i′Y_n )/( trace( X_i′X_i ) × trace( Y_n′Y_n )))

and each X_i is updated by a factor of ro_i′/ro_i. The centroid is then recalculated as the mean of the new X_i and the new residual sum of squares calculated in a similar manner to before. If the decrease in the residual sum of squares Sr is less than TOLERANCE × M (see option TOLERANCE) the algorithm is taken to have converged. If not, the process is repeated until the tolerance is reached, up to a maximum number of iterations as set by the option MAXCYCLE (default 50), after which a message of non-convergence is printed and the procedure terminates. Monitoring information about convergence can be printed by including the monitoring setting with the PRINT option.
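The scaling update and the convergence test can be sketched in NumPy as below. This is a hedged illustration rather than the procedure's implementation: it applies the ratio formula above literally, assumes trace(X_i′ Y_n) is positive, and uses hypothetical names (`isotropic_update`, `has_converged`) and made-up data.

```python
import numpy as np

def isotropic_update(X):
    """Apply one update of the isotropic scaling factors.
    Returns the rescaled configurations and the ratios ro_i'/ro_i."""
    M = len(X)
    Y_n = sum(X) / M                                   # current centroid
    ratios = [
        np.sqrt(np.trace(x.T @ Y_n) / (np.trace(x.T @ x) * np.trace(Y_n.T @ Y_n)))
        for x in X
    ]
    X_new = [r * x for r, x in zip(ratios, X)]
    return X_new, ratios

def has_converged(last_rss, current_rss, tolerance=0.00001, n_configs=1):
    """Convergence rule quoted for the TOLERANCE option."""
    return (last_rss - current_rss) < tolerance * n_configs

# Made-up data: three copies of one centred configuration at different sizes
base = np.array([[1.0, 0.0], [-1.0, 1.0], [0.0, -1.0]])
X = [base, 2.0 * base, 0.5 * base]
X_new, ratios = isotropic_update(X)
```

Because the centroid is the mean of the X_i, the rescaled configurations satisfy ∑ trace(X_i′ X_i) = M after the update, which is why the residual can again be computed from the centroid alone.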

After convergence a unique consensus configuration is found by referring the final centroid to principal axes; the corresponding latent roots may be saved using the ROOTS parameter. Final results for the consensus and individual configurations (referred to the same principal axes) may be printed using the centroid and individual settings of the PRINT option, and/or saved using the parameters XOUTPUT, CONSENSUS and ROTATIONS. By default, results are presented and saved for the maximum available dimensionality but the option NROOTS allows a reduced number of dimensions to be set. Analysis of variation for the M configurations (including the individual scaling factors) and for the N points, along with the initial within and between configurations sums of squares (WSS and BSS), the final residual sum of squares (RSS) and number of steps in the iteration process may be printed using the analysis setting of the PRINT option. The initial within-configuration sum of squares, final residual sum of squares and individual isotropic scaling factors may also be saved using, respectively, the WSS, RSS and SCALINGFACTOR parameters. (Note that the final results are still scaled by the original factor from the initial overall constraint; to return to the original scale all sums of squares need adjustment by a factor of WSS/M and configurations by the square root of that factor.)
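Referring a centroid to principal axes, and returning results to the original scale, can be sketched in NumPy as follows. The sketch assumes that the latent roots are the eigenvalues of Y′Y for the centred centroid Y; the function names and the small example matrix are hypothetical.

```python
import numpy as np

def to_principal_axes(Y, X):
    """Rotate the centroid Y and every X_i to the principal axes of Y.
    Returns the consensus, the rotated configurations and the latent roots."""
    roots, vectors = np.linalg.eigh(Y.T @ Y)       # symmetric eigenproblem
    order = np.argsort(roots)[::-1]                # largest root first
    roots, vectors = roots[order], vectors[:, order]
    consensus = Y @ vectors
    return consensus, [x @ vectors for x in X], roots

def to_original_scale(config, wss, n_configs):
    """Undo the overall constraint: configurations scale by sqrt(WSS/M)."""
    return config * np.sqrt(wss / n_configs)

# Made-up centred centroid and one matching configuration
Y = np.array([[2.0, 1.0], [-1.0, 0.0], [-1.0, -1.0]])
consensus, rotated, roots = to_principal_axes(Y, [Y])
```

Keeping only the first few columns of `consensus` would correspond to requesting a reduced dimensionality, in the spirit of the NROOTS option.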

Independently of the choice of dimensionality for printing and saving, the NDROOTS option controls the dimensionality of the graphical output requested using the PLOT option (default 3). The consensus setting plots the consensus solution in the chosen dimensionality, and the individuals setting gives the individual final configurations as well as the consensus. The projections setting displays the projections (calculated from the individual rotation matrices scaled by the singular values from the consensus solution in principal axis form) as vectors labelled by configuration number and colour-coded for order of column. This projection plot can be particularly helpful in comparing the use of terms/attributes (columns of the configurations) by individual assessors in sensory analysis, both in conventional and free-choice profiling; see Arnold & Collins (1993) for further details.

Modifications to the method described above are given in TenBerge (1977), and may be invoked by the TenBerge setting of the METHOD option. This may give considerable savings in the time to reach convergence (Arnold 1988).

References

Arnold, G.M. (1988). Comparisons of algorithms for generalized Procrustes analyses. Genstat Newsletter, 22, 7-11.

Arnold, G.M. (1992). Scaling factors in generalized Procrustes analysis. Computational Statistics, Volume 1, Proceedings of the 10th Symposium on Computational Statistics, COMPSTAT, Neuchatel, Switzerland, August 1992, 61-66.

Arnold, G.M. & Collins, A.J. (1993). Interpretation of transformed axes in multivariate analysis. Applied Statistics, 42, 381-400.

Gower, J.C. (1975). Generalized Procrustes analysis. Psychometrika, 40, 33-51.

TenBerge, J.M.F. (1977). Orthogonal Procrustes rotation for two or more matrices. Psychometrika, 42, 267-276.

Example

CAPTION  'GENPROCRUSTES example',!t('Data from',\
         'Gower (1975), Psychometrika, 40, pages 33-51.',\ 
         'Note, however, that in Table 3 the scaling factors printed',\ 
         'were SQRT(ro[i]) instead of ro[i], and in Table 4 the',\ 
         'Between and Within Judges sums of squares were transposed.');\ 
         STYLE=meta,plain
MATRIX   [ROWS=9; COLUMNS=7] X[1...3]
READ     [SERIAL=yes] X[]
47 44 49 38 35 40 40
72 45 41 77 72 73 35
61 49 40 58 58 62 30
66 56 45 55 53 46 30
37 72 50 27 30 33 25
76 76 53 81 79 75 45
64 59 51 72 61 66 40
21 70 43 27 22 26 20
71 70 34 72 72 71 35 :
31 39 33 29 48 38 42
30 60 36 22 36 34 39
27 55 30 18 28 22 42
48 52 53 27 21 30 31
20 55 28 22 33 27 35
21 42 31 46 76 33 42
30 52 53 35 44 30 44
 5 57 53 12 13  6 31
55 63 53 77 79 57 49 :
43 46 44 22 53 44 29
53 79 75 79 73 52 27
22 85 83 19 27 17 22
28 89 78 13 29 20 24
75 86 85 34 75 55 38
53 79 82 72 78 74 38
15 85 85 46 75 52 35
 5 95 95  3 20  2 24
27 78 85 89 92 81 41 :
GENPROCRUSTES [PRINT=analysis,centroid,column,individual,monitoring;\ 
              SCALING=isotropic] XINPUT=X

Updated on March 7, 2019
