Performs principal coordinates analysis, also principal components and canonical variates analysis (but with different weighting from that used in
CVA) as special cases.
|PRINT|Printed output required (roots, scores, loadings, residuals, centroid, distances)|
|NROOTS|Number of latent roots for printed output; by default, results are printed for all the roots|
|SMALLEST|Whether to print the smallest roots instead of the largest (yes, no); default no|
|DATA|These can be specified either as a symmetric matrix of similarities or transformed distances or, for the canonical variates analysis, as a within-group SSPM or, for the principal components analysis, as a pointer to a set of variates or a matrix storing the variates by columns|
|LRV|Latent vectors (i.e. coordinates or scores), roots and trace from each analysis|
|CENTROID|Squared distances of the units from their centroid|
|RESIDUALS|Distances of the units from the fitted space|
|LOADINGS|Principal component loadings, or canonical variate loadings|
|DISTANCES|Computed inter-unit distances calculated from the variates of a data matrix, or inter-group Mahalanobis distances calculated from a within-group SSPM|
|SAVE|Saves details of the analysis; if unset, an unnamed save structure is saved automatically (and this can be accessed using the GET directive)|
There are six sections of output from PCO, requested using the PRINT option. The NROOTS and SMALLEST options control the printed output of roots, scores, loadings and residuals. By default, results are printed for all the roots, but you can set the NROOTS option to specify a smaller number. If option SMALLEST has the default setting no, these are taken to be the largest roots, but if you set SMALLEST=yes the results are for the smallest non-zero roots. The inter-unit distances are unaffected by the setting of the NROOTS and SMALLEST options.
The DATA parameter supplies the data. In its simplest form,
PCO works on a symmetric matrix, with values giving the associations amongst a set of objects. This could, for example, be a similarity matrix produced by
Alternatively, the input to
PCO can be a pointer whose values are the identifiers of a set of variates, or a matrix storing the variates by columns. Now the
PCO directive will construct the matrix of inter-unit squared distances, and will base the analysis on associations derived from this. This is equivalent to a principal components analysis; however, the results are derived by analysing the distance matrix rather than an SSPM. When there are more units than variates, using
PCO for principal components analysis is less efficient than using the
PCP directive; however, if there are more variates than units the
PCO directive is more efficient. When
PCO is used for principal components analysis, all the variates must be of the same length and none of their values may be missing; any restrictions on the variates are ignored.
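As a minimal sketch of this second form of input, the commands below pass a pointer to three variates to PCO. The variate names and data values are hypothetical and serve only to show the shape of the call; the PRINT settings are those used elsewhere on this page.

```genstat
" Hypothetical data: three variates measured on five units,
  all of the same length and with no missing values "
VARIATE [VALUES=2.1,3.4,1.9,4.0,2.8] Height
VARIATE [VALUES=10.2,12.5,9.8,13.1,11.0] Width
VARIATE [VALUES=0.5,0.9,0.4,1.1,0.7] Depth

" Collect the variates in a pointer and use it as the DATA parameter "
POINTER [VALUES=Height,Width,Depth] Vars
PCO [PRINT=roots,scores,loadings] Vars
```

With only three variates and five units this is the case where PCP would be the more efficient choice; the pointer form of PCO becomes attractive when the variates outnumber the units.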
The third type of input to
PCO is an SSPM structure. This must be a within-group SSPM: that is, you must have set the
GROUP option of the
SSPM directive when the SSPM was declared. Now the
PCO directive will calculate the Mahalanobis distances amongst the group means, and base the analysis on them. This will give results similar to a canonical variates analysis. The representation of distances will be better than that of CVA, but
CVA will be better if you are interested in loadings for discriminatory purposes.
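The SSPM form of input might be set up along the following lines. This is a sketch under stated assumptions: the factor, variates and data values are hypothetical, the grouping option of SSPM is written here in its full form GROUPS, and FSSPM is assumed to be the directive that forms the values of the declared SSPM.

```genstat
" Hypothetical data: two variates on nine units in three groups "
FACTOR [LEVELS=3; VALUES=1,1,1,2,2,2,3,3,3] Group
VARIATE [VALUES=1.2,1.5,1.1,2.3,2.8,2.5,3.9,3.4,3.6] X1
VARIATE [VALUES=5.1,4.8,5.5,6.0,6.4,5.7,4.2,4.9,4.4] X2

" Declare a within-group SSPM and form its values "
SSPM [TERMS=X1+X2; GROUPS=Group] W
FSSPM W

" Analyse the Mahalanobis distances amongst the three group means "
PCO [PRINT=roots,scores,loadings] W
```

Here the number of units for the output structures is three, the number of groups.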
The second and subsequent parameters of
PCO allow you to save the results. The number of units that determines the sizes of the output structures differs according to the input to
PCO. For a matrix or a symmetric matrix the number of units is the number of rows of the matrix, for a pointer it is the number of values in the variates that the pointer contains, while for an SSPM the number of units is the number of groups.
The latent roots, scores and trace can be saved in an LRV structure using the
LRV parameter. If you have declared the LRV already, its number of rows must equal the number of units.
If the input to
PCO is a pointer, a matrix, or an SSPM, the principal component or canonical variate loadings can be saved in a matrix using the
LOADINGS parameter. The number of rows of the matrix is equal to the number of variates (either those specified by an input pointer or those specified in the
SSPM directive for an input SSPM structure), or the number of columns in an input matrix.
The number of columns of the LRV and of the
LOADINGS matrix corresponds to the number of dimensions to be saved from the analysis, and this must be the same for both of them. If the structures have been declared already, Genstat will take the larger of the numbers of columns declared for either, and declare (or redeclare) the other one to match. If neither has been declared and option
SMALLEST retains the default setting
no, Genstat takes the number of columns from the setting of the
NROOTS option. Otherwise, Genstat saves results for the full set of dimensions. The trace saved as the third component of the LRV structure, however, will contain the sum of all the latent roots, whether or not they have all been saved.
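The rule for the number of saved dimensions can be illustrated with a short sketch. The data and the identifiers Scores2 and Load2 are hypothetical; neither output structure is declared beforehand, so under the description above the number of columns should follow NROOTS.

```genstat
" Hypothetical data: three variates measured on five units "
VARIATE [VALUES=2.1,3.4,1.9,4.0,2.8] Height
VARIATE [VALUES=10.2,12.5,9.8,13.1,11.0] Width
VARIATE [VALUES=0.5,0.9,0.4,1.1,0.7] Depth
POINTER [VALUES=Height,Width,Depth] Vars

" SMALLEST keeps its default (no), so two dimensions are saved "
PCO [NROOTS=2] Vars; LRV=Scores2; LOADINGS=Load2

" The LRV has three components: vectors (scores), roots and trace.
  The trace is the sum of all the roots, not just the two saved. "
PRINT Scores2[1]
PRINT Scores2[2]
PRINT Scores2[3]
PRINT Load2
```

Scores2 should have five rows (the units) and Load2 three rows (the variates), each with two columns.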
The distances of the units from their centroid can be saved in a diagonal matrix using the
CENTROID parameter. The diagonal matrix has the same number of rows as the number of units, defined above. The
RESIDUALS parameter allows you to save residuals, formed from the dimensions that have not been saved, in a matrix with one column and number of rows equal to the number of units. Finally, the inter-unit distances can be saved in a symmetric matrix using the
DISTANCES parameter. The number of rows of the symmetric matrix is again the same as the number of units.
The SAVE parameter can supply a pointer to save a multivariate save structure containing all the details of the analysis. If this is unset, an unnamed save structure is saved automatically (and this can be accessed using the
GET directive). Alternatively, you can set
SAVE=* to prevent any save structure being formed if, for example, you have a very large data set and want to avoid committing the storage space.
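The remaining saving parameters can be sketched together on a small symmetric matrix. The four-object squared-distance matrix and the identifiers Cen, Res and Dist are hypothetical; the transformation to -d/2 follows the example at the end of this page.

```genstat
" Hypothetical squared distances amongst four objects "
SYMMETRIC [ROWS=4] D2
READ D2
0
1.0 0
4.0 1.0 0
9.0 4.0 1.0 0 :

" Transform the squared distances before the analysis "
CALCULATE D2 = -D2/2

" Save centroid distances, residuals and inter-unit distances,
  and suppress the save structure with SAVE=* "
PCO [NROOTS=1] D2; CENTROID=Cen; RESIDUALS=Res; DISTANCES=Dist; SAVE=*
PRINT Cen
PRINT Res
PRINT Dist
```

Each saved structure has four rows, matching the number of rows of the input matrix.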
Having obtained an ordination, you may sometimes want to add points to the ordination for additional units. If you know the squared distances of the new units from the old, the technique of Gower (1968) can be used to add points to the ordination for the new units. You can do this in Genstat by using the ADDPOINTS directive.
PCO ignores any restrictions on the variates.
Gower, J.C. (1968). Adding a point to vector diagrams in multivariate analysis. Biometrika, 55, 582-585.
Commands for: Multivariate and cluster analysis.
" Genstat example PCO-1: Principal coordinates analysis. The data for this example (Nathanson J A 1971. An aplication of multivariate analysis in astronomy. Applied Statistics 20, 239-249) gives squared distances amongst ten types of galaxy: those of an elliptical shape, eight different kinds of spiral galaxy , and irregularly-shaped galaxies. The spiral types vary from those which are mailnly made up of a central core (coded as types SO and SBO) to those that are extremely tenuous (Sc and SBc). This example forms an ordination of the ten galaxy types. " " Declare the symmetric data matrix " SYMMETRIC [ROWS=!T(E,SO,SBO,Sa,SBa,Sb,SBb,Sc,SBc,I)] Galaxy READ Galaxy 0 1.87 0 2.24 0.91 0 4.03 2.05 1.51 0 4.09 1.74 1.59 0.68 0 5.38 3.41 3.15 1.86 1.27 0 7.03 3.85 3.24 2.25 1.89 2.02 0 6.02 4.85 4.11 3.00 2.13 1.71 1.45 0 6.88 5.70 5.12 3.72 3.01 2.97 1.75 1.13 0 4.12 3.77 3.86 3.93 3.27 3.77 3.52 2.79 3.29 0 : PRINT Galaxy CALCULATE Galaxy = -Galaxy/2 " Carry out the principal coordinates analysis, printing out the latent roots and trace, the principal coordinate scores, the distances of each unit from their overall centroid, and the matrix of inter-unit distances. " PCO [PRINT=roots,scores,centroid,distances] Galaxy " Carry out the analysis once again, printing information for the 8 smallest roots only. " PCO [PRINT=residuals,centroid; NROOTS=8; SMALLEST=yes] Galaxy " Create two different data matrices: Gname8 - which holds the data corresponding to the eight spiral galaxies. This is created from taking row 2 to column 2, to row 9, column 9 of the symmetric matrix Galaxy. Corresponding row labels are supplied. Gname2 - which holds the data corresponding to the elliptical and irregularly-shaped galaxies. This is created from taking the values in the Galaxy matrix from row 1, columns 2 to 9, and row 10, columns 2 to 9. Again, appropriate labels are supplied. 
" TEXT Gname8; !T(SO,SBO,Sa,SBa,Sb,SBb,Sc,SBc) & Gname2; !T(E,I) SYMMETRIC [ROWS=Gname8] G8 CALCULATE G8 = Galaxy$[!(2...9)] MATRIX [ROWS=Gname2; COLUMNS=Gname8] G2 CALCULATE G2 = Galaxy$[!(1,10); !(2...9)] " Transform the matrix back to the original scale. " CALCULATE G2 = -2*G2 PRINT G2; FIELDWIDTH=7 " Perform the analysis for the eight spiral galaxies, saving the latent vectors in the LRV structure L8, and the centroid distances in the diagonal matric C8. Their is no need to declare these structures in advance since the PCO will do this automatically. " PCO [PRINT=roots,scores] G8; LRV=L8; CENTROID=C8 " Now add the points for the elliptical and irregularly shaped galaxies to the principal coordinate analysis. " ADDPOINTS [PRINT=coordinates,residuals] G2; LRV=L8; CENTROID=C8