PCO directive

Performs principal coordinates analysis, which includes principal components analysis and canonical variates analysis (but with a different weighting from that used in CVA) as special cases.

Options

`PRINT` = string tokens	Printed output required (`roots, scores, loadings, residuals, centroid, distances`); default `*` i.e. no printing
`NROOTS` = scalar	Number of latent roots for printed output; default `*` requests them all to be printed
`SMALLEST` = string token	Whether to print the smallest roots instead of the largest (`yes, no`); default `no`

Parameters

`DATA` = identifiers	These can be specified either as a symmetric matrix of similarities or transformed distances or, for canonical variates analysis, as an `SSPM` containing within-group sums of squares and products etc., or, for principal components analysis, either as a pointer containing the variates of the data matrix or as a matrix storing the variates by columns
`LRV` = LRVs	Latent vectors (i.e. coordinates or scores), roots and trace from each analysis
`CENTROID` = diagonal matrices	Squared distances of the units from their centroid
`RESIDUALS` = matrices or variates	Distances of the units from the fitted space
`LOADINGS` = matrices	Principal component loadings, or canonical variate loadings
`DISTANCES` = symmetric matrices	Computed inter-unit distances calculated from the variates of a data matrix, or inter-group Mahalanobis distances calculated from a within-group SSPM
`SAVE` = pointers	Saves details of the analysis; if unset, an unnamed save structure is saved automatically (and this can be accessed using the `GET` directive)

Description

The PCO directive carries out principal coordinates analysis. As noted above, this method includes principal components analysis and a form of canonical variates analysis as special cases.

There are six sections of output from PCO, requested using the PRINT option:

`roots`	prints the latent roots and trace;
`scores`	prints the principal coordinate scores;
`loadings`	prints the principal component or canonical variate loadings; this is relevant only when PCO is used for principal components analysis or canonical variates analysis;
`residuals`	prints the residuals; this is relevant only when results are printed for just some of the latent roots;
`centroid`	prints the distances (not squared distances) of each unit from the overall centroid;
`distances`	prints the matrix of inter-unit distances (not squared distances).

The NROOTS and SMALLEST options control the printed output of roots, scores, loadings and residuals. By default, results are printed for all the roots, but you can set the NROOTS option to request a smaller number. If option SMALLEST has its default setting no, these are taken to be the largest roots; if you set SMALLEST=yes, the results are for the smallest non-zero roots. The inter-unit distances are unaffected by the setting of the NROOTS option.
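The selection rule for NROOTS and SMALLEST can be sketched in Python. The helper `select_roots` is hypothetical, not part of Genstat, and assumes that any root within a small tolerance of zero counts as zero; how Genstat treats negative roots under SMALLEST=yes is not covered by this sketch.

```python
def select_roots(roots, nroots=None, smallest=False, tol=1e-10):
    """Return the indices of the roots whose results would be printed.

    Hypothetical helper mimicking NROOTS and SMALLEST: roots are ranked
    largest first by default, or smallest first when smallest=True, in
    which case roots within tol of zero are skipped. nroots=None means
    all of the ranked roots.
    """
    order = sorted(range(len(roots)), key=lambda i: roots[i], reverse=not smallest)
    if smallest:
        # SMALLEST=yes reports the smallest *non-zero* roots
        order = [i for i in order if abs(roots[i]) > tol]
    return order if nroots is None else order[:nroots]


roots = [5.0, 0.0, 2.0, 1.0, 3.0]
print(select_roots(roots, nroots=2))                 # [0, 4]
print(select_roots(roots, nroots=2, smallest=True))  # [3, 2]
```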

The DATA parameter supplies the data. In its simplest form, PCO works on a symmetric matrix, with values giving the associations amongst a set of objects. This could, for example, be a similarity matrix produced by FSIMILARITY.
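As a sketch of the underlying computation, assuming the standard Gower formulation rather than Genstat's actual code: the association matrix is double-centred, its latent roots and vectors are found, and each vector is scaled so that its sum of squares equals its root. The NumPy function `pco` is a hypothetical name; setting the coordinates for negative roots to zero is a simplification made here.

```python
import numpy as np

def pco(assoc):
    """Principal coordinates of a symmetric association matrix (Gower's method).

    Returns (scores, roots, trace) with the roots in decreasing order.
    """
    a = np.asarray(assoc, dtype=float)
    n = a.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n          # centring matrix I - 11'/n
    b = centre @ a @ centre                            # double-centred matrix
    roots, vectors = np.linalg.eigh(b)                 # eigh returns ascending roots
    roots, vectors = roots[::-1], vectors[:, ::-1]     # largest root first
    # scale each vector so its sum of squares equals its root;
    # negative roots (non-Euclidean input) get zero coordinates here
    scores = vectors * np.sqrt(np.clip(roots, 0.0, None))
    return scores, roots, float(np.trace(b))           # trace = sum of all roots


# Squared distances amongst three points at 0, 1 and 3 on a line
d2 = np.array([[0.0, 1.0, 9.0],
               [1.0, 0.0, 4.0],
               [9.0, 4.0, 0.0]])
scores, roots, trace = pco(-d2 / 2)   # associations = minus half the squared distances
print(roots[0], trace)                # both about 4.667 (= 14/3)
```

The first column of `scores` recovers the centred positions (-4/3, -1/3, 5/3) up to sign. The transformation -d²/2 is the one applied to Galaxy in the Example section.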

Alternatively, the input to PCO can be a pointer whose values are the identifiers of a set of variates, or a matrix storing the variates by columns. Now the PCO directive will construct the matrix of inter-unit squared distances, and will base the analysis on associations derived from this. This is equivalent to a principal components analysis; however, the results are derived by analysing the distance matrix rather than an SSPM. When there are more units than variates, using PCO for principal components analysis is less efficient than using the PCP directive; however, if there are more variates than units the PCO directive is more efficient. When PCO is used for principal components analysis, all the variates must be of the same length and none of their values may be missing; any restrictions on the variates are ignored.
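The equivalence with principal components analysis can be checked numerically. This NumPy sketch (hypothetical function names, arbitrary data) runs PCO on squared Euclidean distances and PCA on the centred data matrix; the non-zero latent roots agree, and the scores agree up to the sign of each dimension. It also shows where the efficiency difference comes from: PCO solves an n-by-n eigenproblem for n units, whereas an SSPM-based analysis works with a p-by-p matrix for p variates.

```python
import numpy as np

def pco_scores(sq_dist):
    """PCO scores and roots from a matrix of squared inter-unit distances."""
    n = sq_dist.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n
    b = centre @ (-0.5 * sq_dist) @ centre
    roots, vectors = np.linalg.eigh(b)               # n x n eigenproblem
    roots, vectors = roots[::-1], vectors[:, ::-1]   # largest root first
    return vectors * np.sqrt(np.clip(roots, 0.0, None)), roots

def pca_scores(x):
    """Principal component scores and roots from the centred data (via SVD)."""
    xc = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(xc, full_matrices=False)
    return u * s, s ** 2          # s**2 are the latent roots of the SSP matrix

# 5 units (rows) measured on 3 variates (columns); values are arbitrary
x = np.array([[2.0, 0.5, 1.0],
              [1.0, 1.5, 0.0],
              [3.0, 2.5, 2.0],
              [0.0, 1.0, 4.0],
              [4.0, 0.0, 3.0]])
diff = x[:, None, :] - x[None, :, :]
sq_dist = (diff ** 2).sum(axis=2)    # squared Euclidean distances between units

pco_s, pco_roots = pco_scores(sq_dist)
pca_s, pca_roots = pca_scores(x)
k = x.shape[1]                       # at most 3 non-zero roots here
print(np.allclose(pco_roots[:k], pca_roots))   # True
```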

The third type of input to PCO is an SSPM structure. This must be a within-group SSPM: that is, you must have set the GROUP option of the SSPM directive when the SSPM was declared. Now the PCO directive will calculate the Mahalanobis distances amongst the group means, and base the analysis on them. This will give results similar to a canonical variates analysis. The representation of distances will be better than that of CVA, but CVA will be better if you are interested in loadings for discriminatory purposes.
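The distances that PCO analyses for an SSPM input can be sketched as follows. The NumPy function `group_mahalanobis` is hypothetical; it assumes the within-group covariance matrix is the pooled within-group SSP matrix divided by its degrees of freedom (N minus the number of groups), which may differ from Genstat's divisor, although a different divisor only rescales the ordination. Passing -D²/2 to a PCO routine would then give the CVA-like ordination of the group means.

```python
import numpy as np

def group_mahalanobis(x, groups):
    """Squared Mahalanobis distances amongst group means.

    Assumes the pooled within-group covariance is W / (N - g), where W
    is the within-group SSP matrix, N the number of units and g the
    number of groups.
    """
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    means = np.array([x[groups == g].mean(axis=0) for g in labels])
    # pooled within-group sums of squares and products
    w = sum((x[groups == g] - m).T @ (x[groups == g] - m)
            for g, m in zip(labels, means))
    inv = np.linalg.inv(w / (len(x) - len(labels)))
    diff = means[:, None, :] - means[None, :, :]
    d2 = np.einsum("ijk,kl,ijl->ij", diff, inv, diff)   # (mi - mj)' S^-1 (mi - mj)
    return labels, d2


# One variate, two groups: means 1 and 5, pooled variance 4 / 2 = 2
labels, d2 = group_mahalanobis([[0.0], [2.0], [4.0], [6.0]], ["A", "A", "B", "B"])
print(d2)   # off-diagonal entries are 4**2 / 2 = 8
```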

The second and subsequent parameters of PCO allow you to save the results. The number of units, which determines the sizes of the output structures, depends on the input to PCO. For a matrix or a symmetric matrix the number of units is the number of rows of the matrix, for a pointer it is the number of values in the variates that the pointer contains, while for an SSPM the number of units is the number of groups.

The latent roots, scores and trace can be saved in an LRV structure using the LRV parameter. If you have declared the LRV already, its number of rows must equal the number of units.

If the input to PCO is a pointer, a matrix, or an SSPM, the principal component or canonical variate loadings can be saved in a matrix using the LOADINGS parameter. The number of rows of the matrix is equal to the number of variates (either those specified by an input pointer or those specified in the SSPM directive for an input SSPM structure), or the number of columns in an input matrix.
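When the scores come from PCO, the loadings follow from a standard identity: with centred data X and scores Y scaled so that Y'Y = Λ (the diagonal matrix of roots), the loadings are A = X'YΛ⁻¹. The NumPy sketch below assumes this relationship and uses a hypothetical function name; it is not necessarily how Genstat computes LOADINGS.

```python
import numpy as np

def loadings_from_scores(x, scores, roots):
    """Principal component loadings recovered from PCO scores.

    With centred data X and scores Y scaled so that Y'Y = diag(roots),
    A = X'Y diag(roots)^-1; each column of A is a unit-length latent
    vector of the SSP matrix X'X. Pass only the non-zero roots.
    """
    xc = np.asarray(x, dtype=float) - np.mean(x, axis=0)
    return xc.T @ scores / roots       # divides column j by roots[j]

# 4 units, 2 variates (arbitrary values)
x = np.array([[1.0, 2.0], [3.0, 1.0], [4.0, 5.0], [0.0, 4.0]])
n = len(x)
sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
centre = np.eye(n) - np.ones((n, n)) / n
roots, vectors = np.linalg.eigh(centre @ (-0.5 * sq_dist) @ centre)
roots, vectors = roots[::-1][:2], vectors[:, ::-1][:, :2]   # the 2 non-zero roots
scores = vectors * np.sqrt(roots)
loadings = loadings_from_scores(x, scores, roots)   # rows = variates, columns = dimensions
```

As the paragraph above says, the loadings matrix has one row per variate; multiplying the centred data by it reproduces the PCO scores.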

The number of columns of the LRV and of the LOADINGS matrix corresponds to the number of dimensions to be saved from the analysis, and this must be the same for both of them. If the structures have been declared already, Genstat takes the larger of the two numbers of columns and declares (or redeclares) the other structure to match. If neither has been declared and option SMALLEST retains the default setting no, Genstat takes the number of columns from the setting of the NROOTS option. Otherwise, Genstat saves results for the full set of dimensions. The trace saved as the third component of the LRV structure, however, will contain the sum of all the latent roots, whether or not they have all been saved.
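The rule that the trace covers every root, even when only some dimensions are saved, can be illustrated with a small Python mirror of an LRV. The `LRV` dataclass and `save_lrv` function are hypothetical stand-ins for the Genstat structure, not a Genstat API.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class LRV:
    """Hypothetical Python mirror of the three components of an LRV."""
    vectors: np.ndarray   # units x saved dimensions: the coordinates (scores)
    roots: np.ndarray     # one latent root per saved dimension
    trace: float          # sum of ALL the latent roots, saved or not

def save_lrv(scores, roots, ndim):
    """Keep the first ndim dimensions; the trace still uses every root."""
    return LRV(vectors=scores[:, :ndim], roots=roots[:ndim], trace=float(np.sum(roots)))

lrv = save_lrv(np.arange(12.0).reshape(4, 3), np.array([5.0, 2.0, 1.0]), ndim=2)
print(lrv.vectors.shape, lrv.trace)   # (4, 2) 8.0
```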

The distances of the units from their centroid can be saved in a diagonal matrix using the CENTROID parameter. The diagonal matrix has the same number of rows as the number of units, defined above. The RESIDUALS parameter allows you to save residuals, formed from the dimensions that have not been saved, in a matrix with one column and number of rows equal to the number of units. Finally, the inter-unit distances can be saved in a symmetric matrix using the DISTANCES parameter. The number of rows of the symmetric matrix is again the same as the number of units.
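For Euclidean input, both quantities can be read off the double-centred association matrix: its diagonal holds the squared distances of the units from their centroid, and subtracting the squared coordinates in the saved dimensions leaves the squared distance from the fitted space. This NumPy sketch takes squared distances as input; the function name is hypothetical, and small negative differences caused by rounding error are clipped to zero.

```python
import numpy as np

def centroid_and_residuals(sq_dist, ndim):
    """Squared centroid distances and residual distances for ndim saved dimensions."""
    n = sq_dist.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n
    b = centre @ (-0.5 * sq_dist) @ centre
    roots, vectors = np.linalg.eigh(b)
    roots, vectors = roots[::-1], vectors[:, ::-1]
    scores = vectors[:, :ndim] * np.sqrt(np.clip(roots[:ndim], 0.0, None))
    centroid_sq = np.diag(b)                  # squared distance of each unit from the centroid
    fitted_sq = (scores ** 2).sum(axis=1)     # part explained by the saved dimensions
    residual = np.sqrt(np.clip(centroid_sq - fitted_sq, 0.0, None))
    return centroid_sq, residual


# Corners of a 4 x 2 rectangle: each is sqrt(5) from the centroid and
# 1 away from the first principal axis
pts = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 2.0], [4.0, 2.0]])
sq = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
c, r = centroid_and_residuals(sq, ndim=1)   # c: 5 for each unit; r: 1 for each unit
```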

The SAVE parameter can supply a pointer to save a multivariate save structure containing all the details of the analysis. If this is unset, an unnamed save structure is saved automatically (and this can be accessed using the GET directive). Alternatively, you can set SAVE=* to prevent any save structure being formed if, for example, you have a very large data set and want to avoid committing the storage space.

Having obtained an ordination, you may sometimes want to add points to the ordination for additional units. If you know the squared distances of the new units from the old, the technique of Gower (1968) can be used to add points to the ordination for the new units. You can do this in Genstat by using the ADDPOINTS directive.
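Gower's (1968) technique places a new unit using its squared distances d from the n existing units, the squared distances c of those units from their centroid, and the existing coordinates Y with roots Λ: the new coordinates are y = ½Λ⁻¹Y'(c - d). These are the same inputs that ADDPOINTS receives in the example (an LRV and a CENTROID matrix), but the NumPy function `add_point` is a hypothetical sketch of the formula, not the ADDPOINTS implementation.

```python
import numpy as np

def add_point(scores, roots, centroid_sq, new_sq_dist):
    """Place a new unit in an existing ordination (Gower, 1968).

    scores      : n x k coordinates of the existing units, with Y'Y = diag(roots)
    roots       : the k (non-zero) latent roots
    centroid_sq : squared distances of the existing units from their centroid
    new_sq_dist : squared distances of the new unit from each existing unit
    """
    diff = np.asarray(centroid_sq, dtype=float) - np.asarray(new_sq_dist, dtype=float)
    return 0.5 * (scores.T @ diff) / roots


# Existing ordination: corners of a 4 x 2 rectangle, already centred on
# their centroid and aligned with the principal axes (roots 16 and 4)
scores = np.array([[-2.0, -1.0], [2.0, -1.0], [-2.0, 1.0], [2.0, 1.0]])
roots = np.array([16.0, 4.0])
centroid_sq = (scores ** 2).sum(axis=1)          # 5 for every unit
# New unit at (1, 0) in these coordinates, known only through its
# squared distances from the four existing units
new_sq_dist = np.array([10.0, 2.0, 10.0, 2.0])
new = add_point(scores, roots, centroid_sq, new_sq_dist)   # recovers (1, 0)
```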

Options: PRINT, NROOTS, SMALLEST.

Parameters: DATA, LRV, CENTROID, RESIDUALS, LOADINGS, DISTANCES, SAVE.

Action with `RESTRICT`

PCO ignores any restrictions on the DATA variates.

Reference

Gower, J.C. (1968). Adding a point to vector diagrams in multivariate analysis. Biometrika, 55, 582-585.

Example

" Genstat example PCO-1: Principal coordinates analysis.

  The data for this example (Nathanson J A 1971. An application of
  multivariate analysis in astronomy. Applied Statistics 20, 239-249)
  give squared distances amongst ten types of galaxy: those of an
  elliptical shape, eight different kinds of spiral galaxy, and
  irregularly-shaped galaxies. The spiral types vary from those which
  are mainly made up of a central core (coded as types SO and SBO) to
  those that are extremely tenuous (Sc and SBc).

  This example forms an ordination of the ten galaxy types.
"
 
"
  Declare the symmetric data matrix
"
SYMMETRIC [ROWS=!T(E,SO,SBO,Sa,SBa,Sb,SBb,Sc,SBc,I)] Galaxy
READ Galaxy
0
1.87 0
2.24 0.91 0
4.03 2.05 1.51 0
4.09 1.74 1.59 0.68 0
5.38 3.41 3.15 1.86 1.27 0
7.03 3.85 3.24 2.25 1.89 2.02 0
6.02 4.85 4.11 3.00 2.13 1.71 1.45 0
6.88 5.70 5.12 3.72 3.01 2.97 1.75 1.13 0
4.12 3.77 3.86 3.93 3.27 3.77 3.52 2.79 3.29 0 :
PRINT Galaxy
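"
  Convert the squared distances to associations (minus half the
  squared distance) for the principal coordinates analysis.
"
CALCULATE Galaxy = -Galaxy/2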

"
 Carry out the principal coordinates analysis, printing out the latent
 roots and trace, the principal coordinate scores, the distances of each 
 unit from the overall centroid, and the matrix of inter-unit distances.
"
PCO [PRINT=roots,scores,centroid,distances] Galaxy

"
  Carry out the analysis once again, printing information for the 8 
  smallest roots only.
"
PCO [PRINT=residuals,centroid; NROOTS=8; SMALLEST=yes] Galaxy

"
  Create two data matrices:

       G8 - a symmetric matrix holding the data for the eight spiral
            galaxies. This is formed from rows and columns 2 to 9 of
            the symmetric matrix Galaxy; its row labels are in the
            text Gname8.
       G2 - a rectangular matrix holding the data for the elliptical
            and irregularly-shaped galaxies. This is formed from rows
            1 and 10, columns 2 to 9, of Galaxy; its row labels are in
            the text Gname2 and its column labels in Gname8.
"               
TEXT Gname8; !T(SO,SBO,Sa,SBa,Sb,SBb,Sc,SBc)
& Gname2; !T(E,I)
SYMMETRIC [ROWS=Gname8] G8
CALCULATE G8 = Galaxy$[!(2...9)]
MATRIX [ROWS=Gname2; COLUMNS=Gname8] G2
CALCULATE G2 = Galaxy$[!(1,10); !(2...9)]

"
  Transform G2 back to the original scale of squared distances.
"
CALCULATE G2 = -2*G2
PRINT G2; FIELDWIDTH=7

"
  Perform the analysis for the eight spiral galaxies, saving the latent
  vectors, roots and trace in the LRV structure L8, and the centroid
  distances in the diagonal matrix C8. There is no need to declare these
  structures in advance since PCO will do this automatically.
"
PCO [PRINT=roots,scores] G8; LRV=L8; CENTROID=C8

" 
  Now add points for the elliptical and irregularly-shaped galaxies
  to the ordination of the spiral galaxies.
"
ADDPOINTS [PRINT=coordinates,residuals] G2; LRV=L8; CENTROID=C8

Updated on June 19, 2019
