Performs canonical variates analysis.
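|PRINT|Printed output required (roots, loadings, means, residuals, distances, tests); by default nothing is printed|
|NROOTS|Number of latent roots for printed output; default
|SMALLEST|Whether to print the smallest roots instead of the largest (yes, no); default no|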
|WSSPM|Within-group sums of squares and products, means etc (input for the analyses)|
|LRV|Saves loadings, roots and trace from each analysis|
|SCORES|Saves canonical variate means|
|RESIDUALS|Saves distances of the means from the dimensions fitted in each analysis|
|DISTANCES|Saves inter-group-mean Mahalanobis distances|
|ADJUSTMENTS|Saves the adjustment terms|
|SAVE|Saves details of the analysis; if unset, an unnamed save structure is saved automatically (and this can be accessed using the GET directive)|
You specify the input for CVA using its first parameter, WSSPM; this may contain a list of structures, in which case Genstat repeats the analysis for each of them. Each input must be an SSPM structure, declared with the GROUPS option of the SSPM directive set to a factor giving the grouping of the units. If the variates used to form this SSPM structure are restricted, then the SSPM is restricted in the same way, and so the CVA directive takes account of the restriction. The SSPM contains information on the within-group sums of squares and products, pooled over all the groups; it also contains the group means and group sizes, from which Genstat can derive the between-group sums of squares and products.
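As a minimal sketch of supplying a list of inputs, assume two pointers of variates, Vars1 and Vars2, and a grouping factor Grp already exist (all three names are hypothetical). The SSPM structures are declared with the GROUPS option, formed with FSSPM, and then passed to CVA together:

```genstat
" Vars1, Vars2 and Grp are hypothetical names, assumed to exist already. "
SSPM [TERMS=Vars1; GROUPS=Grp] W1
SSPM [TERMS=Vars2; GROUPS=Grp] W2
" Form the within-group sums of squares and products, means and group sizes. "
FSSPM W1,W2
" One analysis is carried out for each SSPM in the list. "
CVA [PRINT=roots,tests] W1,W2
```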
CVA finds linear combinations of the original variables that maximize the ratio of between-group to within-group variation, thereby giving functions of the original variables that can be used to discriminate between the groups. The squares of the printed distances between group means are Mahalanobis D² statistics when all the dimensions are used; otherwise they are approximations. You can form exact Mahalanobis distances with the
The three options of the CVA directive control the printed output. By default there is no printed output, and so you should set the PRINT option to request the output that you want. The NROOTS and SMALLEST options of CVA determine which roots the printed results cover. NROOTS specifies the number of roots for which you want the results to be printed. By default these will be the largest roots, unless you set SMALLEST=yes; then the results will be printed for the smallest non-zero roots. When you print a subset of the results, residuals can be formed and printed from the dimensions that are not displayed.
The significance tests that are printed are for a significant dimensionality greater than k, that is for the joint significance of the first, second, …, (k+1)th latent roots. This test is printed for k = 0, 1, …, min(g−1, v)−1. If the test is “not significant” for k = r, then the values of chi-square for k > r should be ignored, as the indication is that the remaining dimensions have no interesting structure. The test statistic (Bartlett 1938) is asymptotically distributed as chi-square with (v−k)×(g−k−1) degrees of freedom. Here n is the number of units, g is the number of groups, v is the number of variables, and l_i is the ith latent root. If the coefficient [n−g−½(v−g)] is less than zero, there are too few units for the statistics to be calculated and a message is printed to this effect. In any case, the tests should be treated with caution unless n−g is very much larger than v.
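The formula for the test statistic is not reproduced in the text above. As a hedged sketch only: the coefficient is the one quoted in the text, while the summation over the remaining roots is the usual form of Bartlett's test. It is assumed here (not confirmed by this page) that the roots are ratios of between-group to within-group sums of squares.

```latex
% Sketch under stated assumptions: coefficient from the text, summation form assumed.
\chi^2_k \;=\; \Bigl[\, n - g - \tfrac{1}{2}(v - g) \Bigr]
          \sum_{i=k+1}^{\min(g-1,\,v)} \log\bigl(1 + l_i\bigr),
\qquad \text{d.f.} = (v - k)(g - k - 1)
```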
The latent vectors, or loadings, are scaled in such a way that the average within-group variability in each canonical variate dimension is 1: thus the within-group variation is equally represented in each dimension. Since the latent roots are the successive maxima of the ratio of between-group to within-group variation, loadings corresponding to roots less than 1 are for dimensions in the canonical variate space that exhibit more within-group variation than between-group variation.
The scores for the means are arranged so that their centroid, weighted by group size, is at the origin. This is done by subtracting a constant term, for each canonical variate dimension, from the scores initially formed as a linear combination of the group means of the original variables. These adjustments can be saved, in a matrix of size one by number of groups, using the ADJUSTMENTS parameter.
If you ask for distances, they are formed from the group mean scores for the canonical variate dimensions that are printed. If results are printed for the full dimensionality, the distances will be Mahalanobis distances between the groups.
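The centring and the distances described above can be written out as follows. The notation is assumed for illustration and does not come from the original: a_j is the loading vector for dimension j, x̄_h the vector of means of the original variables in group h, n_h the size of group h, n the total number of units, and r the number of dimensions printed.

```latex
% c_j is the adjustment for dimension j; z_{hj} is the adjusted mean score.
c_j = \frac{1}{n} \sum_{h=1}^{g} n_h \, a_j^{\top} \bar{x}_h ,
\qquad
z_{hj} = a_j^{\top} \bar{x}_h - c_j ,
\qquad
\sum_{h=1}^{g} n_h \, z_{hj} = 0 .

% Distance between groups h and h' from the r printed dimensions.
d_{hh'} = \Bigl[ \sum_{j=1}^{r} \bigl( z_{hj} - z_{h'j} \bigr)^2 \Bigr]^{1/2}
```

The last identity in the first line is the statement that the weighted centroid of the mean scores lies at the origin.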
The LRV parameter allows you to save the loadings, latent roots and their sum (the trace) in an LRV structure, while the SCORES parameter saves the canonical variate means. If you have declared the LRV already, its number of rows must be the same as the number of variates involved in forming the input SSPM. The number of rows of the SCORES matrix, if previously declared, must be equal to the number of groups.
The number of columns of the LRV and of the SCORES matrix corresponds to the number of dimensions to be saved from the analysis, and this must be the same for both of them. If the structures have been declared already, Genstat will take the larger of the numbers of columns declared for either, and declare (or redeclare) the other one to match. If neither has been declared and option SMALLEST retains the default setting no, Genstat takes the number of columns from the setting of the NROOTS option. Otherwise, Genstat saves results for the full set of dimensions. The trace saved as the third component of the LRV structure, however, will contain the sum of all the latent roots, whether or not they have all been saved. Procedure LRVSCREE can be used to produce a “scree” diagram, which can be helpful in deciding how many dimensions to save.
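A minimal sketch of the column-matching rule, assuming an SSPM W that has already been formed from seven variates in four groups (W, L and Meanscrs are illustrative names):

```genstat
" L is declared with 3 columns and Meanscrs with 2. On the rule described
  above, Genstat would take the larger number (3) and redeclare Meanscrs
  to match. "
LRV [ROWS=7; COLUMNS=3] L
MATRIX [ROWS=4; COLUMNS=2] Meanscrs
CVA W; LRV=L; SCORES=Meanscrs
PRINT L,Meanscrs
```

Declaring only one of the two structures, or neither, leaves Genstat to set the dimensions as described above.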
The RESIDUALS parameter allows you to save the distances of the means from the dimensions fitted in the analysis in a matrix with number of rows equal to the number of groups and one column. If the latent roots and vectors (loadings) are saved from the analysis, the residuals will correspond to the dimensions not saved; the same applies if you save scores. If neither the LRV nor scores are saved, the saved residuals will correspond to the smallest latent roots not printed.
The DISTANCES parameter allows you to save the inter-group-mean Mahalanobis distances in a symmetric matrix.
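As a brief sketch, assuming an SSPM W that has already been formed (L, Res and Dist are hypothetical identifiers), the residuals and distances might be saved alongside a two-dimensional LRV. Because the LRV is saved, the residuals would then correspond to the dimensions not saved in L:

```genstat
" W is assumed to exist; L, Res and Dist are hypothetical names. "
CVA [NROOTS=2] W; LRV=L; RESIDUALS=Res; DISTANCES=Dist
PRINT Res,Dist
```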
The SAVE parameter can supply a pointer to save a multivariate save structure containing all the details of the analysis. If this is unset, an unnamed save structure is saved automatically (and this can be accessed using the GET directive). Alternatively, you can set SAVE=* to prevent any save structure being formed if, for example, you have a very large data set and want to avoid committing the storage space.
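The two explicit settings of SAVE might look like this in use, as a sketch assuming an SSPM W that has already been formed (Cvasave is a hypothetical identifier):

```genstat
" Keep the details of the analysis in a named save structure. "
CVA [PRINT=roots,tests] W; SAVE=Cvasave
" Suppress the save structure altogether, for example to save storage space. "
CVA [PRINT=roots,tests] W; SAVE=*
```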
Bartlett, M.S. (1938). Further aspects of the theory of multiple regression. Proceedings of the Cambridge Philosophical Society, 34, 33-40.
" Genstat example CVA-1: Canonical Variates Analysis

  The data for this example deal with measurements made on 28 brooches
  found at the archaeological site of the cemetery at Musingen. Seven
  measurements are used and have been transformed by taking logarithms.
  A grouping factor, obtained from a cluster analysis, with four levels
  has also been included.
  (Doran and Hodson, Mathematics and computers in archaeology. (1975)) "

" Declare the four-level grouping factor. "
FACTOR [LEVELS=4; VALUES=3,1,2,2,2,1,1,4,2,3,3,4,2,2,2,2,2,4,\
   1,3,4,4,2,2,2,1,1,3] Groupno

" The data are held in the file 'CVA-1.DAT' and names for the data
  columns are on the first line. Read the file, saving the names in a
  pointer structure called Data. "
FILEREAD [NAME='%gendir%/examples/CVA-1.DAT';\
   IMETHOD=read; MAXCATEGORY=4; ISAVE=Data]

" Declare a sums of squares and products data structure called W.
  The sums of squares and products have to be calculated for our pointer
  of measurement variates, with groups for within-group SSPMs specified
  by the grouping factor Groupno. Form the structure W. "
SSPM [TERMS=Data; GROUPS=Groupno] W
FSSPM W

" Perform the canonical variates analysis for W, printing out the
  resulting roots, loadings, the means for the canonical variate groups,
  the values for the significance tests for the latent roots and the
  distances between groups. "
CVA [PRINT=roots,loadings,means,tests,distances] W

" Carry out the analysis once again, saving the latent roots, loadings
  and trace to the pointer L, and the means to Meanscrs. "
CVA [PRINT=residuals,distances; NROOTS=2] WSSPM=W; LRV=L; SCORES=Meanscrs
PRINT L,Meanscrs

" If required, the smallest roots can be requested instead of the largest. "
CVA [PRINT=roots,residuals; NROOTS=2; SMALLEST=yes] W; LRV=L
PRINT L