MVAOD procedure

Does an analysis of distance of multivariate data (R.W. Payne & R.P. White).

Options

PRINT = string tokens Controls printed output (aodtable, permutationtest); default aodt
TERMS = formula Model terms to fit in the analysis; must be specified
FACTORIAL = scalar Limit on the number of factors or variates in a term for it to be included in the analysis; default 3
NTIMES = scalar Number of permutations to use in the permutation test; default 999
SEED = scalar Seed for the random number generator used to make the permutations; default 0 continues from the previous generation or (if none) initializes the seed automatically

Parameters

DATA = symmetric matrices Supplies the squared distances between the data points
SSD = variates Saves the sums of squared distances
DF = variates Saves the numbers of degrees of freedom
PRPERMUTATION = variates Saves probabilities from the permutation test
DISTANCES = pointers Contains a symmetric matrix of distances for each model term

Description

This procedure implements the analysis of multivariate distance devised by Gower & Krzanowski (1999). This is useful when you have units whose positions in multi-dimensional space may be explained by a linear statistical model. It provides a breakdown of the sums of squared distances between the units, similar to that provided for sums of squares in an analysis of variance. So, the total squared distance between the units is partitioned into the components that can be explained by each of the terms in the model. These cannot be tested directly as in an analysis of variance, as it is unclear what probability distributions would be appropriate. Instead, the importance of the terms can be assessed by a permutation test, in which several random permutations of the units are made, and the significance of each sum of squared distances from the observed data is assessed by seeing where it lies in the distribution of values obtained from all the analyses (the original analysis and those of the permuted data sets).
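
For a single grouping factor, the partition can be illustrated outside Genstat. The Python sketch below is not MVAOD's code. It assumes the standard identity that the sum of squares about a centroid equals the sum of squared pairwise distances divided by the number of units. The function names ssd and one_way_aod, and the four example points, are hypothetical.

```python
from itertools import combinations


def ssd(d2, units):
    """Sum of squared distances among `units`, divided by their number."""
    units = list(units)
    if not units:
        return 0.0
    return sum(d2[a][b] for a, b in combinations(units, 2)) / len(units)


def one_way_aod(d2, groups):
    """Split the total squared distance into between- and within-group parts."""
    n = len(groups)
    total = ssd(d2, range(n))
    within = sum(
        ssd(d2, [u for u in range(n) if groups[u] == g])
        for g in sorted(set(groups))
    )
    return {"between": total - within, "within": within, "total": total}


# Hypothetical data: four points in the plane, two per group.
points = [(0, 0), (2, 0), (4, 4), (6, 4)]
d2 = [[(px - qx) ** 2 + (py - qy) ** 2 for qx, qy in points] for px, py in points]
groups = ["A", "A", "B", "B"]

table = one_way_aod(d2, groups)
print(table)  # {'between': 32.0, 'within': 4.0, 'total': 36.0}
```

Only the squared distance matrix is used, so the same arithmetic applies when no coordinates are available.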

The squared distances between the units must be supplied in a symmetric matrix, using the DATA parameter. In some situations, these may be actual distances. Alternatively, the units may often be described by a collection of attributes ranging from continuous measurements to categorical variables, like the presence or absence of a particular feature. In these circumstances, the FSIMILARITY directive can be used to combine these attributes to give a symmetric matrix that represents the similarity between each pair of units. This can then be converted into a squared distance matrix, for example, by subtracting the similarities from one. (So MVAOD can be regarded as providing an alternative to multivariate analysis of variance, for units whose attributes are not all continuous variables.)
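
The conversion from similarities to squared distances can be sketched in Python. This is an illustration only and does not reproduce FSIMILARITY: it assumes every attribute is binary and uses a simple matching coefficient as a stand-in similarity. The names simple_matching and squared_distances, and the example units, are hypothetical.

```python
def simple_matching(u, v):
    """Proportion of attributes on which two units agree (between 0 and 1)."""
    return sum(a == b for a, b in zip(u, v)) / len(u)


def squared_distances(units):
    """Symmetric matrix holding 1 - similarity for every pair of units."""
    return [[1.0 - simple_matching(u, v) for v in units] for u in units]


# Hypothetical units, each described by four presence/absence attributes.
units = [
    (1, 0, 1, 1),
    (1, 1, 1, 0),
    (0, 0, 0, 0),
]

d2 = squared_distances(units)
for row in d2:
    print(row)
```

Each unit has similarity one with itself, so the diagonal of the resulting matrix is zero, as required for a distance matrix.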

The model to fit in the analysis is specified by the TERMS option. The FACTORIAL option sets a limit on the number of factors or variates that the terms can contain; any terms with more factors or variates are deleted from the analysis.
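
The effect of the FACTORIAL limit on a fully crossed model can be sketched in Python. The sketch assumes that a formula such as N*T*S*G expands to every main effect and interaction of the four factors, with interactions written using a dot. The function name expand_crossed is hypothetical, and this is not how Genstat parses formulae.

```python
from itertools import combinations


def expand_crossed(factors, factorial=3):
    """List the terms of a fully crossed model, dropping any term that
    involves more than `factorial` factors."""
    terms = []
    for size in range(1, len(factors) + 1):
        if size > factorial:
            break
        terms.extend(".".join(combo) for combo in combinations(factors, size))
    return terms


terms = expand_crossed(["N", "T", "S", "G"], factorial=3)
print(len(terms))  # 14
print(terms)
```

With the default limit of 3, the four-factor interaction is the only term removed from this model.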

Printed output is controlled by the PRINT option, with settings:

    aodtable for an analysis-of-distance table, giving the sums of squared distances and numbers of degrees of freedom for each model term; and
    permutationtest adds a column to the analysis-of-distance table containing probabilities from the permutation test.

The NTIMES option specifies the number of permutations to perform; the default is 999. The SEED option specifies the seed to use to generate the random numbers that are used to select the permutations; the default of zero continues the sequence of random numbers from a previous generation or, if none have yet been used in this Genstat job, it initializes the seed automatically. MVAOD checks whether NTIMES is greater than the number of possible permutations available for the data set. If so, it does an exact test instead, which uses each possible permutation once.
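
The choice between a random and an exact permutation test can be sketched in Python for a single grouping factor. Several assumptions are made: the number of possible permutations is taken to be n factorial for n units, the observed analysis is counted as one of the analyses in the random test, and the test statistic is the between-group sum of squared distances. How MVAOD counts permutations or handles ties is not stated here. The names between_ssd and permutation_probability are hypothetical.

```python
import random
from itertools import combinations, permutations
from math import factorial


def between_ssd(d2, labels):
    """Between-group sum of squared distances for one labelling of the units."""
    def ssd(units):
        return sum(d2[a][b] for a, b in combinations(units, 2)) / len(units)

    n = len(labels)
    within = sum(
        ssd([u for u in range(n) if labels[u] == g]) for g in set(labels)
    )
    return ssd(list(range(n))) - within


def permutation_probability(d2, groups, ntimes=999, seed=None):
    """Proportion of analyses whose statistic is at least the observed one."""
    observed = between_ssd(d2, groups)
    tol = 1e-12  # guards against rounding when comparing equal statistics
    if ntimes > factorial(len(groups)):
        # Exact test: every ordering of the labels is used once.
        stats = [between_ssd(d2, p) for p in permutations(groups)]
        return sum(s >= observed - tol for s in stats) / len(stats)
    rng = random.Random(seed)
    labels = list(groups)
    count = 1  # the observed analysis itself
    for _ in range(ntimes):
        rng.shuffle(labels)
        if between_ssd(d2, labels) >= observed - tol:
            count += 1
    return count / (ntimes + 1)


# Hypothetical data: four points in the plane, two per group.
points = [(0, 0), (2, 0), (4, 4), (6, 4)]
d2 = [[(px - qx) ** 2 + (py - qy) ** 2 for qx, qy in points] for px, py in points]
groups = ["A", "A", "B", "B"]

p_exact = permutation_probability(d2, groups, ntimes=999)
p_random = permutation_probability(d2, groups, ntimes=19, seed=629856)
print(p_exact, p_random)
```

With only four units there are 24 orderings, so the request for 999 permutations falls back to the exact test. Supplying the same seed twice reproduces the random test.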

The SSD, DF and PRPERMUTATION parameters allow you to save the sums of squared distances, degrees of freedom and permutation probabilities. These are each saved in a variate, with each unit labelled by the name of the model term concerned. There are also two final units in each variate to save the corresponding information for the residual and the total.

The DISTANCES parameter can save a pointer containing a symmetric matrix for each model term. Each matrix has a row for each combination of levels of the factors in the corresponding term, and its values are the distances between the factor combinations in the multi-dimensional space defined by the possible effects of the term. So, to investigate the relationships between the effects of the term, you could convert the DISTANCES to similarities, and then use them as input for a principal coordinates analysis (see PCO for details).
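
For a term consisting of a single factor, one way to picture such a matrix is as the squared distances between group centroids, which can be recovered from the unit-level squared distances alone. The Python sketch below rests on that assumption and on the distances being Euclidean. It is not taken from MVAOD, and whether DISTANCES stores exactly these values is not confirmed here. The function name centroid_squared_distances is hypothetical.

```python
def centroid_squared_distances(d2, groups):
    """Squared distances between group centroids, using only the
    unit-level squared distance matrix `d2`."""
    levels = sorted(set(groups))
    members = {g: [u for u, lab in enumerate(groups) if lab == g] for g in levels}

    def mean_d2(g, h):
        # Mean squared distance over all ordered pairs (one unit from each group).
        total = sum(d2[a][b] for a in members[g] for b in members[h])
        return total / (len(members[g]) * len(members[h]))

    matrix = [
        [mean_d2(g, h) - 0.5 * mean_d2(g, g) - 0.5 * mean_d2(h, h) for h in levels]
        for g in levels
    ]
    return levels, matrix


# Hypothetical data: centroids are (1, 0) and (5, 4), squared distance 32.
points = [(0, 0), (2, 0), (4, 4), (6, 4)]
d2 = [[(px - qx) ** 2 + (py - qy) ** 2 for qx, qy in points] for px, py in points]
groups = ["A", "A", "B", "B"]

levels, between = centroid_squared_distances(d2, groups)
print(levels)
print(between)
```

The resulting matrix has one row per factor level and a zero diagonal, so it has the shape needed for conversion to similarities.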

Options: PRINT, TERMS, FACTORIAL, NTIMES, SEED.

Parameters: DATA, SSD, DF, PRPERMUTATION, DISTANCES.

Method

The method of analysis is described by Gower & Krzanowski (1999) and Krzanowski (2002), who show that the sum of squares of distances for each term i is given by

TRACE( Proj[i] *+ DATA *+ Proj[i]) / 2

where Proj[i] is a projection matrix for the term. If the model contains only factors, MVAOD uses ANOVA to check whether the model is orthogonal and, if so, it calculates the projection matrices using the method described by Payne & Tobias (1992). For a non-orthogonal model, MVAOD adjusts the design matrix X[i] of each term i for the earlier terms by using its columns as y-variates in a regression analysis, fitting all the earlier terms, and then reforming the design matrix by replacing each column with the residuals from the corresponding regression. The projection matrix is then

X[i] *+ Ginverse(T(X[i]) *+ X[i]) *+ T(X[i])
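
The trace formula can be checked numerically in the simplest case of one factor adjusted for the grand mean, where the projection matrix has a closed form and no generalized inverse is needed. The Python sketch below is an illustration under that assumption, not MVAOD's algorithm. One point to note: with raw squared Euclidean distances and a projector that annihilates the constant vector, the trace of Proj *+ DATA *+ Proj is non-positive, so the sketch multiplies by -1/2. This matches the common convention of holding minus one half of the squared distances in the matrix; how MVAOD handles the sign internally is not stated here. The names matmul, group_projector and term_ssd are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def group_projector(groups):
    """Projector onto group means, adjusted for the grand mean."""
    n = len(groups)
    sizes = {g: groups.count(g) for g in set(groups)}
    return [
        [
            (1.0 / sizes[groups[a]] if groups[a] == groups[b] else 0.0) - 1.0 / n
            for b in range(n)
        ]
        for a in range(n)
    ]


def term_ssd(d2, proj):
    """-1/2 times the trace of proj * d2 * proj (sign is an assumption)."""
    pdp = matmul(matmul(proj, d2), proj)
    return -0.5 * sum(pdp[i][i] for i in range(len(pdp)))


# Hypothetical data: between-group sum of squares about the grand mean is 32.
points = [(0, 0), (2, 0), (4, 4), (6, 4)]
d2 = [[(px - qx) ** 2 + (py - qy) ** 2 for qx, qy in points] for px, py in points]
groups = ["A", "A", "B", "B"]

proj = group_projector(groups)
print(term_ssd(d2, proj))
```

For these points the result agrees with the between-group sum of squares computed directly from the coordinates.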

References

Gower, J.C. & Krzanowski, W.J. (1999) Analysis of distance for structured multivariate data and extensions to multivariate analysis of variance. Applied Statistics, 48, 505-519.

Krzanowski, W.J. (2002) Multifactorial analysis of distance in studies of ecological community structure. Journal of Agricultural, Biological, and Environmental Statistics, 7, 222-232.

Payne, R.W. & Tobias, R.D. (1992) General balance, combination of information and the analysis of covariance. Scandinavian Journal of Statistics, 19, 3-23.

See also

Directive: PCO.

Procedures: MANOVA, RMULTIVARIATE.

Commands for: Multivariate and cluster analysis.

Example

CAPTION      'MVAOD example',\
             !t('Analysis of distance of public bad data; see',\
             'Gower & Krzanowski 1999, Applied Statistics, 48, 505-519.');\
             STYLE=meta,plain
SPLOAD       FILE='%gendir%/examples/Publicbad.gsh'
" Form similarity matrix using city-block metric."
FSIMILARITY  [SIMILARITY=pbsimilarity] publicbad[]; TEST=cityblock
" Convert to squared distances."
CALCULATE    pbdistances = 1 - pbsimilarity
" Between-group analysis."
FACPRODUCT   [IMETHOD=include] !p(G,S,T,N); PRODUCT=group
MVAOD        [PRINT=aod; TERMS=group; NTIMES=99]\
             pbdistances; DISTANCES=groupdistances
" PCO analysis of between-group similarities (Gower & Krzanowski, Figure 3)."
PCO          [PRINT=roots] 1-groupdistances[1]; LRV=grouplrv
CALCULATE    groupscore[1,2] = grouplrv[1]$[*; 1,2]
FRAME        3; SCALING=xyequal
XAXIS        3; YORIGIN=0; LPOSITION=*; MPOSITION=*
YAXIS        3; XORIGIN=0; LPOSITION=*; MPOSITION=*
TXCONSTRUCT  [TEXT=groupno] !(1...16)
PEN          1; SYMBOLS=0; LABELS=groupno
DGRAPH       [TITLE='Principal coordinate analysis'; WINDOW=3; KEY=0]\ 
             groupscore[2]; groupscore[1]; PEN=1
GETATTRIBUTE [ATTRIBUTE=labels] group; groupatt
TXCONSTRUCT  [TEXT=groupkey] !(1...16),' = ',groupatt['labels']
FOR
  CAPTION    'Key to points on the graph'; STYLE=minor 
  PRINT      [IPRINT=*] groupkey
ENDFOR
" Factorial model - note: this is on a different scale and gives a
  slightly different breakdown from Table 2 of Gower & Krzanowski,
  as their analysis was unweighted by group size.
  Only 99 permutations are made, to save computing time."
MVAOD        [PRINT=aod,permutation; TERMS=N*T*S*G; NTIMES=99; SEED=629856]\
             pbdistances
" For Gower & Krzanowski breakdown, use between-group distance matrix."
FACTOR       [NVALUES=16; LEVELS=2] Gb,Sb,Tb,Nb
GENERATE     Gb,Sb,Tb,Nb
MVAOD        [PRINT=aod,permutation; TERMS=Nb*Tb*Sb*Gb] groupdistances[1]
Updated on March 7, 2019
