Does correspondence analysis, or reciprocal averaging (P.G.N. Digby & A.I. Glaser).
Options
PRINT = string tokens | Printed output from the analysis (roots, rowscores, rowinertias, rowchisquare, rowmass, rowquality, colscores, colinertias, colchisquare, colmass, colquality); default * i.e. no output |
---|---|
METHOD = string token | Type of analysis required (correspondence, digbycorrespondence, biplot, reciprocal); default corr |
NROOTS = scalar | Number of latent roots for printed output; default * requests them all to be printed |
%METHOD = string token | How to represent proportions or %s in quality statistics (permills, percentages, proportions); default prop |
NDIMENSIONS = scalar | Number of dimensions for which quality statistics are required; default 2 |
ROWSUBSET = scalars | Indexes of subset rows |
COLSUBSET = scalars | Indexes of subset columns |
ROWPASSIVE = scalars | Indexes of passive rows |
COLPASSIVE = scalars | Indexes of passive columns |
Parameters
DATA = matrices or data matrices | Data to be analysed |
---|---|
ROOTS = diagonal matrices | Saves the squared singular values from each analysis |
ROWSCORES = matrices | Saves the scores for the rows of the data matrix |
COLSCORES = matrices | Saves the scores for the columns of the data matrix |
ROWINERTIAS = matrices | Saves the inertias for the rows of the data matrix |
COLINERTIAS = matrices | Saves the inertias for the columns of the data matrix |
ROWQUALITY = matrices | Saves the quality statistics for rows of the data |
COLQUALITY = matrices | Saves the quality statistics for columns of the data |
SAVE = pointers | Saves details of the analysis for use by CABIPLOT |
Description
Correspondence analysis is an ordination technique used to analyse two-way categorical data tables. Ordination techniques approximate relationships between variables in a reduced number of dimensions.
The type of analysis is specified by the METHOD option, with one of the following settings:
correspondence | correspondence analysis (Greenacre 1984), |
---|---|
digbycorrespondence | an alternative implementation of correspondence analysis described by Digby & Kempton (1987), |
reciprocal | reciprocal averaging (see Digby & Kempton 1987), or |
biplot | a similar biplot-style analysis (again see Digby & Kempton 1987). |
The default setting is correspondence, and this should be retained if either of the options to subset rows or columns is set.
The data for the procedure are specified by the DATA parameter as either a matrix or a datamatrix (i.e. a pointer to variates, all with the same length). The matrix must not contain any missing values; it is unchanged on exit from the procedure.
Printed output is controlled by the PRINT option with settings:
roots | to print the roots (together with the roots expressed as percentages and cumulative percentages), |
---|---|
rowscores | to print the scores for the rows of the data matrix, |
rowinertias | to print the inertias for the rows of the data matrix, |
rowmass | to print the row masses, |
rowchisquare | to print the row chi-square distances, |
rowquality | to print the quality statistics for the rows, |
colscores | to print the scores for the columns of the data matrix, |
colinertias | to print the inertias for the columns of the data matrix, |
colmass | to print the column masses, |
colchisquare | to print the column chi-square distances, and |
colquality | to print the quality statistics for the columns. |
The NROOTS option controls the printed output of roots, scores and inertias. By default, results are printed for all the roots, but you can set the NROOTS option to specify a smaller number.
The quality settings produce tables with the following columns:
● the mass of the row (or column), in proportion to the total mass;
● the “quality” of the representation i.e. how much of the inertia of a row (or column) is represented by the dimensions shown;
● the proportion of the total inertia of the row (or column) compared to the total inertia for all rows (or columns);
● principal coordinates of the rows (or columns) in the specified dimension;
● the amount of inertia for each row (or column) in the specified dimension relative to the total amount of inertia given by the value of the quality statistic – hence the sum of a specific row (or column) across the dimensions shown will be equal to the value given by the quality statistic;
● the proportion of inertia explained by a row (or column) in a dimension, compared to the total inertia in that dimension.
The representation of the columns of proportions is controlled by the %METHOD option; these can be printed as proportions (default), as percentages, or as permills (i.e. tenths of a percent). The NDIMENSIONS option specifies the number of dimensions for which to print quality statistics; the default is 2.
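As a hedged sketch of how the printing options described above might be combined, the call below requests the roots and the row and column quality tables for the Smoking matrix used in the Example section. It uses only the option settings listed on this page; the particular values chosen (two roots, permills, two dimensions) are purely illustrative.

```genstat
"Smoking data, declared as in the Example section"
TEXT    Staff; VALUES=!T(Sen_Mngr,Jun_Mngr,Sen_Empl,Jun_Empl,Secretry)
TEXT    Smoke; VALUES=!T(None,Light,Medium,Heavy)
MATRIX  [ROWS=Staff; COLUMNS=Smoke] Smoking; VALUES=\
        !(4,2,3,2,4,3,7,4,25,10,12,4,18,24,33,13,10,6,7,2)
"Print two roots, and quality statistics in permills for two dimensions"
CORANALYSIS [PRINT=roots,rowquality,colquality; NROOTS=2;\
        %METHOD=permills; NDIMENSIONS=2] Smoking
```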
When carrying out correspondence analysis, there may be rows and/or columns (for example outliers with low mass) that you would like to ignore during the calculation of the roots or inertia, so that they have no influence. Instead of removing these rows and/or columns from the data before running CORANALYSIS, an alternative is to list the indexes of the rows or columns that are to be ignored using the ROWPASSIVE and/or COLPASSIVE options. These “passive” rows or columns will still be included in the table of quality statistics, where their relative contributions will be shown and compared to the total for all the passive rows or columns.
You may want to apply a correspondence analysis calculated from the whole data set to only a subset of the rows and/or columns when some of the rows and/or columns divide into groups with common traits. This can be done by setting the ROWSUBSET and/or COLSUBSET options to the indexes of the rows and/or columns in the subset of interest. If either of these options is set, the METHOD option must be set to correspondence. If ROWPASSIVE and ROWSUBSET (or COLPASSIVE and COLSUBSET) are both set, any indexes that occur in both will be removed from the ROWSUBSET (or COLSUBSET).
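The passive and subset options might be used along the following lines. This is a minimal sketch only: the choice of row 1 as passive and of columns 1 to 3 as the subset is arbitrary, and it assumes that several indexes can be supplied as a comma-separated list of numbers, which is how lists of scalars are usually given.

```genstat
"Smoking data, declared as in the Example section"
TEXT    Staff; VALUES=!T(Sen_Mngr,Jun_Mngr,Sen_Empl,Jun_Empl,Secretry)
TEXT    Smoke; VALUES=!T(None,Light,Medium,Heavy)
MATRIX  [ROWS=Staff; COLUMNS=Smoke] Smoking; VALUES=\
        !(4,2,3,2,4,3,7,4,25,10,12,4,18,24,33,13,10,6,7,2)
"Treat the first row (Sen_Mngr) as passive: it is ignored when the roots
 and inertias are calculated, but still appears in the quality table"
CORANALYSIS [PRINT=roots,rowquality; ROWPASSIVE=1] Smoking
"Analyse only the first three columns as a subset. METHOD is left at
 correspondence, as required when a subset option is set"
CORANALYSIS [PRINT=roots,colquality; METHOD=correspondence;\
        COLSUBSET=1,2,3] Smoking
```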
Results from the analysis can be saved using the parameters ROOTS, ROWSCORES, COLSCORES, ROWINERTIAS, COLINERTIAS, ROWQUALITY and COLQUALITY. The structures specified for these parameters need not be declared in advance. The SAVE parameter can save full details of the analysis for use by the CABIPLOT procedure.
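A short sketch of saving results rather than printing them is given below. The identifiers Lroots, Rscores and Cscores are hypothetical names chosen for illustration; they are not declared beforehand, since the saving parameters do not require this.

```genstat
"Smoking data, declared as in the Example section"
TEXT    Staff; VALUES=!T(Sen_Mngr,Jun_Mngr,Sen_Empl,Jun_Empl,Secretry)
TEXT    Smoke; VALUES=!T(None,Light,Medium,Heavy)
MATRIX  [ROWS=Staff; COLUMNS=Smoke] Smoking; VALUES=\
        !(4,2,3,2,4,3,7,4,25,10,12,4,18,24,33,13,10,6,7,2)
"Suppress the printed output (the default) and save roots and scores"
CORANALYSIS [PRINT=*] Smoking; ROOTS=Lroots; ROWSCORES=Rscores;\
        COLSCORES=Cscores
"Inspect the saved structures one at a time"
PRINT   Lroots
PRINT   Rscores
PRINT   Cscores
```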
Options: PRINT, METHOD, NROOTS, %METHOD, NDIMENSIONS, ROWSUBSET, COLSUBSET, ROWPASSIVE, COLPASSIVE.
Parameters: DATA, ROOTS, ROWSCORES, COLSCORES, ROWINERTIAS, COLINERTIAS, ROWQUALITY, COLQUALITY, SAVE.
Method
Full details of correspondence analysis (i.e. METHOD=correspondence) are given by Greenacre (1984 & 2007). The other methods are described by Digby & Kempton (1987).
The data matrix X is scaled to have sum one for METHOD settings correspondence and digbycorrespondence. The matrices U, S and V are taken from the singular-value decomposition of
Y = (X - R C) / √(R C)
for METHOD=correspondence and
Y = R^(-½) X C^(-½)
for the other methods, where R and C are diagonal matrices of row and column totals of the data matrix X. The scores for the rows and columns from METHOD=correspondence are
A = R^(-½) U
and
B = C^(-½) V
The scores from METHOD=digbycorrespondence are similar, but are multiplied by S. This makes the row scores obtained here the same as the principal coordinates given with the quality statistics.
With the other two methods X is not scaled to total one, and the scores are given by A = R^(-½) U S^m and B = C^(-½) V S^m: the parameter m is zero for METHOD=reciprocal, and 0.5 for METHOD=biplot.
The inertia values for the rows and columns are given by
( R A A′ ) S′
and
( C B B′ ) S′
where S′ = S for METHOD=correspondence, and S′ = 1 for the other methods; see Greenacre (1984) for further information.
The roots are the squares of the singular values. Note that the first singular value will always be one for methods other than correspondence; this corresponds to a trivial solution given in the first column of A and B above, which is automatically removed from the results printed and saved from CORANALYSIS.
Passive rows and/or columns are separated from the original data matrix before it is scaled. Subset rows and/or columns are separated from Y after this scaling.
For the quality statistics, the weighted sum-of-squares of the principal coordinates on the ith dimension is equal to the ith squared singular value. The row and column scores for METHOD=digbycorrespondence are equivalent to the principal coordinates. Conversely, the row and column scores for METHOD=correspondence or reciprocal are equivalent to standard coordinates, where the weighted sum-of-squares for each dimension is equal to one.
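The two weighted sum-of-squares statements can be checked with a short derivation for the row scores. This is a sketch under stated assumptions: X is scaled to sum one, so that R holds the row masses; U has orthonormal columns, as in any singular-value decomposition; and F is a label introduced here (it does not appear elsewhere on this page) for the principal coordinates A S.

```latex
\begin{align*}
% Standard coordinates A = R^{-1/2} U: weighted sum-of-squares is one
A' R A &= U' R^{-1/2} \, R \, R^{-1/2} U = U' U = I, \\
% Principal coordinates F = A S: weighted sum-of-squares gives the roots
F = A S \;\Rightarrow\; F' R F &= S \, (A' R A) \, S = S^{2}.
\end{align*}
```

The same argument applies to the column scores, with C, V and B in place of R, U and A.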
References
Digby, P.G.N. & Kempton, R.A. (1987). Multivariate Analysis of Ecological Communities. Chapman & Hall, London.
Greenacre, M.J. (1984). Theory and Applications of Correspondence Analysis. Academic Press, London.
Greenacre, M. (2007). Correspondence Analysis in Practice, second edition. Chapman & Hall, London.
See also
Procedures: CABIPLOT, MCORANALYSIS.
Commands for: Multivariate and cluster analysis.
Example
CAPTION 'CORANALYSIS example',\
        'Data from Table 9.1 of Greenacre (2007)'; STYLE=meta,plain
TEXT    Staff,St; VALUES=!T(Sen_Mngr,Jun_Mngr,Sen_Empl,Jun_Empl,Secretry),\
        !T(SM, JM, SE, JE, Sy)
&       Smoke; VALUES=!T(None,Light,Medium,Heavy)
MATRIX  [ROWS=Staff; COLUMNS=Smoke] Smoking; VALUES=\
        !( 4, 2, 3, 2, 4, 3, 7, 4, 25, 10, 12, 4, 18, 24, 33, 13, 10, 6, 7, 2)
PRINT   Smoking; FIELDWIDTH=8; DECIMALS=0
CAPTION 'Use CORANALYSIS, printing all results, saving SCORES only.'
CORANALYSIS [PRINT=roots,rowscores,colscores,rowinertia,colinertia;\
        METHOD=correspondence] Smoking; SAVE=cora1
"Print rowmass"
PRINT   cora1['rowmass']
"Plot the scores in the 1st and 2nd dimensions.
 Rows are in principal coordinates and columns are in standard coordinates.
 Figure 9.2 of Greenacre (2007)."
CABIPLOT [COLSCALING=standard] LROW=St