CORANALYSIS procedure

Does correspondence analysis, or reciprocal averaging (P.G.N. Digby & A.I. Glaser).

Options

PRINT = string tokens Printed output from the analysis (roots, rowscores, rowinertias, rowchisquare, rowmass, rowquality, colscores, colinertias, colchisquare, colmass, colquality); default * i.e. no output
METHOD = string token Type of analysis required (correspondence, digbycorrespondence, biplot, reciprocal); default corr
NROOTS = scalar Number of latent roots for printed output; default * requests them all to be printed
%METHOD = string token How to represent proportions or %s in quality statistics (permills, percentages, proportions); default prop
NDIMENSIONS = scalar Number of dimensions for which quality statistics are required; default 2
ROWSUBSET = scalars Indexes of subset rows
COLSUBSET = scalars Indexes of subset columns
ROWPASSIVE = scalars Indexes of passive rows
COLPASSIVE = scalars Indexes of passive columns

Parameters

DATA = matrices or data matrices Data to be analysed
ROOTS = diagonal matrices Saves the squared singular values from each analysis
ROWSCORES = matrices Saves the scores for the rows of the data matrix
COLSCORES = matrices Saves the scores for the columns of the data matrix
ROWINERTIAS = matrices Saves the inertias for the rows of the data matrix
COLINERTIAS = matrices Saves the inertias for the columns of the data matrix
ROWQUALITY = matrices Saves the quality statistics for rows of the data
COLQUALITY = matrices Saves the quality statistics for columns of the data
SAVE = pointers Saves details of the analysis for use by CABIPLOT

Description

Correspondence analysis is an ordination technique used to analyse two-way categorical data tables. Ordination techniques approximate relationships between variables in a reduced number of dimensions.

The type of analysis is specified by the METHOD option, with one of the following settings:

    correspondence correspondence analysis (Greenacre 1984),
    digbycorrespondence an alternative implementation of correspondence analysis described by Digby & Kempton (1987),
    reciprocal reciprocal averaging (see Digby & Kempton 1987), or
    biplot a similar biplot-style analysis (again see Digby & Kempton 1987).

The default setting is correspondence, and this must be retained if either of the options to subset rows or columns is set.

The data for the procedure are specified by the DATA parameter as either a matrix or a datamatrix (i.e. a pointer to variates, all with the same length). The matrix must not contain any missing values; it is unchanged on exit from the procedure.

Printed output is controlled by the PRINT option with settings:

    roots to print the roots (together with the roots expressed as percentages and cumulative percentages),
    rowscores to print the scores for the rows of the data matrix,
    rowinertias to print the inertias for the rows of the data matrix,
    rowmass to print the row masses,
    rowchisquare to print the row chi-square distances,
    rowquality to print the quality statistics for the rows,
    colscores to print the scores for the columns of the data matrix,
    colinertias to print the inertias for the columns of the data matrix,
    colmass to print the column masses,
    colchisquare to print the column chi-square distances, and
    colquality to print the quality statistics for the columns.
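
As a rough illustration of what the mass and chi-square settings report, the following Python sketch (not Genstat code) computes row and column masses for the Smoking table used in the Example section. It assumes that the chi-square distance of a row is the distance from its profile to the average profile, as in Greenacre (2007); the names table, row_mass, col_mass and row_chi_square_distances are hypothetical and chosen only for this sketch.

```python
# Smoking table from the Example section
# (rows: staff groups, columns: None, Light, Medium, Heavy).
table = [
    [4, 2, 3, 2],      # Sen_Mngr
    [4, 3, 7, 4],      # Jun_Mngr
    [25, 10, 12, 4],   # Sen_Empl
    [18, 24, 33, 13],  # Jun_Empl
    [10, 6, 7, 2],     # Secretry
]

total = sum(sum(row) for row in table)

# Masses: row and column totals as proportions of the grand total.
row_mass = [sum(row) / total for row in table]
col_mass = [sum(col) / total for col in zip(*table)]


def row_chi_square_distances(table, col_mass):
    """Chi-square distance from each row profile to the average profile.

    Assumption: the average profile is the vector of column masses.
    """
    distances = []
    for row in table:
        row_total = sum(row)
        profile = [count / row_total for count in row]
        squared = sum((p - c) ** 2 / c for p, c in zip(profile, col_mass))
        distances.append(squared ** 0.5)
    return distances


row_dist = row_chi_square_distances(table, col_mass)

for mass, dist in zip(row_mass, row_dist):
    print(f"mass = {mass:.4f}   chi-square distance = {dist:.4f}")
```

Under this assumption, each row's inertia is its mass multiplied by its squared chi-square distance, so the products sum to the total inertia of the table.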

The NROOTS option controls the printed output of roots, scores and inertias. By default, results are printed for all the roots, but you can set the NROOTS option to request a smaller number.

The quality settings produce tables with the following columns:

●  the mass of the row (or column), in proportion to the total mass;

●  the “quality” of the representation i.e. how much of the inertia of a row (or column) is represented by the dimensions shown;

●  the proportion of the total inertia of the row (or column) compared to the total inertia for all rows (or columns);

●  principal coordinates of the rows (or columns) in the specified dimension;

●  the inertia of each row (or column) in the specified dimension, as a proportion of the total inertia of that row (or column); summed across the dimensions shown, these values equal the quality statistic;

●  the proportion of inertia explained by a row (or column) in a dimension, compared to the total inertia in that dimension.

The representation of the columns of proportions is controlled by the %METHOD option; these can be printed either as proportions (default), percentages or as permills i.e. tenths of a percent. The NDIMENSIONS option specifies the number of dimensions for which to print quality statistics; default 2.
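
The three representations differ only by a scaling factor. The Python sketch below (not Genstat code) illustrates this for two of the columns listed above, the row masses and each row's share of the total inertia, using the Smoking table from the Example section. The helper represent and the SCALE lookup are hypothetical names for this sketch, and the inertia share is computed here from each row's contribution to the Pearson chi-square statistic, which is an assumption about the calculation rather than a description of the procedure's code.

```python
table = [
    [4, 2, 3, 2],      # Sen_Mngr
    [4, 3, 7, 4],      # Jun_Mngr
    [25, 10, 12, 4],   # Sen_Empl
    [18, 24, 33, 13],  # Jun_Empl
    [10, 6, 7, 2],     # Secretry
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)


def row_chi_square(row, row_total):
    """Contribution of one row to the Pearson chi-square statistic."""
    contribution = 0.0
    for count, col_total in zip(row, col_totals):
        expected = row_total * col_total / grand_total
        contribution += (count - expected) ** 2 / expected
    return contribution


row_chi2 = [row_chi_square(row, t) for row, t in zip(table, row_totals)]

# Two of the quality-table columns, as proportions.
mass = [t / grand_total for t in row_totals]
inertia_share = [x / sum(row_chi2) for x in row_chi2]

# Scaling factors corresponding to the three %METHOD settings.
SCALE = {"proportions": 1, "percentages": 100, "permills": 1000}


def represent(values, method="proportions"):
    """Rescale proportions as percentages or permills (hypothetical helper)."""
    return [v * SCALE[method] for v in values]


mass_permills = [round(v) for v in represent(mass, "permills")]
inertia_permills = [round(v) for v in represent(inertia_share, "permills")]
print(mass_permills)
print(inertia_permills)
```

Permills are convenient because the rounded values are whole numbers that still carry three significant digits.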

When carrying out correspondence analysis, there may be rows and/or columns (for example outliers with low mass) that you would like to ignore during the calculation of the roots or inertia, so that they have no influence. Instead of removing these rows and/or columns from the data before running CORANALYSIS, you can list the indexes of the rows or columns that are to be ignored using the ROWPASSIVE and/or COLPASSIVE options. These “passive” rows and columns will still be included in the table of quality statistics, where their relative contributions will be shown and compared to the total for all the passive rows or columns.

You may want to apply a correspondence analysis calculated from the whole data set to only a subset of the rows and/or columns, for example when some of the rows and/or columns divide into groups with common traits. This can be done by setting the ROWSUBSET and/or COLSUBSET options to the indexes of the rows and/or columns in the subset of interest. If either of these options is set, the METHOD option must be set to correspondence. If ROWPASSIVE and ROWSUBSET (or COLPASSIVE and COLSUBSET) are both set, any indexes that occur in both will be removed from the ROWSUBSET (or COLSUBSET).

Results from the analysis can be saved using the parameters ROOTS, ROWSCORES, COLSCORES, ROWINERTIAS, COLINERTIAS, ROWQUALITY and COLQUALITY. The structures specified for these parameters need not be declared in advance. The SAVE parameter can save full details of the analysis for use by the CABIPLOT procedure.

Options: PRINT, METHOD, NROOTS, %METHOD, NDIMENSIONS, ROWSUBSET, COLSUBSET, ROWPASSIVE, COLPASSIVE.

Parameters: DATA, ROOTS, ROWSCORES, COLSCORES, ROWINERTIAS, COLINERTIAS, ROWQUALITY, COLQUALITY, SAVE.

Method

Full details of correspondence analysis (i.e. METHOD=correspondence) are given by Greenacre (1984 & 2007). The other methods are described by Digby & Kempton (1987).

The data matrix X is scaled to have sum one for METHOD settings correspondence and digbycorrespondence. The matrices U, S and V are taken from the singular-value decomposition of

Y = ( X − R C ) / √( R C )

for METHOD=correspondence and

Y = ( R^{-1/2} X C^{-1/2} )

for the other methods, where R and C are diagonal matrices of row and column totals of the data matrix X. The first expression is evaluated element by element, i.e. y_ij = ( x_ij − r_i c_j ) / √( r_i c_j ), where r_i and c_j are the ith row total and the jth column total. The scores for the rows and columns from METHOD=correspondence are

A = ( R^{-1/2} U )

and

B = ( C^{-1/2} V )

The scores from METHOD=digbycorrespondence are similar, but are multiplied by S. This makes the row scores obtained here the same as the principal coordinates given with the quality statistics.

With the other two methods X is not scaled to total one, and the scores are given by A = ( R^{-1/2} U S^m ) and B = ( C^{-1/2} V S^m ): the parameter m is zero for METHOD=reciprocal, and 0.5 for METHOD=biplot.

The inertia values for the rows and columns are given by

( R A A′ ) S′

and

( C B B′ ) S′

where S′ = S for METHOD=correspondence, and S′ = 1 for the other methods; see Greenacre (1984) for further information.

The roots are the squares of the singular values. Note that the first singular value will always be one for methods other than correspondence; this corresponds to a trivial solution given in the first column of A and B above, which is automatically removed from the results printed and saved from CORANALYSIS.

Rows and/or columns chosen as passive rows and/or columns are separated from the original data matrix before it is scaled. Rows and/or columns chosen as subset rows and/or columns are separated from Y after this scaling.
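
The difference in the order of operations can be sketched in Python (this is an illustration, not the procedure's own code). The sketch assumes the element-by-element scaling used for METHOD=correspondence, in which each entry of the table scaled to sum one has the product of its row and column masses subtracted and is then divided by the square root of that product; the function names standardized_residuals and inertia are hypothetical. Indexes are 0-based here, whereas the ROWSUBSET, COLSUBSET, ROWPASSIVE and COLPASSIVE options take Genstat indexes.

```python
table = [
    [4, 2, 3, 2],      # Sen_Mngr
    [4, 3, 7, 4],      # Jun_Mngr
    [25, 10, 12, 4],   # Sen_Empl
    [18, 24, 33, 13],  # Jun_Empl
    [10, 6, 7, 2],     # Secretry
]


def standardized_residuals(table):
    """Scale the table to sum one, then form (p - r*c) / sqrt(r*c) elementwise."""
    total = sum(sum(row) for row in table)
    r = [sum(row) / total for row in table]        # row masses
    c = [sum(col) / total for col in zip(*table)]  # column masses
    return [
        [(count / total - ri * cj) / (ri * cj) ** 0.5 for count, cj in zip(row, c)]
        for row, ri in zip(table, r)
    ]


def inertia(y, columns=None):
    """Sum of squared entries of y, optionally restricted to some columns."""
    cols = range(len(y[0])) if columns is None else columns
    return sum(row[j] ** 2 for row in y for j in cols)


y = standardized_residuals(table)
total_inertia = inertia(y)

# Subset: the columns are selected AFTER scaling, so the masses are still
# those of the full table and the inertia splits additively.
subset_inertia = inertia(y, columns=[0, 1])   # None, Light
rest_inertia = inertia(y, columns=[2, 3])     # Medium, Heavy

# Passive: the column is dropped BEFORE scaling, so the masses (and hence
# every entry of y) are recomputed from the active columns only.
active = [row[:3] for row in table]           # Heavy treated as passive
active_inertia = inertia(standardized_residuals(active))

print(total_inertia, subset_inertia + rest_inertia, active_inertia)
```

Because a subset keeps the original masses, the inertias of a subset and its complement add up to the total inertia; this is not generally true when columns are made passive.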

For the quality statistics, the weighted sum-of-squares of the principal coordinates on the ith dimension is equal to the ith squared singular value. The row and column scores for METHOD=digbycorrespondence are equivalent to the principal coordinates. Conversely the row and column scores for METHOD=correspondence or reciprocal are equivalent to standard coordinates, where the weighted sum-of-squares for each dimension is equal to one.
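
The distinction between standard and principal coordinates can be seen in a textbook reciprocal-averaging iteration, sketched below in Python. This is only an illustration for the first non-trivial dimension; CORANALYSIS itself works from the singular-value decomposition described above. The names weighted_ss and standardize are hypothetical, and the starting scores are arbitrary.

```python
table = [
    [4, 2, 3, 2],      # Sen_Mngr
    [4, 3, 7, 4],      # Jun_Mngr
    [25, 10, 12, 4],   # Sen_Empl
    [18, 24, 33, 13],  # Jun_Empl
    [10, 6, 7, 2],     # Secretry
]

total = sum(sum(row) for row in table)
row_tot = [sum(row) for row in table]
col_tot = [sum(col) for col in zip(*table)]
r = [t / total for t in row_tot]   # row masses
c = [t / total for t in col_tot]   # column masses


def weighted_ss(scores, mass):
    """Weighted sum of squares of a set of scores."""
    return sum(m * s * s for m, s in zip(mass, scores))


def standardize(scores, mass):
    """Centre to weighted mean zero (this removes the trivial solution)
    and rescale so that the weighted sum of squares is one."""
    mean = sum(m * s for m, s in zip(mass, scores))
    centred = [s - mean for s in scores]
    norm = weighted_ss(centred, mass) ** 0.5
    return [s / norm for s in centred]


def average_over_columns(col_scores):
    """Row scores as weighted averages of the column scores."""
    return [sum(n * y for n, y in zip(row, col_scores)) / t
            for row, t in zip(table, row_tot)]


def average_over_rows(row_scores):
    """Column scores as weighted averages of the row scores."""
    return [sum(n * x for n, x in zip(col, row_scores)) / t
            for col, t in zip(zip(*table), col_tot)]


col_std = standardize([0.0, 1.0, 2.0, 3.0], c)   # arbitrary starting scores
for _ in range(100):
    col_std = standardize(average_over_rows(average_over_columns(col_std)), c)

# Averaging standard column coordinates gives principal row coordinates,
# whose weighted sum of squares is the first non-trivial root.
row_pc = average_over_columns(col_std)
root = weighted_ss(row_pc, r)
row_std = [x / root ** 0.5 for x in row_pc]      # standard row coordinates

print(f"first root = {root:.4f}")
```

At convergence the column scores have weighted sum-of-squares one (standard coordinates), while the averaged row scores have weighted sum-of-squares equal to the first root (principal coordinates). Without the centring step, the iteration would converge to the trivial solution, in which every score is the same and the root is one.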

References

Digby, P.G.N. & Kempton, R.A. (1987). Multivariate Analysis of Ecological Communities. Chapman & Hall, London.

Greenacre, M.J. (1984). Theory and Applications of Correspondence Analysis. Academic Press, London.

Greenacre, M. (2007). Correspondence Analysis in Practice, second edition. Chapman & Hall, London.

See also

Procedures: CABIPLOT, MCORANALYSIS.

Commands for: Multivariate and cluster analysis.

Example

CAPTION 'CORANALYSIS example',\ 
        'Data from Table 9.1 of Greenacre (2007)'; STYLE=meta,plain
TEXT    Staff,St; VALUES=!T(Sen_Mngr,Jun_Mngr,Sen_Empl,Jun_Empl,Secretry),\ 
                         !T(SM,      JM,      SE,      JE,      Sy)
&       Smoke; VALUES=!T(None,Light,Medium,Heavy)
MATRIX  [ROWS=Staff; COLUMNS=Smoke] Smoking; VALUES=\ 
        !( 4, 2, 3, 2, 4, 3, 7, 4, 25, 10, 12, 4, 18, 24, 33, 13, 10, 6, 7, 2)
PRINT   Smoking; FIELDWIDTH=8; DECIMALS=0
CAPTION 'Use CORANALYSIS, printing all results, saving SCORES only.'
CORANALYSIS [PRINT=roots,rowscores,colscores,rowinertia,colinertia;\ 
        METHOD=correspondence] Smoking; SAVE=cora1
"Print rowmass"
PRINT cora1['rowmass']
"Plot the scores in the 1st and 2nd dimensions. Rows are in principal coordinates
 and columns are in standard coordinates. Figure 9.2 of Greenacre (2007)."
CABIPLOT [COLSCALING=standard] LROW=St
Updated on March 8, 2019
