MCORANALYSIS procedure

Does multiple correspondence analysis (A.I. Glaser).

Options

`PRINT` = string tokens	Printed output from the analysis (`roots`, `rowscores`, `rowinertias`, `rowchisquare`, `rowmass`, `rowquality`, `colscores`, `colinertias`, `colchisquare`, `colmass`, `colquality`, `subinertias`); default `*` i.e. no output
`ROWMETHOD` = string token	Analysis method for rows i.e. units (`indicator`); default `indi`
`COLMETHOD` = string token	Analysis method for columns i.e. factors (`adjusted`, `burt`, `indicator`); default `adju`
`NROOTS` = scalar	Number of latent roots for printed output; default * requests them all to be printed
`%METHOD` = string token	How to represent proportions or %s in quality statistics (`permills`, `percentages`, `proportions`); default `prop`
`NDIMENSIONS` = scalar	Number of dimensions for which quality statistics are required; default 2
`TOLERANCE` = scalar	Tolerance criterion for zero eigenvalues; default 10^-6

Parameters

`DATA` = pointers	Data to be analysed
`ROOTS` = diagonal matrices	Saves the squared singular values from each analysis
`ROWSCORES` = matrices	Saves the scores for the rows of the data
`COLSCORES` = matrices	Saves the scores for the columns of the data
`ROWINERTIAS` = matrices	Saves the total inertias for the rows of the data
`COLINERTIAS` = matrices	Saves the total inertias for the columns of the data
`ROWQUALITY` = matrices	Saves the quality statistics for rows of the data
`COLQUALITY` = matrices	Saves the quality statistics for columns of the data
`SUBINERTIAS` = matrices	Saves the inertias of the subtables of the Burt matrices
`FREQUENCY` = variates	Frequencies for elements of `DATA`
`SAVE` = pointers	Saves details of the analysis for use by `CABIPLOT`

Description

Ordinary correspondence analysis is an ordination technique used to analyse relationships between two categorical variables (see procedure CORANALYSIS). Ordination techniques aim to represent the relationships approximately, in a reduced number of dimensions, to make them easier to study e.g. with graphs. Multiple correspondence analysis provides a similar analysis for more than two variables.

The data consist of a list of factors, which are supplied in a pointer by the DATA parameter. By default, each unit of the factors is assumed to represent a single observation. However, with large data sets, you may want to use the FREQUENCY parameter to supply a variate defining frequencies (or numbers of replications) for each unit. MCORANALYSIS uses the data to form an indicator matrix D, with a row for each unit and a column for each level of every factor. Each row of the matrix has the value one in the columns corresponding to the levels of the factors that occurred in that data unit, and zero elsewhere. (This is equivalent to the design matrix that is used in analysis of variance or regression.) The factors must not contain any missing values.
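As an illustration of the FREQUENCY parameter, the following minimal sketch supplies one unit per distinct pattern of answers together with a variate of counts. The data values and the identifiers `Qa`, `Qb`, `Qc`, `Count` and `Answers` are made up for illustration, and the statements have not been run here.

```genstat
" Illustrative data (made up): six distinct answer patterns to three
  questions, with Count giving how often each pattern occurred. "
FACTOR       [LABELS=!t(agree,disagree); VALUES=1,1,1,2,2,2] Qa
FACTOR       [LABELS=!t(agree,disagree); VALUES=1,2,2,1,1,2] Qb
FACTOR       [LABELS=!t(low,medium,high); VALUES=1,2,3,1,2,3] Qc
VARIATE      [VALUES=12,7,5,9,4,15] Count
" The factors are passed in a pointer; Count weights each unit. "
POINTER      [VALUES=Qa,Qb,Qc] Answers
MCORANALYSIS [PRINT=roots,colscores] DATA=Answers; FREQUENCY=Count
```

Here the indicator matrix would have six rows (one per unit) and seven columns (2 + 2 + 3 factor levels).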

The relationships between the rows are assessed by doing an ordinary correspondence analysis on the indicator matrix. This analysis also provides information on the relationships between the columns (i.e. the factor levels). Alternatively, the columns can be analysed by doing the correspondence analysis on the Burt matrix D′D. The two analyses differ in that the singular values obtained from the Burt matrix are the squares of those obtained from the indicator matrix. A refinement of the Burt analysis, the adjusted method, sets to zero any eigenvalues that are less than or equal to the threshold 1/Q, where Q is the number of DATA factors, and rescales the rest. This adjusts for the inflation of the eigenvalues that arises from the within-factor diagonal blocks of the Burt matrix; see Greenacre (2007) Chapter 19 for more details. The adjusted method is the default for the columns, but the other two methods can be requested using the COLMETHOD option. With very large data sets it may be impractical to do the correspondence analysis on the indicator matrix for the rows, so MCORANALYSIS allows this to be suppressed by setting option ROWMETHOD=*.
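The choice of method can be sketched with the data set from the Example section of this page (the file `MCOR-1.GSH` and the factors `Q1` to `Q4` are taken from that example). This is a sketch that has not been run here; it uses only the option settings listed above.

```genstat
SPLOAD       FILE='%gendir%/examples/MCOR-1.GSH'
POINTER      [VALUES=Q1,Q2,Q3,Q4] women
" Column analysis from the Burt matrix, with the row analysis suppressed "
MCORANALYSIS [PRINT=roots,colscores; ROWMETHOD=*; COLMETHOD=burt] women
" Column analysis from the indicator matrix, for comparison "
MCORANALYSIS [PRINT=roots,colscores; COLMETHOD=indicator] women
```

Comparing the two sets of printed roots is one way to see the squared relationship between the Burt and indicator results described above.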

Printed output is controlled by the PRINT option with settings:

`roots`	to print the roots (together with the roots expressed as percentages and cumulative percentages),
`rowscores`	to print the scores for the rows of the indicator matrix,
`rowinertias`	to print the inertias for the rows of the indicator matrix,
`rowmass`	to print the row masses,
`rowchisquare`	to print the row chisquare distances,
`rowquality`	to print the quality statistics for the rows,
`colscores`	to print the scores for the columns of the indicator or Burt matrix (as selected by the `COLMETHOD` option),
`colinertias`	to print the inertias for the columns,
`colmass`	to print the column masses,
`colchisquare`	to print the column chisquare distances,
`colquality`	to print the quality statistics for the columns, and
`subinertias`	to print the inertias of the subtables of the Burt matrix.

The NROOTS option controls the printed output of roots, scores and inertias. By default, results are printed for all the roots greater than the limit defined by the TOLERANCE option. However, you can set the NROOTS option to request a smaller number.

The quality settings produce tables with the following columns:

● the mass of the row (or column), in proportion to the total mass;

● the “quality” of the representation i.e. how much of the inertia of a row (or column) is represented by the dimensions shown;

● the proportion of the total inertia of the row (or column) compared to the total inertia for all rows (or columns);

● principal coordinates of the rows (or columns) in the specified dimension;

● the amount of inertia for each row (or column) in the specified dimension relative to the total amount of inertia given by the value of the quality statistic – hence the sum of a specific row (or column) across the dimensions shown will be equal to the value given by the quality statistic;

● the proportion of inertia explained by a row (or column) in a dimension, compared to the total inertia in that dimension.

The representation of the columns of proportions is controlled by the %METHOD option; these can be printed either as proportions (default), percentages or as permills i.e. tenths of a percent. The NDIMENSIONS option specifies the number of dimensions for which to print quality statistics; default 2.
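For example, the following sketch (not run here) requests the column quality tables for three dimensions, with the proportions shown as percentages. It reuses the data set from the Example section of this page.

```genstat
SPLOAD       FILE='%gendir%/examples/MCOR-1.GSH'
POINTER      [VALUES=Q1,Q2,Q3,Q4] women
" Print three roots, and quality statistics for three dimensions,
  with proportions expressed as percentages "
MCORANALYSIS [PRINT=roots,colquality; NROOTS=3; %METHOD=percentages;\
             NDIMENSIONS=3] women
```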

Results from the analysis can be saved using the parameters ROOTS, ROWSCORES, COLSCORES, ROWINERTIAS, COLINERTIAS, ROWQUALITY, COLQUALITY and SUBINERTIAS. The structures specified for these parameters need not be declared in advance. The SAVE parameter can save full details of the analysis for use by the CABIPLOT procedure.
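A minimal sketch of saving results, again using the data set from the Example section: the identifiers `Roots`, `Cscores`, `Cquality` and `Mcasave` are hypothetical names chosen for illustration, and the statements have not been run here.

```genstat
SPLOAD       FILE='%gendir%/examples/MCOR-1.GSH'
POINTER      [VALUES=Q1,Q2,Q3,Q4] women
" Suppress printed output and save selected results instead;
  the saved structures do not need to be declared beforehand "
MCORANALYSIS [PRINT=*] DATA=women; ROOTS=Roots; COLSCORES=Cscores;\
             COLQUALITY=Cquality; SAVE=Mcasave
PRINT        Roots
PRINT        Cscores
```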

Options: PRINT, ROWMETHOD, COLMETHOD, NROOTS, %METHOD, NDIMENSIONS, TOLERANCE.

Parameters: DATA, ROOTS, ROWSCORES, COLSCORES, ROWINERTIAS, COLINERTIAS, ROWQUALITY, COLQUALITY, SUBINERTIAS, FREQUENCY, SAVE.

Method

MCORANALYSIS first applies correspondence analysis to the indicator matrix. This is essentially the design matrix D for an analysis of variance or regression, fitting a model with just the main effects of the factors, and can be obtained from the TERMS directive as follows:

CALC nv = NVAL(DATA[1])

MODEL !(1...#nv)

TERMS [FULL=yes; DESIGN=D] DATA[]

DUPLICATE [REDEFINE=yes] D$[*; -1]; NEWSTRUCTURE=D

(The DUPLICATE statement removes the column for the constant, which is not required.) The Burt matrix D′D can be calculated by

CALCULATE Burt = T(D) *+ D

When COLMETHOD=adjusted, all the eigenvalues (squared singular values) less than or equal to 1/Q are set to zero, where Q is the number of factors in the data. To take into account the inflated inertia, each non-zero eigenvalue λ is then multiplied by

( Q / (Q – 1) * (λ – 1/Q) )²

The percentages are calculated by dividing the adjusted eigenvalues by the sum of the pre-adjusted eigenvalues, so they may not always sum to 100%.
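As an arithmetic check of the expression used by the adjusted method, take Q = 4 factors and an eigenvalue λ = 0.4 (both values chosen purely for illustration). Since 0.4 exceeds 1/Q = 0.25 the eigenvalue is retained, and the expression evaluates to (4/3 × 0.15)² = 0.04. The same calculation can be sketched with hypothetical scalars `Q`, `Lambda` and `Factor`:

```genstat
" Illustrative values only: 4 factors and an eigenvalue of 0.4 "
SCALAR    [VALUE=4] Q
SCALAR    [VALUE=0.4] Lambda
" Evaluate ( Q/(Q-1) * (lambda - 1/Q) ) squared "
CALCULATE Factor = (Q/(Q-1) * (Lambda - 1/Q))**2
PRINT     Q,Lambda,Factor
```

An eigenvalue of 0.25 or less would instead be set to zero before this step.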

References

Greenacre, M.J. (1984). Theory and Applications of Correspondence Analysis. Academic Press, London.

Greenacre, M. (2007). Correspondence Analysis in Practice, second edition. Chapman & Hall, London.

Example

CAPTION      'MCORANALYSIS example','Example on p.237 of Greenacre (2007)';\
             STYLE=meta,plain
" The data come from an International Social Survey Programme (ISSP)
  survey of Family and Changing Gender Roles in 1994 in 24 countries.
  The spreadsheet MCOR-1.gsh contains the opinions of German residents
  about working women for 4 questions, each with 4 possible responses."
SPLOAD       FILE='%gendir%/examples/MCOR-1.GSH'
POINTER      [VALUES=Q1,Q2,Q3,Q4] women
MCORANALYSIS [PRINT=roots,colscores,colquality; NROOTS=2] women
"Plot the column scores in the 1st and 2nd dimensions"
CABIPLOT     [PLOT=colscores]

Updated on March 7, 2019
