Does multiple correspondence analysis (A.I. Glaser).
|Printed output from the analysis (
||Analysis method for rows i.e. units (
||Analysis method for columns i.e. factors (
||Number of latent roots for printed output; default * requests them all to be printed|
||How to represent proportions or %s in quality statistics (
||Number of dimensions for which quality statistics are required; default 2|
||Tolerance criteria for zero eigenvalues; default 10⁻⁶|
||Data to be analysed|
||Saves the squared singular values from each analysis|
||Saves the scores for the rows of the data|
||Saves the scores for the columns of the data|
||Saves the total inertias for the rows of the data|
||Saves the total inertias for the columns of the data|
||Saves the quality statistics for rows of the data|
||Saves the quality statistics for columns of the data|
||Saves the inertias of the subtables of the Burt matrices|
||Frequencies for elements of
||Saves details of the analysis for use by
Ordinary correspondence analysis is an ordination technique used to analyse relationships between two categorical variables (see procedure
CORANALYSIS). Ordination techniques aim to represent the relationships approximately, in a reduced number of dimensions, to make them easier to study e.g. with graphs. Multiple correspondence analysis provides a similar analysis for more than two variables.
The data consist of a list of factors, which are supplied in a pointer by the
DATA parameter. By default, each unit of the factors is assumed to represent a single observation. However, with large data sets, you may want to use the
FREQUENCY parameter to supply a variate defining frequencies (or numbers of replications) for each unit.
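Raw survey data often hold one unit per respondent. They can be reduced by collapsing identical units into distinct combinations of levels, with a count for each, which is the kind of information the FREQUENCY variate would then carry. The Python sketch below illustrates the idea. It is not Genstat code, and the function name `collapse_units` and the list-of-lists layout of the factors are hypothetical.

```python
from collections import Counter

def collapse_units(factors):
    """Collapse identical units into distinct level combinations plus counts.

    `factors` is a list of equal-length sequences, one per factor
    (a hypothetical layout). Returns the distinct combinations and a
    parallel list of frequencies.
    """
    if len({len(f) for f in factors}) != 1:
        raise ValueError("all factors must have the same number of units")
    counts = Counter(zip(*factors))   # one tuple of levels per unit
    combos = sorted(counts)           # deterministic order for display
    freqs = [counts[c] for c in combos]
    return combos, freqs

q1 = ["agree", "agree", "disagree", "agree"]
q2 = ["yes", "yes", "no", "no"]
combos, freqs = collapse_units([q1, q2])
print(combos)  # [('agree', 'no'), ('agree', 'yes'), ('disagree', 'no')]
print(freqs)   # [1, 2, 1]
```

The frequencies always sum to the original number of units, so no observations are lost by the collapsing.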
MCORANALYSIS uses the data to form an indicator matrix D, with a row for each unit and a column for each level of every factor. Each row of the matrix has the value one in the columns corresponding to the levels of the factors that occurred in that data unit, and zero elsewhere. (This is equivalent to the design matrix that is used in analysis of variance or regression.) The factors must not contain any missing values.
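The construction of D can be sketched in Python as follows, assuming each factor is supplied as a list of level labels. The function `indicator_matrix` is hypothetical. The sorted ordering of levels within each factor is a simplification made for this sketch; Genstat would use the levels defined for each factor.

```python
def indicator_matrix(factors):
    """Build the indicator matrix D for a list of factors.

    Returns (columns, D): `columns` lists (factor index, level) for each
    column of D, and D has one 0/1 row per unit. Levels are taken in
    sorted order within each factor (an assumption of this sketch).
    """
    n = len(factors[0])
    columns = []
    for q, f in enumerate(factors):
        if len(f) != n:
            raise ValueError("factors must all have the same length")
        if any(x is None for x in f):
            raise ValueError("factors must not contain missing values")
        columns.extend((q, level) for level in sorted(set(f)))
    # A 1 marks the level that each unit takes for each factor.
    D = [[1 if factors[q][i] == level else 0 for q, level in columns]
         for i in range(n)]
    return columns, D

A = ["a", "b", "a"]
B = ["x", "x", "y"]
columns, D = indicator_matrix([A, B])
print(columns)  # [(0, 'a'), (0, 'b'), (1, 'x'), (1, 'y')]
for row in D:
    print(row)  # each row has exactly one 1 per factor
```

Every row of D therefore sums to Q, the number of factors.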
The relationships between the rows are assessed by doing an ordinary correspondence analysis on the indicator matrix. This analysis also provides information on the relationships between the columns (i.e. the factor levels). However, an alternative method for the columns does the correspondence analysis on the Burt matrix D′D. A refinement of the use of the Burt matrix discards eigenvalues below a threshold 1/Q, where Q is the number of
DATA factors. This adjusts for the inflation of the eigenvalues that arises from the within-factor diagonal blocks of the Burt matrix; see Greenacre (2007) Chapter 19 for more details. The difference between the results obtained using the indicator and Burt matrices is that the singular values obtained from the Burt matrix will be the squares of those obtained from the indicator matrix. The adjusted method is the default method for the columns, but the other two methods can be requested by using the
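The structure of the Burt matrix D′D, including the within-factor diagonal blocks that inflate the eigenvalues, can be seen in a small Python sketch. The function `burt_matrix` is hypothetical. Its optional frequency weighting assumes that a frequency acts exactly like that number of replicated units, which is how the FREQUENCY parameter is described above.

```python
def burt_matrix(D, freqs=None):
    """Return B = D'WD, where W holds the unit frequencies on its diagonal
    (all 1 when `freqs` is None)."""
    n, J = len(D), len(D[0])
    w = freqs if freqs is not None else [1] * n
    return [[sum(w[i] * D[i][j] * D[i][k] for i in range(n)) for k in range(J)]
            for j in range(J)]

# Indicator matrix for factors A (levels a, b) and B (levels x, y);
# the columns are a, b, x, y.
D = [[1, 0, 1, 0],
     [0, 1, 1, 0],
     [1, 0, 0, 1]]
for row in burt_matrix(D):
    print(row)
# [2, 0, 1, 1]  top-left block: diagonal, level counts of A
# [0, 1, 1, 0]
# [1, 1, 2, 0]  bottom-right block: diagonal, level counts of B
# [1, 0, 0, 1]  off-diagonal blocks: the A-by-B contingency table
```

The diagonal blocks carry no information about relationships between factors, only the level counts. This is why their contribution to the inertia needs the adjustment described above.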
COLMETHOD option. With very large data sets it may be impractical to do the correspondence analysis on the indicator matrix for rows. So
MCORANALYSIS allows this to be suppressed by setting option
Printed output is controlled by the
||to print the roots (together with the roots expressed as percentages and cumulative percentages),|
||to print the scores for the rows of the indicator matrix,|
||to print the inertias for the rows of the indicator matrix,|
||to print the row masses,|
||to print the row chi-square distances,|
||to print the quality statistics for the rows,|
||to print the scores for the columns of the indicator or Burt matrix (as selected by the
||to print the inertias for the columns,|
||to print the column masses,|
||to print the column chi-square distances,|
||to print the quality statistics for the columns, and|
||to print the inertias of the subtables of the Burt matrix.|
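The masses and chi-square distances in the list above can be illustrated with a small Python sketch, assuming the data are held as a plain matrix of counts such as the indicator matrix D. The function `masses_and_distances` is hypothetical. Exact fractions are used so that the results can be checked by hand.

```python
from fractions import Fraction

def masses_and_distances(D):
    """Row masses, column masses and squared row chi-square distances
    (to the average row profile) for a matrix of counts such as D."""
    total = sum(map(sum, D))
    n, J = len(D), len(D[0])
    r = [Fraction(sum(row), total) for row in D]
    c = [Fraction(sum(D[i][j] for i in range(n)), total) for j in range(J)]
    d2 = []
    for row in D:
        profile = [Fraction(x, sum(row)) for x in row]  # row profile
        d2.append(sum((profile[j] - c[j]) ** 2 / c[j] for j in range(J)))
    return r, c, d2

# Indicator matrix for factors A (levels a, b) and B (levels x, y).
D = [[1, 0, 1, 0],
     [0, 1, 1, 0],
     [1, 0, 0, 1]]
r, c, d2 = masses_and_distances(D)
print(r)   # every row mass is 1/3: each row of D sums to Q = 2
print(d2)  # [Fraction(1, 2), Fraction(5, 4), Fraction(5, 4)]
inertia = sum(ri * di for ri, di in zip(r, d2))
print(inertia)  # 1, which equals J/Q - 1 = 4/2 - 1
```

For an indicator matrix every row sums to Q, so all row masses are equal. The total inertia (the mass-weighted sum of the squared distances) is always J/Q − 1, where J is the total number of levels.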
The NROOTS option controls the printed output of roots, scores and inertias. By default, results are printed for all the roots greater than the limit defined by the
TOLERANCE option. However, you can set the
NROOTS option to request a smaller number.
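The interplay between TOLERANCE and NROOTS, together with the percentages and cumulative percentages printed with the roots, can be sketched in Python. This is an illustration only. The function `roots_to_print` is hypothetical. It assumes that percentages are taken relative to the sum of all roots above the tolerance, rather than only the printed ones.

```python
def roots_to_print(roots, tolerance=1e-6, nroots=None):
    """Return (root, percentage, cumulative percentage) for the printed roots.

    Roots not greater than `tolerance` are treated as zero; percentages are
    taken relative to the sum of all remaining roots (an assumption), and
    only the first `nroots` are kept when it is given.
    """
    kept = sorted((x for x in roots if x > tolerance), reverse=True)
    total = sum(kept)
    if nroots is not None:
        kept = kept[:nroots]
    table, cumulative = [], 0.0
    for x in kept:
        pct = 100.0 * x / total
        cumulative += pct
        table.append((x, pct, cumulative))
    return table

for row in roots_to_print([0.5, 0.25, 0.125, 0.125, 1e-9], nroots=2):
    print(row)
# (0.5, 50.0, 50.0)
# (0.25, 25.0, 75.0)
```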
The quality settings produce tables with the following columns:
● the mass of the row (or column), in proportion to the total mass;
● the “quality” of the representation i.e. how much of the inertia of a row (or column) is represented by the dimensions shown;
● the proportion of the total inertia of the row (or column) compared to the total inertia for all rows (or columns);
● principal coordinates of the rows (or columns) in the specified dimension;
● the amount of inertia of the row (or column) in the specified dimension, relative to the total inertia of that row (or column); summing these values for a row (or column) across the dimensions shown therefore gives its quality statistic;
● the proportion of inertia explained by a row (or column) in a dimension, compared to the total inertia in that dimension.
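The arithmetic behind these columns can be sketched in Python using standard correspondence-analysis identities. The principal inertia of a dimension is the mass-weighted sum of the squared principal coordinates. The inertia of a row is its mass times its squared chi-square distance, and that distance is the sum of its squared coordinates over all dimensions.

The sketch assumes that the masses sum to one and that principal coordinates are available for all nontrivial dimensions. The function `quality_table`, its dictionary keys and the coordinates in the example are hypothetical. Genstat's own column labels and layout may differ.

```python
def quality_table(masses, coords, ndim=2):
    """Quality statistics for each row (or column).

    masses: masses summing to 1.
    coords: principal coordinates in ALL nontrivial dimensions
            (one list per row/column); every point must lie at a
            nonzero distance from the centroid.
    """
    K = len(coords[0])
    # Principal inertia of dimension k: mass-weighted sum of squares.
    lam = [sum(m * f[k] ** 2 for m, f in zip(masses, coords)) for k in range(K)]
    total = sum(lam)
    table = []
    for m, f in zip(masses, coords):
        d2 = sum(x ** 2 for x in f)  # squared chi-square distance to centroid
        cos2 = [f[k] ** 2 / d2 for k in range(ndim)]         # share of own inertia
        ctr = [m * f[k] ** 2 / lam[k] for k in range(ndim)]  # share of dimension's inertia
        table.append({
            "mass": m,
            "quality": sum(cos2),
            "inertia": m * d2 / total,
            "coord": f[:ndim],
            "cos2": cos2,
            "ctr": ctr,
        })
    return table

# Hypothetical coordinates, centred so that sum(m * f_k) = 0 in every dimension.
masses = [0.5, 0.25, 0.25]
coords = [[0.5, 0.0, 0.1], [-0.5, 0.4, -0.1], [-0.5, -0.4, -0.1]]
table = quality_table(masses, coords)
for row in table:
    print(round(row["quality"], 3), [round(v, 3) for v in row["ctr"]])
```

By construction, the "inertia" column sums to one over all rows, and each dimension's "ctr" values also sum to one.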
The representation of the columns of proportions is controlled by the
%METHOD option; these can be printed as proportions (default), as percentages, or as permills (i.e. tenths of a percent). The
NDIMENSIONS option specifies the number of dimensions for which to print quality statistics; default 2.
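The three representations differ only by a scale factor, as this small Python sketch shows. The mapping `SCALE` and the function `express` are hypothetical helpers following the description of %METHOD above.

```python
# Multipliers for the three representations (hypothetical mapping).
SCALE = {"proportions": 1, "percentages": 100, "permills": 1000}

def express(value, method="proportions"):
    """Express a proportion as a proportion, percentage or permill."""
    return value * SCALE[method]

print(express(0.125))                 # 0.125
print(express(0.125, "percentages"))  # 12.5
print(express(0.125, "permills"))     # 125.0
```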
Results from the analysis can be saved using the parameters
COLQUALITY. The structures specified for these parameters need not be declared in advance. The
SAVE parameter can save full details of the analysis for use by the
MCORANALYSIS first applies correspondence analysis to the indicator matrix. This is essentially the design matrix
D for an analysis of variance or regression, fitting a model with just the main effects of the factors, and can be obtained from the
TERMS directive as follows:
CALC nv = NVAL(DATA)
TERMS [FULL=yes; DESIGN=D] DATA
DUPLICATE [REDEFINE=yes] D$[*; -1]; NEWSTRUCTURE=D
(The DUPLICATE statement removes the column for the constant, which is not required.) The Burt matrix D′D can be calculated by
CALCULATE Burt = T(D) *+ D
When COLMETHOD=adjusted, all the eigenvalues (squared singular values) less than or equal to 1/Q are set to zero, where Q is the number of factors in the data. To take account of the inflated inertia, each remaining eigenvalue λ is then replaced by
$$\left( \frac{Q}{Q-1} \right)^{2} \left( \lambda - \frac{1}{Q} \right)^{2}$$
The percentages are calculated by dividing the adjusted eigenvalues by the sum of the pre-adjusted eigenvalues, so they may not always sum to 100%.
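The adjustment and the resulting percentages can be sketched in Python. This follows the description above and assumes, as in Greenacre (2007), that the inputs are the principal inertias (squared singular values) of the indicator matrix. The function `adjusted_inertias` is hypothetical, and the eigenvalues in the example are made up.

```python
def adjusted_inertias(lams, Q):
    """Adjust principal inertias as described for COLMETHOD=adjusted.

    lams: principal inertias (squared singular values) of the indicator
          matrix, an assumption of this sketch; Q: number of factors (>= 2).
    Returns the adjusted values and their percentages of the unadjusted sum.
    """
    if Q < 2:
        raise ValueError("the adjustment needs at least two factors")
    threshold = 1 / Q
    # Values at or below 1/Q are set to zero; the rest are rescaled and squared.
    adj = [(Q / (Q - 1)) ** 2 * (lam - threshold) ** 2 if lam > threshold else 0.0
           for lam in lams]
    total = sum(lams)
    pct = [100 * a / total for a in adj]
    return adj, pct

adj, pct = adjusted_inertias([0.5, 0.3, 0.25, 0.1], Q=4)
print([round(a, 5) for a in adj])  # [0.11111, 0.00444, 0.0, 0.0]
print(round(sum(pct), 2))          # 10.05
```

The example shows why the percentages need not sum to 100%: the adjusted values are much smaller than the unadjusted eigenvalues they are divided by.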
Greenacre, M.J. (1984). Theory and Applications of Correspondence Analysis. Academic Press, London.
Greenacre, M. (2007). Correspondence Analysis in Practice, second edition. Chapman & Hall, London.
Commands for: Multivariate and cluster analysis.
CAPTION 'MCORANALYSIS example','Example on p.237 of Greenacre (2007)';\
  STYLE=meta,plain
" The data come from an International Social Survey Programme (ISSP) survey
  of Family and Changing Gender Roles in 1994 in 24 countries. The spreadsheet
  MCOR-1.gsh contains the opinions of German residents about working women
  for 4 questions, each with 4 possible responses."
SPLOAD FILE='%gendir%/examples/MCOR-1.GSH'
POINTER [VALUES=Q1,Q2,Q3,Q4] women
MCORANALYSIS [PRINT=roots,colscores,colquality; NROOTS=2] women
"Plot the column scores in the 1st and 2nd dimensions"
CABIPLOT [PLOT=colscores]