Does multiple correspondence analysis (A.I. Glaser).
Options
PRINT = string tokens | Printed output from the analysis (roots, rowscores, rowinertias, rowchisquare, rowmass, rowquality, colscores, colinertias, colchisquare, colmass, colquality); default * i.e. no output |
---|---|
ROWMETHOD = string token | Analysis method for rows i.e. units (indicator); default indi |
COLMETHOD = string token | Analysis method for columns i.e. factors (adjusted, burt, indicator); default adju |
NROOTS = scalar | Number of latent roots for printed output; default * requests them all to be printed |
%METHOD = string token | How to represent proportions or %s in quality statistics (permills, percentages, proportions); default prop |
NDIMENSIONS = scalar | Number of dimensions for which quality statistics are required; default 2 |
TOLERANCE = scalar | Tolerance criterion for zero eigenvalues; default 10⁻⁶ |
Parameters
DATA = pointers | Data to be analysed |
---|---|
ROOTS = diagonal matrices | Saves the squared singular values from each analysis |
ROWSCORES = matrices | Saves the scores for the rows of the data |
COLSCORES = matrices | Saves the scores for the columns of the data |
ROWINERTIAS = matrices | Saves the total inertias for the rows of the data |
COLINERTIAS = matrices | Saves the total inertias for the columns of the data |
ROWQUALITY = matrices | Saves the quality statistics for rows of the data |
COLQUALITY = matrices | Saves the quality statistics for columns of the data |
SUBINERTIAS = matrices | Saves the inertias of the subtables of the Burt matrices |
FREQUENCY = variates | Frequencies for elements of DATA |
SAVE = pointers | Saves details of the analysis for use by CABIPLOT |
Description
Ordinary correspondence analysis is an ordination technique used to analyse relationships between two categorical variables (see procedure CORANALYSIS). Ordination techniques aim to represent the relationships approximately, in a reduced number of dimensions, to make them easier to study, e.g. with graphs. Multiple correspondence analysis provides a similar analysis for more than two variables.
The data consist of a list of factors, which are supplied in a pointer by the DATA parameter. By default, each unit of the factors is assumed to represent a single observation. However, with large data sets, you may want to use the FREQUENCY parameter to supply a variate defining frequencies (or numbers of replications) for each unit. MCORANALYSIS uses the data to form an indicator matrix D, with a row for each unit and a column for each level of every factor. Each row of the matrix has the value one in the columns corresponding to the levels of the factors that occurred in that data unit, and zero elsewhere. (This is equivalent to the design matrix that is used in analysis of variance or regression.) The factors must not contain any missing values.
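The construction of the indicator matrix can be illustrated outside Genstat. The following is a minimal Python sketch, not part of MCORANALYSIS; the function name `indicator_matrix` and the example factors `q1` and `q2` are hypothetical, and frequencies are not handled.

```python
def indicator_matrix(factors, levels):
    """Build a 0/1 indicator matrix with one row per unit and one column
    per level of every factor (hypothetical helper, for illustration only)."""
    n_units = len(factors[0])
    rows = []
    for u in range(n_units):
        row = []
        for values, levs in zip(factors, levels):
            # One column per level: 1 for the level observed in this unit, else 0.
            row.extend(1 if values[u] == lev else 0 for lev in levs)
        rows.append(row)
    return rows

# Two hypothetical factors observed on three units.
q1 = ["agree", "disagree", "agree"]
q2 = ["yes", "yes", "no"]
levels = [["agree", "disagree"], ["yes", "no"]]

D = indicator_matrix([q1, q2], levels)
for row in D:
    print(row)
```

Every row contains exactly one 1 per factor, so each row sums to the number of factors (here 2).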
The relationships between the rows are assessed by doing an ordinary correspondence analysis on the indicator matrix. This analysis also provides information on the relationships between the columns (i.e. the factor levels). However, an alternative method for the columns does the correspondence analysis on the Burt matrix D′D. A refinement of the use of the Burt matrix discards eigenvalues below a threshold 1/Q, where Q is the number of DATA factors. This adjusts for the inflation of the eigenvalues that arises from the within-factor diagonal blocks of the Burt matrix; see Greenacre (2007) Chapter 19 for more details. The difference between the results obtained using the indicator and Burt matrices is that the singular values obtained from the Burt matrix will be the squares of those obtained from the indicator matrix. The adjusted method is the default method for the columns, but the other two methods can be requested by using the COLMETHOD option. With very large data sets it may be impractical to do the correspondence analysis on the indicator matrix for rows, so MCORANALYSIS allows this to be suppressed by setting option ROWMETHOD=*.
Printed output is controlled by the PRINT option with settings:
roots | to print the roots (together with the roots expressed as percentages and cumulative percentages), |
---|---|
rowscores | to print the scores for the rows of the indicator matrix, |
rowinertias | to print the inertias for the rows of the indicator matrix, |
rowmass | to print the row masses, |
rowchisquare | to print the row chisquare distances, |
rowquality | to print the quality statistics for the rows, |
colscores | to print the scores for the columns of the indicator or Burt matrix (as selected by the COLMETHOD option), |
colinertias | to print the inertias for the columns, |
colmass | to print the column masses, |
colchisquare | to print the column chisquare distances, |
colquality | to print the quality statistics for the columns, and |
subinertias | to print the inertias of the subtables of the Burt matrix. |
The NROOTS option controls the printed output of roots, scores and inertias. By default, results are printed for all the roots greater than the limit defined by the TOLERANCE option. However, you can set the NROOTS option to specify a smaller number.
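As an illustration of how the two options interact, here is a small Python sketch. It is not the procedure's own code; `roots_to_print` is a hypothetical name, and the roots are assumed to be held in decreasing order.

```python
def roots_to_print(roots, tolerance=1e-6, nroots=None):
    """Pick the roots to report (hypothetical helper; roots assumed sorted
    in decreasing order)."""
    # Roots at or below the tolerance are treated as zero and dropped.
    nonzero = [r for r in roots if r > tolerance]
    # nroots=None stands for the default, i.e. report every non-zero root.
    return nonzero if nroots is None else nonzero[:nroots]

roots = [0.42, 0.31, 0.12, 0.0000001, 0.0]
all_roots = roots_to_print(roots)            # default: all roots above tolerance
first_two = roots_to_print(roots, nroots=2)  # restricted to two roots
print(all_roots, first_two)
```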
The quality settings produce tables with the following columns:
● the mass of the row (or column), in proportion to the total mass;
● the “quality” of the representation i.e. how much of the inertia of a row (or column) is represented by the dimensions shown;
● the proportion of the total inertia of the row (or column) compared to the total inertia for all rows (or columns);
● principal coordinates of the rows (or columns) in the specified dimension;
● the amount of inertia for each row (or column) in the specified dimension relative to the total amount of inertia given by the value of the quality statistic – hence the sum of a specific row (or column) across the dimensions shown will be equal to the value given by the quality statistic;
● the proportion of inertia explained by a row (or column) in a dimension, compared to the total inertia in that dimension.
The representation of the columns of proportions is controlled by the %METHOD option; these can be printed either as proportions (default), percentages or as permills i.e. tenths of a percent. The NDIMENSIONS option specifies the number of dimensions for which to print quality statistics; default 2.
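The columns of the quality tables can be illustrated with the usual correspondence-analysis definitions, in which the inertia of a point in a dimension is its mass times its squared principal coordinate. The Python sketch below is written under that assumption and is not taken from the procedure's source; the function `quality_table`, the `scale` argument (1, 100 or 1000, standing in for the three %METHOD settings) and the example masses and coordinates are all hypothetical.

```python
def quality_table(masses, coords, ndimensions=2, scale=1.0):
    """Quality statistics from masses and principal coordinates
    (assumed definitions; scale = 1 proportions, 100 percentages, 1000 permills).
    Points with zero total inertia are not handled in this sketch."""
    n_dims = len(coords[0])
    # Inertia of each point in each dimension: mass * coordinate squared.
    point_inertia = [[m * c * c for c in row] for m, row in zip(masses, coords)]
    # Total inertia of each dimension, and of the whole table.
    dim_inertia = [sum(pi[k] for pi in point_inertia) for k in range(n_dims)]
    total = sum(dim_inertia)
    table = []
    for m, pi in zip(masses, point_inertia):
        point_total = sum(pi)
        # Share of the point's own inertia carried by each dimension shown.
        relative = [pi[k] / point_total for k in range(ndimensions)]
        table.append({
            "mass": scale * m,
            "quality": scale * sum(relative),
            "inertia": scale * point_total / total,
            "relative": [scale * r for r in relative],
            "contribution": [scale * pi[k] / dim_inertia[k]
                             for k in range(ndimensions)],
        })
    return table

# Hypothetical masses and three-dimensional principal coordinates.
masses = [0.5, 0.25, 0.25]
coords = [[0.2, 0.1, 0.0], [-0.4, 0.0, 0.2], [0.0, -0.2, -0.2]]

proportions = quality_table(masses, coords, ndimensions=2)
permills = quality_table(masses, coords, ndimensions=2, scale=1000.0)
for row in permills:
    print(row)
```

In each row the entries of `relative` add up to `quality`, which mirrors the statement above that the sum across the dimensions shown equals the quality statistic.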
Results from the analysis can be saved using the parameters ROOTS, ROWSCORES, COLSCORES, ROWINERTIAS, COLINERTIAS, ROWQUALITY and COLQUALITY. The structures specified for these parameters need not be declared in advance. The SAVE parameter can save full details of the analysis for use by the CABIPLOT procedure.
Options: PRINT, ROWMETHOD, COLMETHOD, NROOTS, %METHOD, NDIMENSIONS, TOLERANCE.
Parameters: DATA, ROOTS, ROWSCORES, COLSCORES, ROWINERTIAS, COLINERTIAS, ROWQUALITY, COLQUALITY, SUBINERTIAS, FREQUENCY, SAVE.
Method
MCORANALYSIS first applies correspondence analysis to the indicator matrix. This is essentially the design matrix D for an analysis of variance or regression, fitting a model with just the main effects of the factors, and can be obtained from the TERMS directive as follows:
CALC nv = NVAL(DATA[1])
MODEL !(1...#nv)
TERMS [FULL=yes; DESIGN=D] DATA[]
DUPLICATE [REDEFINE=yes] D$[*; -1]; NEWSTRUCTURE=D
(The DUPLICATE statement removes the column for the constant, which is not required.) The Burt matrix D′D can be calculated by
CALCULATE Burt = T(D) *+ D
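The structure of the resulting matrix is easier to see on a small case. The following Python sketch forms the same cross-product D′D for a hypothetical three-unit indicator matrix with two two-level factors; `burt_matrix` is an illustrative name, not a Genstat function.

```python
def burt_matrix(D):
    """Cross-product D'D of an indicator matrix given as a list of rows."""
    n_cols = len(D[0])
    # Entry (i, j) counts the units in which columns i and j are both 1.
    return [[sum(row[i] * row[j] for row in D) for j in range(n_cols)]
            for i in range(n_cols)]

# Hypothetical indicator matrix: columns 0-1 are the levels of factor 1,
# columns 2-3 are the levels of factor 2.
D = [[1, 0, 1, 0],
     [0, 1, 1, 0],
     [1, 0, 0, 1]]

B = burt_matrix(D)
for row in B:
    print(row)
```

The within-factor diagonal blocks are themselves diagonal and hold the level counts, while each off-diagonal block is the two-way table of counts for a pair of factors.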
When COLMETHOD=adjusted, all the eigenvalues (squared singular values) less than or equal to 1/Q are set to zero, where Q is the number of variables in the data. To take into account the inflated inertia, each non-zero eigenvalue λ is then replaced by
( Q / (Q – 1) * (λ – 1/Q) )²
The percentages are calculated by dividing the adjusted eigenvalues by the sum of the pre-adjusted eigenvalues, so they may not always sum to 100%.
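The adjustment can be checked by hand on a few values. The Python sketch below assumes that each retained eigenvalue λ is replaced by the adjusted value (Q/(Q – 1) × (λ – 1/Q))², as in Greenacre's adjustment, and that eigenvalues at or below 1/Q become zero; `adjust_eigenvalues` and the example eigenvalues are hypothetical.

```python
def adjust_eigenvalues(eigenvalues, q):
    """Adjusted eigenvalues for q factors (assumed form of the adjustment)."""
    threshold = 1.0 / q
    adjusted = []
    for lam in eigenvalues:
        if lam <= threshold:
            # Eigenvalues at or below 1/Q are set to zero.
            adjusted.append(0.0)
        else:
            # Rescale the excess over 1/Q by Q/(Q - 1), then square.
            adjusted.append((q / (q - 1) * (lam - threshold)) ** 2)
    return adjusted

# Hypothetical eigenvalues for an analysis with Q = 4 factors.
adjusted = adjust_eigenvalues([0.5, 0.25, 0.2], q=4)
print(adjusted)
```

For λ = 0.5 and Q = 4 this gives (4/3 × 0.25)² = 1/9, while the two eigenvalues that do not exceed 1/4 are set to zero.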
References
Greenacre, M.J. (1984). Theory and Applications of Correspondence Analysis. Academic Press, London.
Greenacre, M. (2007). Correspondence Analysis in Practice, second edition. Chapman & Hall, London.
See also
Procedures: CABIPLOT, CORANALYSIS.
Commands for: Multivariate and cluster analysis.
Example
CAPTION 'MCORANALYSIS example','Example on p.237 of Greenacre (2007)';\
        STYLE=meta,plain
" The data come from an International Social Survey Programme (ISSP)
  survey of Family and Changing Gender Roles in 1994 in 24 countries.
  The spreadsheet MCOR-1.gsh contains the opinions of German residents
  about working women for 4 questions, each with 4 possible responses."
SPLOAD FILE='%gendir%/examples/MCOR-1.GSH'
POINTER [VALUES=Q1,Q2,Q3,Q4] women
MCORANALYSIS [PRINT=roots,colscores,colquality; NROOTS=2] women
"Plot the column scores in the 1st and 2nd dimensions"
CABIPLOT [PLOT=colscores]