Clusters rows and columns of a two-way interaction table (J.T.N.M. Thissen & J. de Bree).
Options
Option | Description |
---|---|
PRINT = string tokens | What information to print (sortedtable, aovtable, summary, monitoring, variance, amalgamations, dendrogram); default sort, aov, summ, moni, vari, amal, dend |
PRMONITOR = scalar | If option VARIANCE is set, provides a P-value indicating when to start monitoring; if VARIANCE is unset, PRMONITOR is ignored; default 0.95 |
VARIANCE = scalar | Variance of a mean in TABLE; default * |
DF = scalar | Degrees of freedom of VARIANCE; default * |
SSTHRESHOLD = scalar | Value of cumSS at which to partition the dendrograms and define the factors ROWGROUPS and COLGROUPS; default 0, i.e. no partitioning |
TITLE = text | General title for the high-resolution graph; default * |
PENSIZE = scalar | Pen size for the y-labels of the dendrograms; default 1 |
Parameters
Parameter | Description |
---|---|
TABLE = tables | Two-way table whose interaction structure is to be clarified |
ROWAMALGAMATIONS = matrices | To save or specify the amalgamations for rows |
COLAMALGAMATIONS = matrices | To save or specify the amalgamations for columns |
ROWPERMUTATIONS = variates | To specify the order of labels in the row dendrogram |
COLPERMUTATIONS = variates | To specify the order of labels in the column dendrogram |
ROWGROUPS = factors | To save the grouping of the rows specified by the SSTHRESHOLD option |
COLGROUPS = factors | To save the grouping of the columns specified by the SSTHRESHOLD option |
SORTEDTABLE = tables | To save TABLE sorted into increasing order of row and column means |
Description
Consider an orthogonal table of uncorrelated, normally distributed means with common variance σ². Let the table be classified by two unstructured, qualitative factors between which an interaction has been detected. Such a table may emerge as a table of means from ANOVA, but this is not necessary. CINTERACTION groups the rows and the columns of the table to find a small, ideally minimal, number of groups that together account for the overall interaction but are internally homogeneous. Grouping is accomplished by means of agglomerative hierarchical clustering as described in Corsten & Denis (1990).
The procedure goes through a sequence of steps. In each step the mean square for interaction is calculated for every subtable consisting of a pair of rows or a pair of columns of the current table. The pair of rows or columns with the smallest mean square is merged, giving an updated table, and the process is repeated. This produces a sequence of amalgamations of rows and columns that ends with a 2×2 table. In this way the total sum of squares for the interaction is built up from orthogonal increments, each associated with one merge, and insight into a possible structure of the interaction may be obtained. As a stopping rule for the merging process, Corsten & Denis (1990) suggest a simultaneous test procedure, which provides a probability of stopping too early, i.e. of ending up with too many groups.
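In the first step every cell carries the same weight, and the interaction sum of squares of a pair of rows then has a simple closed form. The notation below is introduced here for illustration and is not taken from the procedure: for rows i and j of a table with c columns, write d_k = y_ik − y_jk. The residuals from the additive fit of the 2×c subtable are (d_k − d̄)/2 for row i and −(d_k − d̄)/2 for row j, so

```latex
\mathrm{SS}_{ij}
  = \sum_{k=1}^{c}\left[\left(\tfrac{d_k-\bar d}{2}\right)^{2}
  + \left(-\tfrac{d_k-\bar d}{2}\right)^{2}\right]
  = \tfrac{1}{2}\sum_{k=1}^{c}\left(d_k-\bar d\right)^{2},
\qquad
\bar d = \tfrac{1}{c}\sum_{k=1}^{c} d_k,
\qquad
\mathrm{MS}_{ij} = \frac{\mathrm{SS}_{ij}}{c-1}.
```

For example, rows (1, 2, 3) and (3, 2, 1) give d = (−2, 0, 2), d̄ = 0, SS = 4 on 2 degrees of freedom, and MS = 2. Pairs of columns are handled in the same way with rows and columns exchanged. Later steps need a weighted version, because a merged row or column pools several original ones.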
The data for the procedure are supplied as a table, specified by the TABLE parameter. Missing values are not allowed. You can also provide an estimate of σ², the common variance of the means in the table, together with its degrees of freedom, by means of the options VARIANCE and DF.
Printed output is controlled by the PRINT option as follows. Setting PRINT=sortedtable prints the table sorted into increasing order of row and column means. PRINT=aovtable gives an analysis of variance that decomposes the total variation into the contributions of rows, columns and the interaction between rows and columns. If options VARIANCE and DF have been specified, setting PRINT=variance prints the estimate of the variance σ² together with its degrees of freedom. The effect of PRINT=monitoring depends on whether VARIANCE is specified. If it is, PRINT=monitoring displays the sequence of merges starting just before the step in which the probability of stopping too early drops below the setting of the PRMONITOR option (default 0.95); setting PRMONITOR=1 then displays the full sequence of merges. PRMONITOR is ignored if VARIANCE is not specified. PRINT=summary produces a summary of the clustering process giving, for each step, its number (in the column entitled step), the corresponding reduction in degrees of freedom due to the merging of the subtable (df), the mean square for interaction of the subtable that is merged (ms), the cumulative degrees of freedom (cumdf), the cumulative interaction sum of squares explained (cumSS), and (if options VARIANCE and DF have been set) the P-value of stopping too early (P). PRINT=amalgamations prints the amalgamation matrices. Finally, setting PRINT=dendrogram produces two dendrograms in a high-resolution graph, one for rows above the horizontal axis and one for columns below. The TITLE option can be used to supply a title for the dendrograms. By default all this information is printed.
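The quantities behind PRINT=sortedtable and PRINT=aovtable can be reproduced with ordinary two-way arithmetic. The sketch below is written in Python purely for illustration and is not GenStat code; the function names sorted_table and two_way_aov are hypothetical, and it assumes a complete table with one value per cell and equal weights. It sorts rows and columns into increasing order of their means and splits the total sum of squares into rows, columns and interaction.

```python
def sorted_table(table):
    """Return the table with rows and columns in increasing order of their means,
    together with the row and column orders used (0-based original indices)."""
    r, c = len(table), len(table[0])
    row_order = sorted(range(r), key=lambda i: sum(table[i]) / c)
    col_order = sorted(range(c), key=lambda k: sum(row[k] for row in table) / r)
    return [[table[i][k] for k in col_order] for i in row_order], row_order, col_order


def two_way_aov(table):
    """(df, SS) for rows, columns, interaction and total of a complete
    two-way table with one value per cell and equal weights."""
    r, c = len(table), len(table[0])
    grand = sum(map(sum, table)) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(row[k] for row in table) / r for k in range(c)]
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    # Interaction: residuals from the additive (rows + columns) fit.
    ss_int = sum((table[i][k] - row_means[i] - col_means[k] + grand) ** 2
                 for i in range(r) for k in range(c))
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    return {
        "rows": (r - 1, ss_rows),
        "columns": (c - 1, ss_cols),
        "interaction": ((r - 1) * (c - 1), ss_int),
        "total": (r * c - 1, ss_total),
    }


# Usage: the three components add up to the total sum of squares.
aov = two_way_aov([[1, 4, 2], [3, 5, 9], [2, 2, 7]])
parts = sum(aov[key][1] for key in ("rows", "columns", "interaction"))
print(abs(parts - aov["total"][1]) < 1e-9)
```

The interaction line of this decomposition is the total that the merging process subsequently breaks into orthogonal increments.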
The ROWAMALGAMATIONS and COLAMALGAMATIONS parameters may be used to save the amalgamation matrices of rows and columns respectively; these contain the information for drawing the dendrograms (as from directive HCLUSTER). The sorted table may be saved using the SORTEDTABLE parameter.
Saving the amalgamation matrices can be useful if you wish to modify the layout of the dendrograms after inspecting the results. To do this, set the TABLE parameter as before, and set ROWAMALGAMATIONS and COLAMALGAMATIONS to the saved amalgamation matrices. Options SSTHRESHOLD and PENSIZE, and parameters ROWPERMUTATIONS and COLPERMUTATIONS, can then be used to control the layout of the dendrograms. If option SSTHRESHOLD is set to a specific value on the abscissa (i.e. a value of cumSS), the grouping of rows and columns corresponding to this value is marked by a vertical dotted line in the dendrograms; by default no line is drawn. The line is drawn in the default style of pen 3; since the appearance of lines is device specific, you may wish to define a particular line style for this pen. The groupings at this point can be saved using the ROWGROUPS and COLGROUPS parameters. Parameters ROWPERMUTATIONS and COLPERMUTATIONS specify the order of the labels in the row and column dendrograms, starting from the horizontal axis.
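Cutting a dendrogram at a cumSS threshold can be pictured as applying every merge whose cumSS does not exceed the threshold and reading off the clusters that remain. The Python sketch below is a hypothetical illustration, not GenStat's implementation: the merge-history format (pairs of original indices, each taken from one of the two clusters being joined, with the cumSS reached at that merge) and the function name groups_at_threshold are assumptions, and so is treating a merge exactly at the threshold as applied. It would be called once with the row merges and once with the column merges, with cumSS taken from the combined sequence of steps.

```python
def groups_at_threshold(n, merges, threshold):
    """Group labels 1, 2, ... for n original items after applying every merge
    whose cumSS is at most threshold.

    merges: list of (i, j, cumss) in merge order; i and j are any original
    indices belonging to the two clusters joined at that step (assumed format).
    """
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, cumss in merges:
        if cumss > threshold:
            break  # cumSS never decreases, so all later merges are above too
        parent[find(i)] = find(j)

    # Number the groups in order of first appearance.
    labels, out = {}, []
    for x in range(n):
        root = find(x)
        labels.setdefault(root, len(labels) + 1)
        out.append(labels[root])
    return out


history = [(0, 1, 1.0), (2, 3, 2.5), (0, 2, 7.0)]
print(groups_at_threshold(4, history, 3.0))  # [1, 1, 2, 2]
```

A union-find structure is used because merges are recorded in terms of original items, so no bookkeeping of the shrinking table is needed.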
Options: PRINT, PRMONITOR, VARIANCE, DF, SSTHRESHOLD, TITLE, PENSIZE.
Parameters: TABLE, ROWAMALGAMATIONS, COLAMALGAMATIONS, ROWPERMUTATIONS, COLPERMUTATIONS, ROWGROUPS, COLGROUPS, SORTEDTABLE.
Method
In each step of the merging process, the pair of rows or columns that contributes least to the total sum of squares for interaction is identified by repeated use of (weighted) ANOVA. Provided the number of data values in each cell, which may have increased through previous merges, is properly accounted for, each of the successive sums of squares is the squared projection of the (tabulated) data vector on a subspace of interactions that is orthogonal to all preceding such subspaces. In this way cumSS is intended to increase as slowly as possible, but this is not guaranteed because of the sequential (step-by-step) character of the procedure.
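The weighted merging described above can be sketched compactly in Python. This is an illustration under stated assumptions, not the GenStat code: the function names are hypothetical, ties are broken by keeping the first pair found, and a margin is never reduced below two groups, so the process stops at a 2×2 table. Each merged row or column keeps a weight equal to the number of original rows or columns it contains. For two lines with weights w_u and w_v, differences d_k, and weights b_k along the other margin, the increment used here is w_u·w_v/(w_u + w_v) · Σ_k b_k (d_k − d̄)², with d̄ the b-weighted mean of the d_k; with unit weights this reduces to the ½ Σ (d_k − d̄)² of the first step.

```python
def interaction_ss(table):
    """Interaction sum of squares of a complete two-way table with unit cell weights."""
    r, c = len(table), len(table[0])
    grand = sum(map(sum, table)) / (r * c)
    rm = [sum(row) / c for row in table]
    cm = [sum(row[k] for row in table) / r for k in range(c)]
    return sum((table[i][k] - rm[i] - cm[k] + grand) ** 2
               for i in range(r) for k in range(c))


def pair_ss(u, v, wu, wv, other_w):
    """Weighted interaction SS of the subtable formed by two lines u and v
    (two rows or two columns) with group sizes wu and wv; other_w holds the
    group sizes along the other margin."""
    d = [a - b for a, b in zip(u, v)]
    dbar = sum(w * x for w, x in zip(other_w, d)) / sum(other_w)
    return wu * wv / (wu + wv) * sum(w * (x - dbar) ** 2 for w, x in zip(other_w, d))


def merge_sequence(table):
    """Repeatedly merge the pair of rows or columns with the smallest interaction
    mean square until a 2 x 2 table remains. Returns the steps as
    (kind, i, j, df, ss), where i and j index the current table, and the
    interaction SS left in the final 2 x 2 table."""
    rows = [[float(x) for x in row] for row in table]
    rw = [1] * len(rows)     # number of original rows in each row group
    cw = [1] * len(rows[0])  # number of original columns in each column group
    steps = []
    while len(rw) > 2 or len(cw) > 2:
        best = None  # (ms, kind, i, j, df, ss)
        cols = [list(col) for col in zip(*rows)]
        for kind, lines, w, other_w in (("row", rows, rw, cw), ("col", cols, cw, rw)):
            if len(w) <= 2:
                continue  # keep at least two groups on each margin
            df = len(other_w) - 1
            for i in range(len(lines)):
                for j in range(i + 1, len(lines)):
                    ss = pair_ss(lines[i], lines[j], w[i], w[j], other_w)
                    if best is None or ss / df < best[0]:
                        best = (ss / df, kind, i, j, df, ss)
        _, kind, i, j, df, ss = best
        if kind == "row":
            # Replace row i by the weighted mean of rows i and j, then drop row j.
            tot = rw[i] + rw[j]
            rows[i] = [(rw[i] * a + rw[j] * b) / tot for a, b in zip(rows[i], rows[j])]
            rw[i] = tot
            del rows[j], rw[j]
        else:
            # Same for columns i and j, cell by cell in every row.
            tot = cw[i] + cw[j]
            for row in rows:
                row[i] = (cw[i] * row[i] + cw[j] * row[j]) / tot
                del row[j]
            cw[i] = tot
            del cw[j]
        steps.append((kind, i, j, df, ss))
    final = pair_ss(rows[0], rows[1], rw[0], rw[1], cw)
    return steps, final


# Usage: the increments plus the final 2 x 2 interaction reproduce the total.
data = [[1, 4, 2], [3, 5, 9], [2, 2, 7], [6, 1, 3]]
steps, final = merge_sequence(data)
print(steps)
print(sum(step[4] for step in steps) + final, interaction_ss(data))
```

The last line checks the orthogonality claim numerically: because the weighted increments are orthogonal, their sum plus the interaction remaining in the 2×2 table equals the interaction sum of squares of the original table, and the degrees of freedom of the steps plus 1 equal (r − 1)(c − 1).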
The P-values presented in the summary table of Corsten & Denis (1990) are incorrect: they appear to have been obtained by using D instead of n degrees of freedom (in the notation of that paper) in the denominator of the F-statistic concerned.
Action with RESTRICT
Restrictions are not allowed.
Reference
Corsten, L.C.A. and Denis, J.B. (1990). Structuring interaction in two-way tables by clustering. Biometrics, 46, 207-215.
See also
Directives: ANOVA, CLUSTER, HCLUSTER.
Commands for: Analysis of variance, Multivariate and cluster analysis.
Example
    CAPTION 'CINTERACTION example',\
      !t('Data from Corsten and Denis (1990, Biometrics, 46, 207-215);',\
      'also see Corsten (1996, Biometrical Letters, 33, 33-43) for',\
      'a definition of the correct column numbering.'); STYLE=meta,plain
    TEXT [VALUES=row1, row2, row3, row4, row5, row6, row7, row8, row9,\
      row10, row11, row12, row13, row14, row15, row16, row17, row18,\
      row19, row20] trow
    TEXT [VALUES=column1, column2, column3, column4, column5, column6,\
      column7] tcol
    FACTOR [LABELS=trow; VALUES=7(1...20)] row
    FACTOR [LABELS=tcol; VALUES=(1...7)20] col
    TABLE [CLASSIFICATION=row, col] table
    READ table
    59.8 61.0 75.6 58.8 64.4 62.7 53.4
    64.5 70.3 81.5 60.4 73.2 78.3 70.8
    59.5 68.2 79.9 60.0 72.3 76.9 71.5
    65.1 71.9 82.1 58.5 71.9 83.2 71.5
    64.2 68.2 81.2 60.1 74.2 85.4 78.4
    56.4 65.2 79.6 58.0 68.0 80.6 73.0
    63.5 65.6 74.3 60.7 71.6 73.2 69.4
    58.3 65.7 72.7 60.0 74.6 73.7 66.1
    61.9 66.1 83.4 62.9 74.5 75.6 74.4
    58.9 64.8 80.9 67.7 71.5 72.0 70.0
    57.2 64.1 81.2 56.2 75.4 72.4 65.7
    58.0 66.1 85.5 62.2 74.5 82.0 70.0
    62.0 71.8 69.1 62.8 71.4 77.0 75.6
    51.6 62.5 53.7 55.7 59.6 71.8 67.3
    62.9 64.8 67.2 61.2 64.5 77.9 65.5
    60.2 63.2 73.1 60.1 74.9 86.0 71.7
    55.4 63.3 66.9 58.3 73.8 76.9 65.0
    53.7 68.1 71.5 64.1 76.7 90.2 72.5
    54.5 67.3 76.7 60.2 82.2 81.0 73.1
    56.1 59.4 76.3 62.5 70.4 85.9 65.4 :
    CINTERACTION [PRMONITOR=0.9; VARIANCE=8.59; DF=266] table