Clusters rows and columns of a two-way interaction table (J.T.N.M. Thissen & J. de Bree).
|What information to print (
||Variance of a mean in
||Degrees of freedom of
||Specifies a value of cumSS at which to partition the dendrograms and to define factors
||General title for the high-resolution graph; default
||Pen size for y-labels of dendrograms; default 1|
||Two-way table whose interaction structure is to be clarified|
||To either save or specify amalgamations for rows|
||To either save or specify amalgamations for columns|
||To specify order of labels in the row dendrogram|
||To specify order of labels in the column dendrogram|
||To save the grouping of the rows specified by the
||To save the grouping of the columns specified by the
||To save the sorted
Consider an orthogonal table of uncorrelated, normally distributed means with common variance σ². Let the table be classified by two unstructured, qualitative factors between which an interaction has been detected. Such a table may emerge as a table of means from ANOVA, but this is not necessary.
CINTERACTION groups the rows and columns of the table to identify a small (ideally minimal) number of groups which account for the overall interaction, but which are internally homogeneous. Grouping is accomplished by means of agglomerative hierarchical clustering as described in Corsten & Denis (1990).
The procedure goes through a sequence of steps. In each step the mean square for interaction is calculated for all possible subtables consisting of a pair of rows or a pair of columns of the full table. The pair of rows or columns with minimal mean square is merged, giving an updated table, and the process is repeated. Thus a sequence of amalgamations of rows and columns is produced, eventually leading to a 2×2 table. In this way the total sum of squares for the interaction is built up from orthogonal increments, each connected with a merge as described above, and insight into a possible structure of the interaction may be obtained. As a stopping rule for the merging process, Corsten & Denis (1990) suggest a simultaneous test procedure, which provides a probability of stopping too early, i.e. of ending up with too many groups.
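The search carried out in a single step can be sketched in Python as follows. This is only an illustration of the first step, assuming every cell carries equal weight; it is not the procedure's own code, and the function names `pair_interaction_ms` and `best_merge` and the small table are hypothetical. Later steps would need a weighted analysis, because merged cells contain more data.

```python
from itertools import combinations


def pair_interaction_ms(a, b):
    """Interaction mean square of the 2 x n subtable formed by lines a and b.

    Assumes equal cell weights (true only before any merge has been made).
    """
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = sum(diffs) / len(diffs)
    # Interaction SS of a 2 x n table: half the corrected SS of the differences.
    ss = sum((d - mean_diff) ** 2 for d in diffs) / 2.0
    return ss / (len(diffs) - 1)  # n - 1 degrees of freedom


def best_merge(table):
    """Return (ms, kind, i, j) for the pair of rows or columns with minimal mean square."""
    columns = [list(col) for col in zip(*table)]
    candidates = []
    for kind, lines in (("rows", table), ("columns", columns)):
        for i, j in combinations(range(len(lines)), 2):
            candidates.append((pair_interaction_ms(lines[i], lines[j]), kind, i, j))
    return min(candidates)


# Hypothetical 3 x 3 table: rows 0 and 1 are parallel, so they show no interaction.
table = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 1.0, 0.0],
]
ms, kind, i, j = best_merge(table)
print(kind, i, j, ms)
```

In this made-up table the first two rows differ by a constant, so their subtable has zero interaction and they are the first candidates for merging.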
The data for the procedure is a table, specified by the TABLE parameter. Missing values are not allowed. You can also provide an estimate of σ², the common variance of the means in the table, together with its degrees of freedom, by means of the options VARIANCE and DF.
Printed output is controlled by the PRINT option. PRINT=sortedtable prints a sorted table with increasing row and column means. PRINT=aovtable gives an analysis of variance which decomposes the total variation into that contributed by rows, columns and the interaction between rows and columns. If options VARIANCE and DF have been specified, setting PRINT=variance prints the estimate of the variance σ² together with its degrees of freedom. The effect of PRINT=monitoring depends on whether VARIANCE is specified or not. If VARIANCE is specified, PRINT=monitoring displays the sequence of merges starting just before the step in which the probability of stopping too early drops below the setting of the PRMONITOR option, the default setting of which is 0.95. Setting PRMONITOR=1 then displays the full sequence of merges. PRMONITOR is ignored if VARIANCE is not specified. PRINT=summary produces a summary of the clustering process giving, for each step, its number (in the column entitled step), the corresponding reduction of degrees of freedom due to the merging of the subtable (df), the mean square for interaction of the subtable which is merged (ms), the cumulated degrees of freedom (cumdf), the interaction sum of squares explained (cumSS), and (if options VARIANCE and DF have been set) the P-value of stopping too early (P). PRINT=amalgamations prints the amalgamation matrices. Finally, setting PRINT=dendrogram produces two dendrograms in a high-resolution graph, one for rows above the horizontal axis and one for columns below. The TITLE option can be used to supply a title for the dendrograms. By default all this information is printed.
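The decomposition reported by the analysis of variance can be illustrated with a short Python sketch. It assumes a complete table with one value per cell and equal weights; the function name `two_way_ss` and the 2×2 data are hypothetical, and the layout of the result is not that of the procedure's printed output.

```python
def two_way_ss(table):
    """Split the total sum of squares of a two-way table (one value per cell).

    Returns a dict mapping each source to (degrees of freedom, sum of squares).
    """
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(col) / n_rows for col in zip(*table)]

    ss_rows = n_cols * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n_rows * sum((m - grand) ** 2 for m in col_means)
    # Interaction: what remains after removing the additive row and column effects.
    ss_inter = sum(
        (table[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    return {
        "rows": (n_rows - 1, ss_rows),
        "columns": (n_cols - 1, ss_cols),
        "interaction": ((n_rows - 1) * (n_cols - 1), ss_inter),
        "total": (n_rows * n_cols - 1, ss_total),
    }


# Hypothetical 2 x 2 table.
aov = two_way_ss([[1.0, 2.0], [3.0, 5.0]])
for source, (df, ss) in aov.items():
    print(f"{source:12s} df={df}  ss={ss:.4f}")
```

The three components add up to the total sum of squares, which is the property the clustering relies on when it splits the interaction line further.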
The ROWAMALGAMATIONS and COLAMALGAMATIONS parameters may be used for saving the amalgamation matrices of rows and columns respectively; these contain the information needed for drawing the dendrograms (as from directive
HCLUSTER). The sorted table may be saved using the parameter
Saving the amalgamation matrices can be useful if you wish to modify the layout of the dendrogram after inspecting the results. To do this, you set the TABLE parameter as before, and set ROWAMALGAMATIONS and COLAMALGAMATIONS to the saved amalgamation matrices. Options TITLE and PENSIZE, and parameters ROWPERMUTATION and COLPERMUTATION, can then be used to control the layout of the dendrogram. By setting option SSTHRESHOLD to a specific value for the abscissa (i.e. cumSS), the grouping of rows and columns corresponding to this value is represented by a vertical dotted line in the dendrograms; by default no line is drawn. The line is drawn in the default style of pen 3. You may wish to define a particular line style for this pen, since the appearance of lines is device specific. The groupings at this point can be saved using the ROWGROUPS and COLGROUPS parameters. Parameters ROWPERMUTATION and COLPERMUTATION specify the order of the labels in the row and column dendrograms, starting from the horizontal axis.
In each step of the merging process the pair of rows or columns which contributes least to the total sum of squares for interaction is traced by repeated use of (weighted) ANOVA. When the number of data in each cell, which may have increased by previous merges, is properly accounted for, each of the successive sums of squares is the squared projection of the (tabulated) data vector on a subspace of interactions, orthogonal to all preceding such subspaces. In this way cumSS increases as slowly as the stepwise search allows, but a minimal rate of increase is not guaranteed because of the sequential character of the procedure.
The P-values presented in the summary table in Corsten & Denis (1990) are incorrect, in that they appear to have been obtained by using D instead of n degrees of freedom in the denominator of the F-statistic concerned.
Restrictions are not allowed.
Corsten, L.C.A. and Denis, J.B. (1990). Structuring interaction in two-way tables by clustering. Biometrics, 46, 207-215.
CAPTION 'CINTERACTION example',\
        !t('Data from Corsten and Denis (1990, Biometrics, 46, 207-215);',\
        'also see Corsten (1996, Biometrical Letters, 33, 33-43) for',\
        'a definition of the correct column numbering.'); STYLE=meta,plain
TEXT    [VALUES=row1, row2, row3, row4, row5, row6, row7, row8, row9,\
        row10, row11, row12, row13, row14, row15, row16, row17, row18,\
        row19, row20] trow
TEXT    [VALUES=column1, column2, column3, column4, column5, column6,\
        column7] tcol
FACTOR  [LABELS=trow ; VALUES=7(1...20)] row
FACTOR  [LABELS=tcol ; VALUES=(1...7)20] col
TABLE   [CLASSIFICATION=row, col] table
READ    table
59.8 61.0 75.6 58.8 64.4 62.7 53.4
64.5 70.3 81.5 60.4 73.2 78.3 70.8
59.5 68.2 79.9 60.0 72.3 76.9 71.5
65.1 71.9 82.1 58.5 71.9 83.2 71.5
64.2 68.2 81.2 60.1 74.2 85.4 78.4
56.4 65.2 79.6 58.0 68.0 80.6 73.0
63.5 65.6 74.3 60.7 71.6 73.2 69.4
58.3 65.7 72.7 60.0 74.6 73.7 66.1
61.9 66.1 83.4 62.9 74.5 75.6 74.4
58.9 64.8 80.9 67.7 71.5 72.0 70.0
57.2 64.1 81.2 56.2 75.4 72.4 65.7
58.0 66.1 85.5 62.2 74.5 82.0 70.0
62.0 71.8 69.1 62.8 71.4 77.0 75.6
51.6 62.5 53.7 55.7 59.6 71.8 67.3
62.9 64.8 67.2 61.2 64.5 77.9 65.5
60.2 63.2 73.1 60.1 74.9 86.0 71.7
55.4 63.3 66.9 58.3 73.8 76.9 65.0
53.7 68.1 71.5 64.1 76.7 90.2 72.5
54.5 67.3 76.7 60.2 82.2 81.0 73.1
56.1 59.4 76.3 62.5 70.4 85.9 65.4 :
CINTERACTION [PRMONITOR=0.9; VARIANCE=8.59; DF=266] table