CINTERACTION procedure

Clusters rows and columns of a two-way interaction table (J.T.N.M. Thissen & J. de Bree).

Options

`PRINT` = string tokens	What information to print (`sortedtable`, `aovtable`, `summary`, `monitoring`, `variance`, `amalgamations`, `dendrogram`); default `sort`, `aov`, `summ`, `moni`, `vari`, `amal`, `dend`
`PRMONITOR` = scalar	P-value that indicates when to start monitoring the merges; used only if option `VARIANCE` is set, otherwise ignored; default 0.95
`VARIANCE` = scalar	Variance of a mean in `TABLE`; default `*`
`DF` = scalar	Degrees of freedom of `VARIANCE`; default `*`
`SSTHRESHOLD` = scalar	Value of cumSS at which to partition the dendrograms and to define the factors `ROWGROUPS` and `COLGROUPS`; default 0, i.e. no partitioning
`TITLE` = text	General title for the high-resolution graph; default `*`
`PENSIZE` = scalar	Pen size for y-labels of dendrograms; default 1

Parameters

`TABLE` = tables	Two-way table whose interaction structure is to be clarified
`ROWAMALGAMATIONS` = matrices	To either save or specify amalgamations for rows
`COLAMALGAMATIONS` = matrices	To either save or specify amalgamations for columns
`ROWPERMUTATIONS` = variates	To specify order of labels in the row dendrogram
`COLPERMUTATIONS` = variates	To specify order of labels in the column dendrogram
`ROWGROUPS` = factors	To save the grouping of the rows specified by the `SSTHRESHOLD` option
`COLGROUPS` = factors	To save the grouping of the columns specified by the `SSTHRESHOLD` option
`SORTEDTABLE` = tables	To save the sorted `TABLE` with increasing row and column means

Description

Consider an orthogonal table of uncorrelated, normally distributed means with common variance σ². Let the table be classified by two unstructured, qualitative factors between which an interaction has been detected. Such a table may emerge as a table of means from ANOVA, but this is not necessary. CINTERACTION performs a grouping of rows and columns of the table to identify a hopefully minimum number of groups which account for the overall interaction, but which are internally homogeneous. Grouping is accomplished by means of agglomerative hierarchical clustering as described in Corsten & Denis (1990).

The procedure goes through a sequence of steps. In each step the mean square for interaction is calculated for all possible subtables consisting of a pair of rows or a pair of columns of the full table. The pair of rows or columns with minimal mean square is merged, giving an updated table, and the process is repeated. Thus a sequence of amalgamations of rows and columns is produced, eventually leading to a 2×2 table. In this way the total sum of squares for the interaction is built up from orthogonal increments, each connected with a merge as described above, and insight into a possible structure of the interaction may be obtained. As a stopping rule for the merging process, Corsten & Denis (1990) suggest a simultaneous test procedure, which provides a probability of stopping too early, i.e. of ending up with too many groups.
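For the first step only, where every cell of the table still has equal weight, the interaction mean square of a pair of rows can be written out explicitly. The notation below ($y_{ij}$ for the entries of an $I \times J$ table, $d_j$ for the differences between the two rows) is introduced here for illustration and is not taken from the procedure itself; later steps need the weighted form described under Method.

```latex
% Interaction sum of squares and mean square for the 2 x J subtable
% formed by rows i and i' (unweighted case, first step only).
\[
  d_j = y_{ij} - y_{i'j}, \qquad
  \bar{d} = \frac{1}{J} \sum_{j=1}^{J} d_j ,
\]
\[
  \mathrm{SS}_{ii'} = \frac{1}{2} \sum_{j=1}^{J} \left( d_j - \bar{d} \right)^2 ,
  \qquad
  \mathrm{MS}_{ii'} = \frac{\mathrm{SS}_{ii'}}{J - 1} .
\]
```

The same expression with the roles of rows and columns exchanged applies to a pair of columns, with $I - 1$ degrees of freedom.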

The data for the procedure is a table, specified by the TABLE parameter. Missing values are not allowed. You can also provide an estimate of σ², which is the common variance of the means in the table, together with its degrees of freedom by means of the options VARIANCE and DF.

Printed output is controlled by the PRINT option as follows. Setting PRINT=sortedtable prints a sorted table with increasing row and column means. PRINT=aovtable gives an analysis of variance which decomposes the total variation into that contributed by rows, columns and the interaction between rows and columns. If options VARIANCE and DF have been specified, setting PRINT=variance prints the estimate of the variance σ² together with its degrees of freedom.

The effect of PRINT=monitoring depends on whether VARIANCE is specified or not. If VARIANCE is specified, PRINT=monitoring displays the sequence of merges starting just before the step in which the probability of stopping too early drops below the setting of the PRMONITOR option, the default setting of which is 0.95. Setting PRMONITOR=1 then displays the full sequence of merges. PRMONITOR is ignored if VARIANCE is not specified.

PRINT=summary produces a summary of the clustering process giving, for each step, its number (in the column entitled step), the corresponding reduction of degrees of freedom due to the merging of the subtable (df), the mean square for interaction of the subtable which is merged (ms), the cumulative degrees of freedom (cumdf), the interaction sum of squares explained (cumSS), and (if options VARIANCE and DF have been set) the P-value of stopping too early (P).

PRINT=amalgamations prints the amalgamation matrices. Finally, setting PRINT=dendrogram produces two dendrograms in a high-resolution graph, one for rows above the horizontal axis and one for columns below. The TITLE option can be used to supply a title for the dendrograms. By default all this information is printed.
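As a minimal sketch of restricting the output, the call below asks only for the summary and the monitoring information, and sets PRMONITOR=1 so that the full sequence of merges is shown. It assumes that `table` has been read as in the Example section, and reuses the variance and degrees of freedom given there.

```genstat
"Print only the summary and the monitoring output;
 PRMONITOR=1 requests the full sequence of merges."
CINTERACTION  [PRINT=summary,monitoring; PRMONITOR=1;\
              VARIANCE=8.59; DF=266] table
```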

The ROWAMALGAMATIONS and COLAMALGAMATIONS parameters may be used for saving the amalgamation matrices of rows and columns respectively, and contain information for drawing the dendrograms (as from directive HCLUSTER). The sorted table may be saved using the parameter SORTEDTABLE.

Saving the amalgamation matrices can be useful if you wish to modify the layout of the dendrogram after inspecting the results. To do this, you set the TABLE parameter as before, and set ROWAMALGAMATIONS and COLAMALGAMATIONS to the saved amalgamation matrices. Options SSTHRESHOLD and PENSIZE, and parameters ROWPERMUTATIONS and COLPERMUTATIONS, can then be used to control the layout of the dendrogram. Setting option SSTHRESHOLD to a specific value on the abscissa (i.e. cumSS) marks the grouping of rows and columns corresponding to this value with a vertical dotted line in the dendrograms; by default (SSTHRESHOLD=0) no line is drawn. The line is drawn in the default style of pen 3. You may wish to define a particular line style for this pen, since the appearance of lines is device specific. The groupings at this point can be saved using the ROWGROUPS and COLGROUPS parameters. Parameters ROWPERMUTATIONS and COLPERMUTATIONS specify the order of the labels in the row and column dendrograms, starting from the horizontal axis.
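The two-pass use described above might look as follows. This is a sketch under stated assumptions: `table` is the table from the Example section, the identifiers `rowam`, `colam`, `rowgrp` and `colgrp` are illustrative names, the title text is made up, and the threshold value 500 is only a placeholder that should be replaced by a value read from the cumSS column of the summary. The line style chosen for pen 3 is likewise arbitrary.

```genstat
"First pass: print the summary and save the amalgamation matrices."
CINTERACTION  [PRINT=summary; VARIANCE=8.59; DF=266] table;\
              ROWAMALGAMATIONS=rowam; COLAMALGAMATIONS=colam

"Optional: choose a line style for pen 3, which draws the threshold line."
PEN           3; LINESTYLE=2

"Second pass: redraw the dendrograms from the saved matrices,
 mark the threshold (placeholder value) and save the groupings."
CINTERACTION  [PRINT=dendrogram; SSTHRESHOLD=500;\
              TITLE='Interaction clustering'] table;\
              ROWAMALGAMATIONS=rowam; COLAMALGAMATIONS=colam;\
              ROWGROUPS=rowgrp; COLGROUPS=colgrp
```

Because the amalgamation matrices are supplied in the second call, the dendrogram layout can be adjusted repeatedly while reusing the amalgamations saved in the first pass.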

Options: PRINT, PRMONITOR, VARIANCE, DF, SSTHRESHOLD, TITLE, PENSIZE.

Parameters: TABLE, ROWAMALGAMATIONS, COLAMALGAMATIONS, ROWPERMUTATIONS, COLPERMUTATIONS, ROWGROUPS, COLGROUPS, SORTEDTABLE.

Method

In each step of the merging process the pair of rows or columns which contributes least to the total sum of squares for interaction is traced by repeated use of (weighted) ANOVA. When the number of data in each cell, which may have increased by previous merges, is properly accounted for, each of the successive sums of squares is the squared projection of the (tabulated) data vector on a subspace of interactions, orthogonal to all preceding such subspaces. In this way cumSS increases with hopefully minimal speed, but this is not guaranteed because of the sequential character of the procedure.

The P-values presented in the summary table in Corsten & Denis (1990) are incorrect, in that they apparently have been obtained by using D instead of n degrees of freedom in the denominator of the F-statistic concerned.

Action with `RESTRICT`

Restrictions are not allowed.

Reference

Corsten, L.C.A. and Denis, J.B. (1990). Structuring interaction in two-way tables by clustering. Biometrics, 46, 207-215.

Example

CAPTION  'CINTERACTION example',\
         !t('Data from Corsten and Denis (1990, Biometrics, 46, 207-215);',\
         'also see Corsten (1996, Biometrical Letters, 33, 33-43) for',\
         'a definition of the correct column numbering.'); STYLE=meta,plain
TEXT     [VALUES=row1, row2, row3, row4, row5, row6, row7, row8, row9,\
         row10, row11, row12, row13, row14, row15, row16, row17, row18,\
         row19, row20] trow
TEXT     [VALUES=column1, column2, column3, column4, column5, column6,\
         column7] tcol
FACTOR   [LABELS=trow ; VALUES=7(1...20)] row
FACTOR   [LABELS=tcol ; VALUES=(1...7)20] col
TABLE    [CLASSIFICATION=row, col] table
READ     table
  59.8  61.0  75.6  58.8  64.4  62.7  53.4
  64.5  70.3  81.5  60.4  73.2  78.3  70.8
  59.5  68.2  79.9  60.0  72.3  76.9  71.5
  65.1  71.9  82.1  58.5  71.9  83.2  71.5
  64.2  68.2  81.2  60.1  74.2  85.4  78.4
  56.4  65.2  79.6  58.0  68.0  80.6  73.0
  63.5  65.6  74.3  60.7  71.6  73.2  69.4
  58.3  65.7  72.7  60.0  74.6  73.7  66.1
  61.9  66.1  83.4  62.9  74.5  75.6  74.4
  58.9  64.8  80.9  67.7  71.5  72.0  70.0
  57.2  64.1  81.2  56.2  75.4  72.4  65.7
  58.0  66.1  85.5  62.2  74.5  82.0  70.0
  62.0  71.8  69.1  62.8  71.4  77.0  75.6
  51.6  62.5  53.7  55.7  59.6  71.8  67.3
  62.9  64.8  67.2  61.2  64.5  77.9  65.5
  60.2  63.2  73.1  60.1  74.9  86.0  71.7
  55.4  63.3  66.9  58.3  73.8  76.9  65.0
  53.7  68.1  71.5  64.1  76.7  90.2  72.5
  54.5  67.3  76.7  60.2  82.2  81.0  73.1
  56.1  59.4  76.3  62.5  70.4  85.9  65.4 :
CINTERACTION  [PRMONITOR=0.9; VARIANCE=8.59; DF=266] table

Updated on March 8, 2019
