MEDIANTETRAD procedure

Gives robust identification of multiple outliers in 2-way tables (J.K.M. Brown).

Options

PRINT = string tokens Printed output required (graph, table); default grap, tabl
GRAPHICS = string tokens Type of graph required (highresolution, lineprinter); default high
SORT = string tokens Sorting of printed output, in order of absolute value of median tetrad (ascending, descending, none); default none

Parameters

TABLE = tables Specifies the two-way table of data
ROWS = factors Saves the factor classifying the table rows
COLUMNS = factors Saves the factor classifying the table columns
DATA = variates Saves the data values in the body of the table
MEDIANTETRADS = variates Saves median tetrads for each cell in the table
RANKS = variates Saves ranks of absolute values of median tetrads
HALFNORMALSCORES = variates Saves half-Normal scores of absolute values of median tetrads
TESTOUTLIERS = scalars Specifies the number of cells, with the highest absolute median tetrads, to be set to their predicted values before re-running the analysis

Description

In a table of data cross-classified by two factors, some cells may be outliers, in that they contain values substantially higher or lower than those expected from the means of the relevant rows and columns. Median tetrad analysis is a robust, single-step method of identifying several outliers in a two-way table (Bradu & Hawkins 1982).

A tetrad is calculated from four cells which form a square in the body of the table. For instance, if the cell in row i and column j has a value cij, the tetrad involving that cell and the cell in row p and column q is defined as

t_{ij;pq} = c_{ij} - c_{iq} - c_{pj} + c_{pq}

In a clean tetrad, none of the values ciq, cpj or cpq is itself an outlier, so the tetrad is an estimate of the amount by which cij deviates from its expected value. In a contaminated tetrad, one or more of ciq, cpj or cpq are outliers, so the tetrad is not a reliable estimate of the deviation of cij from its expectation.
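
The difference between a clean and a contaminated tetrad can be illustrated with a short Python sketch. This is not part of Genstat and not the procedure's own code; the function name tetrad, the 0-based indexing and the small additive table are assumptions made purely for illustration.

```python
def tetrad(c, i, j, p, q):
    """Tetrad for cell (i, j) formed with row p and column q (0-based indices)."""
    if i == p or j == q:
        raise ValueError("a proper tetrad needs p != i and q != j")
    return c[i][j] - c[i][q] - c[p][j] + c[p][q]

# Hypothetical, exactly additive table: value = row effect + column effect
row_effects = [0.0, 2.0, 5.0]
col_effects = [1.0, 4.0, 6.0]
table = [[r + k for k in col_effects] for r in row_effects]

no_outlier = tetrad(table, 0, 0, 1, 2)    # additive data, so the tetrad is zero

table[0][0] += 10.0                       # make cell (0, 0) an outlier by +10
clean = tetrad(table, 0, 0, 1, 2)         # clean tetrad: recovers the deviation

table[1][2] += 3.0                        # contaminate one of the other three cells
contaminated = tetrad(table, 0, 0, 1, 2)  # no longer equals the true deviation
```

In this toy table the clean tetrad returns the deviation of 10 exactly, while the contaminated one is shifted by the second outlier.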

MEDIANTETRAD calculates the median of all the tetrads involving each cell of the table (such that i ≠ p and j ≠ q, so the four cells in the tetrad form a square). These median tetrads are robust estimates of the deviations for each cell and therefore indicate which cells may contain outliers. The method is robust because the median will be a clean tetrad (and therefore a reliable estimate of the deviation) so long as fewer than half the tetrads involving that cell are contaminated. Furthermore, the robustness of the method allows several outliers to be detected reliably in a single step; other methods of detecting outliers may detect only a single outlier, or may require several steps, one for each outlier.
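
As a rough illustration of the calculation described above, the following Python sketch computes a median tetrad for every cell of a small table. It is not the Genstat implementation: the function name median_tetrads, the use of None for a missing value and the rule that tetrads touching a missing cell are skipped are all assumptions, and the 5 by 5 table with two planted outliers is invented.

```python
from statistics import median

def median_tetrads(c):
    """Median of all proper tetrads for each cell; None marks a missing value."""
    n_rows, n_cols = len(c), len(c[0])
    result = []
    for i in range(n_rows):
        out_row = []
        for j in range(n_cols):
            if c[i][j] is None:
                out_row.append(None)      # missing cell gives a missing median tetrad
                continue
            tetrads = []
            for p in range(n_rows):
                for q in range(n_cols):
                    if p == i or q == j:
                        continue          # the four cells must form a square
                    others = (c[i][q], c[p][j], c[p][q])
                    if None in others:
                        continue          # assumption: skip tetrads with missing cells
                    tetrads.append(c[i][j] - c[i][q] - c[p][j] + c[p][q])
            out_row.append(median(tetrads) if tetrads else None)
        result.append(out_row)
    return result

# Hypothetical additive table with two planted outliers
row_effects = [0.0, 1.0, 2.0, 3.0, 4.0]
col_effects = [0.0, 10.0, 20.0, 30.0, 40.0]
table = [[r + k for k in col_effects] for r in row_effects]
table[0][0] += 7.0
table[3][2] -= 5.0

mt = median_tetrads(table)
```

Both planted deviations are recovered in a single pass (7 for the first cell, -5 for the second), and every other cell has a median tetrad of zero, because fewer than half of its tetrads are contaminated.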

The options of MEDIANTETRAD control the output. PRINT has two settings. The graph setting produces a plot of half-Normal scores of the median tetrads against the absolute values of the median tetrads. In the half-Normal plot, inliers (values for cells which are not outliers, with low deviations) fall on a straight line passing through the origin, while outliers (with high deviations) fall at the upper end of this line and below the level of the line. A regression line, passing through the origin, of half-Normal scores against absolute values of median tetrads, is also plotted. The setting table prints the factors which classify the table, the data in the body of the table, the median tetrads, the ranks of the absolute values of the median tetrads and the half-Normal scores. The GRAPHICS option controls graphical output, as a high-resolution plot (the default setting) or as a line-printer plot. The SORT option controls whether the output provided by setting PRINT=table is sorted in ascending order (most extreme median tetrad last), descending order, or not at all.
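
The quantities shown in the half-Normal plot can be sketched in Python as follows. The exact scoring formula used by Genstat is the one documented for APLOT and is not reproduced here; the plotting position below (a Blom-type approximation), the ordinal treatment of ties and the function names are assumptions, and the input values are invented.

```python
from statistics import NormalDist

def half_normal_scores(values):
    """Absolute values, their ranks (1 = smallest) and approximate half-Normal scores."""
    abs_vals = [abs(v) for v in values]
    n = len(abs_vals)
    order = sorted(range(n), key=lambda k: abs_vals[k])
    ranks = [0] * n
    for r, k in enumerate(order, start=1):
        ranks[k] = r                      # ties are broken by position (assumption)
    nd = NormalDist()
    # Assumed plotting position; the formula used by APLOT may differ
    scores = [nd.inv_cdf(0.5 + 0.5 * (r - 0.375) / (n + 0.25)) for r in ranks]
    return abs_vals, ranks, scores

def slope_through_origin(x, y):
    """Least-squares slope of y on x for a line forced through the origin."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Hypothetical median tetrads: four inliers and two suspected outliers
median_tetrads = [0.3, -7.0, 0.1, -0.2, 5.0, 0.4]
abs_vals, ranks, scores = half_normal_scores(median_tetrads)
slope = slope_through_origin(abs_vals, scores)
```

Plotting scores against abs_vals, together with the line of the fitted slope through the origin, gives a picture of the kind described above; cells whose points fall well below the line at its upper end are the candidates for outliers.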

The TABLE parameter specifies a table, classified by two factors, in which outliers are to be identified. The table may contain missing values, in which case the corresponding median tetrad is returned as a missing value. The TABLE parameter must be set, while the other parameters are optional. The next six parameters save output. ROWS and COLUMNS save the factors which classify the table, DATA saves the numerical body of the table, and MEDIANTETRADS, RANKS and HALFNORMALSCORES save the median tetrads, their ranks and half-Normal scores respectively.

When a table has few rows (or, equivalently, few columns), a large outlier in the cell in row i and column j may cause other cells in column j to appear to be moderately outlying. This is bound to be a problem if the table has only two or three rows, in which case 100% or at least 50%, respectively, of tetrads involving cells in column j will be contaminated, so the median tetrads of those cells will be contaminated. The presence of missing values may also cause this problem to occur in larger tables, by reducing the proportion of clean tetrads. The parameter TESTOUTLIERS can be used to examine the influence of suspected outliers on the deviations of other cells. When TESTOUTLIERS is set to a positive integer (m), the analysis is run twice. In the first run, the data used is that supplied in TABLE. In the second run, the cells with the highest m absolute median tetrads are set to values estimated from the remainder of the data (i.e. those not suspected to be outliers). If these m values are indeed the only notable outliers, all the data will now be inliers, so the half-Normal plot of the median tetrads will be a close fit to a straight line passing through the origin. Note that, if TESTOUTLIERS is set, the output saved in the variates set by the DATA, MEDIANTETRADS, RANKS and HALFNORMALSCORES parameters will be from the second analysis, that of the modified table. If the option GRAPHICS=highresolution is set in combination with a non-zero value of TESTOUTLIERS, you may need to set the option “Multiple Windows” in the Windows version of Genstat Graphics in order to see the two graphs, before and after adjustment of the suspected outliers.
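
The 100% and 50% figures quoted above can be checked by counting tetrads directly. The Python sketch below is only an illustration, with a hypothetical function name and 0-based indices; it places a single outlier in one cell and counts the proportion of tetrads, for another cell in the same column, that include the outlier.

```python
def contaminated_fraction(n_rows, n_cols, outlier, cell):
    """Proportion of proper tetrads for `cell` that involve the `outlier` cell."""
    i, j = cell
    total = 0
    bad = 0
    for p in range(n_rows):
        if p == i:
            continue
        for q in range(n_cols):
            if q == j:
                continue
            total += 1
            # The other three corners of the square are (i, q), (p, j) and (p, q)
            if outlier in ((i, q), (p, j), (p, q)):
                bad += 1
    return bad / total

# Outlier in row 0, column 0; examine the cell in row 1 of the same column
two_rows = contaminated_fraction(2, 7, (0, 0), (1, 0))
three_rows = contaminated_fraction(3, 7, (0, 0), (1, 0))
seven_rows = contaminated_fraction(7, 7, (0, 0), (1, 0))
```

With a single outlier the proportion is 1/(r - 1) for a table with r rows, so it falls from 100% with two rows to 50% with three and to about 17% with seven. Further outliers or missing values in the table can only raise it.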

Options: PRINT, GRAPHICS, SORT.

Parameters: TABLE, ROWS, COLUMNS, DATA, MEDIANTETRADS, RANKS, HALFNORMALSCORES, TESTOUTLIERS.

Method

For each cell, all proper tetrads are calculated and their median is taken. The median tetrad for a cell with a missing value is set to a missing value. The absolute values of the median tetrads are then ranked and their half-Normal scores calculated, as described in the Procedure Library Manual for APLOT. If TESTOUTLIERS is set to an integer m>0, the m cells with the highest absolute median tetrads are set to missing values, an analysis of variance (anova) is carried out with treatment structure ROWS + COLUMNS (i.e. no interaction term is fitted), and the m cells with suspected outliers are then given the fitted values saved from that anova.
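
The adjustment made when TESTOUTLIERS is set can be sketched in Python as below. This is an approximation of the idea and not the Genstat code: the anova is replaced by a simple iterative fit of an additive rows + columns model in which the suspected cells are treated as missing, and the function names, tolerance and example table are all assumptions.

```python
def fit_additive_with_missing(c, missing, tol=1e-10, max_iter=1000):
    """Replace the cells in `missing` by fitted values from a rows + columns model."""
    n_rows, n_cols = len(c), len(c[0])
    work = [row[:] for row in c]
    observed = [work[i][j] for i in range(n_rows) for j in range(n_cols)
                if (i, j) not in missing]
    start = sum(observed) / len(observed)
    for (i, j) in missing:
        work[i][j] = start                # initial guess: mean of the observed cells
    for _ in range(max_iter):
        row_means = [sum(r) / n_cols for r in work]
        col_means = [sum(work[i][j] for i in range(n_rows)) / n_rows
                     for j in range(n_cols)]
        grand = sum(row_means) / n_rows
        change = 0.0
        for (i, j) in missing:
            fitted = row_means[i] + col_means[j] - grand   # no interaction term
            change = max(change, abs(fitted - work[i][j]))
            work[i][j] = fitted
        if change < tol:
            break
    return work

def replace_top_outliers(c, median_tetrads, m):
    """Set the m cells with the largest absolute median tetrads to fitted values."""
    cells = [(i, j) for i in range(len(c)) for j in range(len(c[0]))]
    cells.sort(key=lambda ij: abs(median_tetrads[ij[0]][ij[1]]), reverse=True)
    return fit_additive_with_missing(c, set(cells[:m]))

# Hypothetical additive table with two planted outliers and matching median tetrads
row_effects = [0.0, 1.0, 2.0, 3.0, 4.0]
col_effects = [0.0, 10.0, 20.0, 30.0, 40.0]
table = [[r + k for k in col_effects] for r in row_effects]
table[0][0] += 7.0
table[3][2] -= 5.0
mt = [[0.0] * 5 for _ in range(5)]
mt[0][0], mt[3][2] = 7.0, -5.0

adjusted = replace_top_outliers(table, mt, m=2)
```

In this example the two suspected cells are returned to the values implied by the rest of the table (0 and 23), and all other cells are left unchanged. Re-running the median tetrad analysis on the adjusted table corresponds to the second run described above.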

References

Bradu, D. & Hawkins, D.M. (1982). Location of multiple outliers in two-way tables, using tetrads. Technometrics, 24, 103-108.

See also

Directive: TABULATE.

Procedures: DRESIDUALS, RCHECK.

Example

CAPTION  'MEDIANTETRAD example',\ 
         !t('Data from Bradu & Hawkins 1982, Table 1. Prevalence rates of',\ 
         'men of various occupations with hearing levels 16 dB or more',\ 
         'above the audiometric zero at various frequencies. (There are',\ 
         '3 suspected outliers.)'); STYLE=meta,plain
FACTOR   [NVALUES=49; LEVELS=7; LABELS=!t(Professionl,Farm,Clerical,\ 
         Craftsman,Operative,Service,Labourer)] Occupation
&        [LABEL=!t('500 Hz','1000 Hz','2000 Hz','3000 Hz',\ 
                   '4000 Hz','6000 Hz','Nrml speech')] Frequency
GENERATE Frequency,Occupation
TABLE    [CLASSIFICATION=Frequency,Occupation] HearTable; VALUES=!(\ 
         2.1, 6.8, 8.4, 1.4,14.6, 7.9, 4.8, 1.7, 8.1, 8.4, 1.4,12.0, 3.7,\
         4.5,14.4,14.8,27.0,30.9,36.5,36.4,31.4,57.4,62.4,37.4,63.3,65.5,\
         65.6,59.8,66.2,81.7,53.3,80.7,79.7,80.8,82.4,75.2,94.0,74.5,87.9,\
         93.3,87.8,80.5, 4.1,10.2,10.7, 5.5,18.1,11.4, 6.1)
MEDIANTETRAD [PRINT=graph,table; SORT=descending] HearTable; ROWS=Freq;\ 
         COLUMNS=Occup; DATA=Hearing; TEST=3
Updated on March 7, 2019