ROBSSPM procedure

Forms robust estimates of sum-of-squares-and-products matrices (P.G.N. Digby).

Options

`PRINT` = string tokens	Controls printed output (`sspm`, `distances`, `weights`, `vcovariance`, `means`, `correlations`, `outliers`); default `*` i.e. no output
`B1` = scalar	The value from which the threshold distance is derived (see the Method Section); default 2
`B2` = scalar	The value controlling how quickly the weight declines as the distance of a unit increases above the threshold (see the Method Section); default 1.25
`MAXCYCLE` = scalar	Maximum number of iterations; default 100
`TOLERANCE` = scalar	The minimum change in the average squared-weight that has to be achieved for the iterative process to converge; default 10^-8

Parameters

`DATA` = pointers	Supplies the set of variates in each datamatrix
`SSPM` = SSPMs	SSPM structure to contain the robust estimates of the sums of squares and products, the robust estimates of the means, and the sum of the weights for each datamatrix
`DISTANCES` = variates	To contain the Mahalanobis distances of the units from the mean
`WEIGHTS` = variates	To contain the weights used for each unit when forming the robust estimates
`VCOVARIANCE` = symmetric matrices	To contain the robust estimates of the matrices of variances and covariances
`CORRELATIONS` = symmetric matrices	To contain the correlations derived from the robust estimates of the variances and covariances

Description

ROBSSPM forms robust estimates of SSPMs, and the related variance-covariance and correlation matrices, using the method of Campbell (1980). This weights the units differentially so that those that are extreme, in a multivariate sense, contribute less to the calculated means and sums of squares and products. The extremeness of a unit is judged by its Mahalanobis distance from the estimated mean.

The input variates are specified, in a pointer, by the DATA parameter. They may be restricted or may contain some missing values, in which case the units concerned will be ignored.

Output is controlled by the PRINT option, with the following settings:

sspm	prints the estimated sums of squares and products, the estimated means, and the sum of the weights;
distances	prints the Mahalanobis distances for all the units, including any excluded by restrictions;
weights	prints the weights for all the units;
vcovariance	prints the estimated variance-covariance matrix;
means	prints the estimated means;
correlations	prints correlations derived from the variance-covariance matrix;
outliers	prints unit numbers, weights, and distances for outliers.

By default there is no printed output.

If the outliers, weights or distances are to be printed then an appropriate summary of the number of units, number of outliers and so on will be printed too. The outlier information consists of the unit numbers, weights and Mahalanobis distances, printed across the page.

The weight given to each unit in forming the robust estimates is one if the unit's Mahalanobis distance from the mean does not exceed a threshold distance, and it decreases as the Mahalanobis distance increases above that threshold. The threshold and the form of the decrease in weight are controlled by options B1 and B2, which correspond to the quantities of the same names in the functions used by Campbell (1980), as explained in the Method Section. By default, B1=2 and B2=1.25.

The estimation process is iterative, with the maximum number of iterations controlled by the MAXCYCLE option (default 100). It converges when the average change in the weights is less than some tolerance. The default tolerance is 10^-8, but this can be changed using the TOLERANCE option. Lack of convergence usually indicates some problem with the data, or perhaps that the threshold has been set too low.

Parameters SSPM, DISTANCES, WEIGHTS, VCOVARIANCE and CORRELATIONS allow the various components of the output to be saved.

Options: PRINT, B1, B2, MAXCYCLE, TOLERANCE.

Parameters: DATA, SSPM, DISTANCES, WEIGHTS, VCOVARIANCE, CORRELATIONS.

Method

Initial (unweighted) estimates of the means and sums of squares and products are formed from all the units, subject to any restriction on the data and excluding any units with missing values for any of the variates. From these estimates, Mahalanobis distances of the units from their means are calculated, and used to determine the weights for the units. The weights are then used to re-form the SSPM structure, new distances are calculated, and so on. Convergence occurs when the average change in the derived weights is less than the defined tolerance.
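The iteration described above can be sketched outside Genstat as follows. This is a minimal illustration in Python, not the code of the procedure: the function names `solve` and `robust_estimates` and the small dataset are hypothetical, the weighted mean and the divisor (sum of squared weights minus one) are assumed to follow Campbell (1980), and the convergence test is assumed to be the average absolute change in the weights.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def robust_estimates(rows, b1=2.0, b2=1.25, maxcycle=100, tol=1e-8):
    """Iteratively reweighted means and covariances (assumed form, after Campbell 1980)."""
    n, v = len(rows), len(rows[0])
    t = math.sqrt(v) + b1 / math.sqrt(2.0)   # threshold distance
    w = [1.0] * n                            # first cycle is unweighted
    for _ in range(maxcycle):
        sw = sum(w)
        sw2 = sum(wi * wi for wi in w)
        mean = [sum(wi * r[j] for wi, r in zip(w, rows)) / sw for j in range(v)]
        dev = [[r[j] - mean[j] for j in range(v)] for r in rows]
        # Assumed divisor: sum of squared weights minus one.
        cov = [[sum(wi * wi * e[j] * e[k] for wi, e in zip(w, dev)) / (sw2 - 1.0)
                for k in range(v)] for j in range(v)]
        # Mahalanobis distance of each unit from the current mean.
        dist = [math.sqrt(max(0.0, sum(ej * sj for ej, sj in zip(e, solve(cov, e)))))
                for e in dev]
        new_w = [1.0 if d <= t else (t / d) * math.exp(-0.5 * (d - t) ** 2 / b2 ** 2)
                 for d in dist]
        change = sum(abs(x - y) for x, y in zip(new_w, w)) / n
        w = new_w
        if change < tol:
            break
    return mean, cov, dist, w

# Made-up data: twelve units in a tight cluster plus one gross outlier (last row).
data = [[1, 2], [2, 1], [2, 3], [3, 2], [1, 1], [3, 3], [2, 2],
        [1, 3], [3, 1], [2, 2.5], [2.5, 2], [1.5, 1.5], [40, -30]]
mean, cov, dist, w = robust_estimates(data)
print("means:", [round(x, 3) for x in mean])
print("outlier weight:", w[-1])
```

In this made-up dataset the last unit is downweighted further at each cycle, so the final means are governed by the twelve clustered units.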

The weight w of each unit, with Mahalanobis distance d from the mean, is given by

w = 1   if d ≤ t
w = (t/d) × exp( −0.5 × (d − t)² / B2² )   if d > t

where t, the threshold distance, is given by

t = √v + B1/√2

and v is the number of means (that is, the number of variates in the datamatrix).
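As an illustration, the threshold and weight formulae above can be written as two small Python functions. The names `threshold` and `weight` are hypothetical, and passing `b2=None` is simply this sketch's way of representing an infinite B2, for which the exponential factor equals one.

```python
import math

def threshold(n_variates, b1=2.0):
    """Threshold distance t = sqrt(v) + B1 / sqrt(2)."""
    return math.sqrt(n_variates) + b1 / math.sqrt(2.0)

def weight(d, t, b2=1.25):
    """Weight of a unit whose Mahalanobis distance from the mean is d."""
    if d <= t:
        return 1.0
    if b2 is None:
        # Infinite B2 (convention of this sketch): weight declines as t/d.
        return t / d
    return (t / d) * math.exp(-0.5 * (d - t) ** 2 / b2 ** 2)

t = threshold(4)  # four variates, default B1=2
for d in (2.0, 3.5, 5.0):
    print(d, weight(d, t), weight(d, t, b2=None))
```

With four variates and the default B1, t is 2 + √2, so a unit at distance 2.0 keeps full weight while units at 3.5 and 5.0 are downweighted, more sharply with B2=1.25 than with an infinite B2.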

As explained by Campbell (1980), under Fisher’s square root approximation, B1 equates to a percentage point of the standard Gaussian distribution.
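Reading B1 as a percentage point of the standard Gaussian distribution gives a rough feel for the default setting. The short Python check below computes the upper-tail probability beyond B1=2. Treating this as the approximate proportion of units from a Gaussian population that would lie beyond the threshold is an interpretation made for this sketch, and it relies on the approximation mentioned above.

```python
from statistics import NormalDist

b1 = 2.0
# Upper-tail probability of the standard Gaussian distribution beyond B1.
upper_tail = 1.0 - NormalDist().cdf(b1)
print(f"{upper_tail:.4f}")  # prints 0.0228
```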

Campbell (1980) regards three possibilities as potentially most useful. If B1 is infinite, the usual (non-robust) estimates are obtained. With B1=2 and B2 infinite, the weight decreases inversely with distance (w=t/d); this can be obtained in the procedure by setting B2 to a missing value. Finally, there is the combination used as a default by ROBSSPM, namely B1=2 and B2=1.25.

Action with `RESTRICT`

If the DATA variates are restricted, only the units not excluded by the restriction will be used in the estimation process. However, Mahalanobis distances will be formed for all units other than those where any of the variates is missing.

Reference

Campbell, N.A. (1980). Robust procedures in multivariate analysis I: robust covariance estimation. Applied Statistics, 29, 231-237.

Example

CAPTION 'ROBSSPM example'; STYLE=meta
POINTER [NVALUES=4] X
VARIATE [NVALUES=18] X[]
READ    X[1...4]
   1  81   9  98    78 102 116  78    89  65 125 101    93 100  30 244
 127  90 117 104    87  74  75  77    64  41  26   5    75  56  92  72
 133  70 130  71    85  35 108  57    44  97  61 145    35 153  52 141
  96  49 111  34   131 108 132 115   114  28 132  52    95  89  78 121
 118  90 114  88   123  10 197  25 :
ROBSSPM [PRINT=sspm,outliers] X

Updated on March 11, 2022
