Forms robust estimates of sum-of-squares-and-products matrices (P.G.N. Digby).
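|Controls printed output (sspm, distances, weights, vcovariance, means, correlations, outliers); by default nothing is printed|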
||The value from which the threshold distance is derived (see the Method Section); default 2|
||The value indicating the decline in weight as the distance of a unit above the threshold increases (see the Method Section); default 1.25|
||Maximum number of iterations; default 100|
||The minimum change in the average squared-weight that has to be achieved for the iterative process to converge; default 1.0E-8|
||Supplies the set of variates in each datamatrix|
||SSPM structure to contain the robust estimates of the sums of squares and products, the robust estimates of the means, and the sum of the weights for each datamatrix|
||To contain the Mahalanobis distances of the units from the mean|
||To contain the weights used for each unit when forming the robust estimates|
||To contain the robust estimates of the matrices of variances and covariances|
||This contains on output the correlations from the robust estimates of the variances and covariances|
ROBSSPM forms robust estimates of SSPMs, and the related variance-covariance and correlation matrices, using the method of Campbell (1980). This weights the units differentially so that those that are extreme, in a multivariate sense, contribute less to the calculated means and sums of squares and products. The extremeness of a unit is judged by its Mahalanobis distance from the estimated mean.
The input variates are specified, in a pointer, by the DATA parameter. They may be restricted or may contain some missing values, in which case the units concerned will be ignored.
Output is controlled by the PRINT option, with the following settings:
sspm prints the estimated sums-of-squares-and-products, the estimated means, and the sum of the weights;
distances prints the Mahalanobis distances for all the units, including any excluded by restrictions;
weights prints the weights for all the units;
vcovariance prints the estimated variance-covariance matrix;
means prints the estimated means;
correlations prints correlations derived from the variance-covariance matrix;
outliers prints unit numbers, weights, and distances for outliers.
By default there is no printed output.
If the outliers, weights or distances are to be printed then an appropriate summary of the number of units, number of outliers and so on will be printed too. The outlier information consists of the unit numbers, weights and Mahalanobis distances, printed across the page.
The weight given to each unit in forming the robust estimates is one if the unit’s Mahalanobis distance from the mean does not exceed some threshold distance, and it decreases as the Mahalanobis distance increases above that threshold. The threshold and the form of the decrease in weight are controlled by options B1 and B2, which match the corresponding quantities in the functions used by Campbell (1980), as explained in the Methods Section. By default, B1 = 2 and B2 = 1.25.
The estimation process is iterative, with the maximum number of iterations controlled by the MAXCYCLE option (default 100). It converges when the average change in the weights is less than some tolerance. The default tolerance is 1.0E-8, but this can be redefined by the TOLERANCE option. Lack of convergence usually indicates some problem with the data, or perhaps that the threshold has been set too low.
The output parameters, up to and including CORRELATIONS, allow the various components of the output to be saved.
Initial (unweighted) estimates of the means and sums of squares and products are formed from all the units, subject to any restriction on the data and excluding any units with missing values for any of the variates. From these estimates, Mahalanobis distances of the units from their means are calculated, and used to determine the weights for the units. The weights are then used to re-form the SSPM structure, new distances are calculated, and so on. Convergence occurs when the average change in the derived weights is less than the defined tolerance.
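The iterative scheme can be sketched in plain Python as follows. This is only an illustration and not the procedure's own code. The function names are hypothetical. The weighted estimates (mean as Σwx/Σw, covariance as Σw²(x−m)(x−m)′/(Σw²−1)) are assumed here to follow Campbell (1980), and the convergence test is assumed to be the mean absolute change in the weights. Missing values and restrictions are not handled.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def weight(d, t, b2):
    """Weight 1 up to the threshold t, decaying above it."""
    if d <= t:
        return 1.0
    return (t / d) * math.exp(-0.5 * (d - t) ** 2 / b2 ** 2)

def robust_estimates(rows, b1=2.0, b2=1.25, maxcycle=100, tol=1e-8):
    """Return (mean, cov, distances, weights) for a list of complete units."""
    n, v = len(rows), len(rows[0])
    t = math.sqrt(v) + b1 / math.sqrt(2.0)   # threshold distance
    w = [1.0] * n                            # first pass is unweighted
    for _ in range(maxcycle):
        sw = sum(w)
        sw2 = sum(wi * wi for wi in w)
        # Assumed weighted mean: sum(w x) / sum(w)
        mean = [sum(wi * r[j] for wi, r in zip(w, rows)) / sw for j in range(v)]
        dev = [[r[j] - mean[j] for j in range(v)] for r in rows]
        # Assumed weighted covariance: sum(w^2 dev dev') / (sum(w^2) - 1)
        cov = [[sum(wi * wi * d[j] * d[k] for wi, d in zip(w, dev)) / (sw2 - 1.0)
                for k in range(v)] for j in range(v)]
        # Mahalanobis distance of every unit from the current mean
        dist = [math.sqrt(sum(a * b for a, b in zip(d, solve(cov, d))))
                for d in dev]
        new_w = [weight(d, t, b2) for d in dist]
        # Assumed convergence measure: mean absolute change in the weights
        change = sum(abs(a - b) for a, b in zip(new_w, w)) / n
        w = new_w
        if change < tol:
            break
    return mean, cov, dist, w
```

With all weights equal to one, the first pass reduces to the usual means and variance-covariance matrix, which matches the unweighted starting point described above. A singular covariance matrix would cause a division by zero in this sketch.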
The weight w of each unit is given by
$$
w = \begin{cases}
1, & d \le t \\
(t/d)\,\exp\!\left(-0.5\,(d - t)^2 / \mathrm{B2}^2\right), & d > t
\end{cases}
$$
where t, the threshold distance, is given by
$$
t = \sqrt{v} + \mathrm{B1} / \sqrt{2}
$$
and v is the number of means.
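As a small numerical illustration of the two formulas above, the following Python sketch evaluates the threshold and the weight for a few distances. The function names are hypothetical, and passing `b2=None` is simply this sketch's stand-in for setting B2 to a missing value (infinite B2).

```python
import math

def threshold(v, b1=2.0):
    """Threshold distance t = sqrt(v) + B1 / sqrt(2) for v variates."""
    return math.sqrt(v) + b1 / math.sqrt(2.0)

def weight(d, t, b2=1.25):
    """Weight for a unit at Mahalanobis distance d, given threshold t.

    b2=None stands in for an infinite B2, where the weight above the
    threshold is simply t/d.
    """
    if d <= t:
        return 1.0
    if b2 is None:
        return t / d
    return (t / d) * math.exp(-0.5 * (d - t) ** 2 / b2 ** 2)

t = threshold(4)  # four variates, default B1 = 2
for d in (2.0, 4.0, 6.0):
    # distance, weight with default B2, weight with infinite B2
    print(d, round(weight(d, t), 4), round(weight(d, t, b2=None), 4))
```

An infinite B1 gives an infinite threshold in this sketch, so every unit receives weight one and the usual non-robust estimates result.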
As explained by Campbell (1980), under Fisher’s square root approximation, B1 equates to a percentage point of the standard Gaussian distribution.
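To give a feel for what this means in practice, the sketch below converts a value of B1 into the corresponding upper-tail probability of the standard Gaussian distribution. Reading this as the approximate proportion of well-behaved units expected to fall beyond the threshold is an interpretation that assumes multivariate Normal data. The function name is hypothetical.

```python
from statistics import NormalDist

def upper_tail(b1):
    """Upper-tail probability of the standard Gaussian beyond b1."""
    return 1.0 - NormalDist().cdf(b1)

for b1 in (1.0, 2.0, 3.0):
    print(f"B1 = {b1}: upper tail of about {100 * upper_tail(b1):.2f}%")
```

On this reading, the default B1 = 2 corresponds to roughly the upper 2.3% point.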
Campbell (1980) regards three possibilities as potentially most useful. If B1 is infinite, the usual (non-robust) estimates are obtained. With B2 infinite, the weight decreases inversely with distance (w = t/d); this can be obtained in the procedure by setting B2 to a missing value. Finally, there is the combination used as a default by ROBSSPM (B1 = 2 and B2 = 1.25).
If the DATA variates are restricted, only the units not excluded by the restriction will be used in the estimation process. However, Mahalanobis distances will be formed for all units other than those where any of the variates is missing.
Campbell, N.A. (1980). Robust procedures in multivariate analysis I: robust covariance estimation. Applied Statistics, 29, 231-237.
CAPTION 'ROBSSPM example'; STYLE=meta
POINTER [NVALUES=4] X
VARIATE [NVALUES=18] X
READ X[1...4]
1 81 9 98 78 102 116 78 89 65 125 101
93 100 30 244 127 90 117 104 87 74 75 77
64 41 26 5 75 56 92 72 133 70 130 71
85 35 108 57 44 97 61 145 35 153 52 141
96 49 111 34 131 108 132 115 114 28 132 52
95 89 78 121 118 90 114 88 123 10 197 25 :
ROBSSPM [PRINT=sspm,outliers] X