Forms robust estimates of sum-of-squares-and-products matrices (P.G.N. Digby).
Options
Option | Description
---|---
PRINT = string tokens | Controls printed output (sspm, distances, weights, vcovariance, means, correlations, outliers); default * i.e. no output
B1 = scalar | The value from which the threshold distance is derived (see the Method section); default 2
B2 = scalar | Controls how quickly the weight declines as the distance of a unit above the threshold increases (see the Method section); default 1.25
MAXCYCLE = scalar | Maximum number of iterations; default 100
TOLERANCE = scalar | Convergence tolerance: the iterative process is deemed to have converged when the change in the average squared weight falls below this value; default 10⁻⁸
Parameters
Parameter | Description
---|---
DATA = pointers | Supplies the set of variates in each data matrix
SSPM = SSPMs | SSPM structure to contain the robust estimates of the sums of squares and products, the robust estimates of the means, and the sum of the weights for each data matrix
DISTANCES = variates | To contain the Mahalanobis distances of the units from the mean
WEIGHTS = variates | To contain the weights used for each unit when forming the robust estimates
VCOVARIANCE = symmetric matrices | To contain the robust estimates of the matrices of variances and covariances
CORRELATIONS = symmetric matrices | To contain the correlations derived from the robust estimates of the variances and covariances
Description
ROBSSPM forms robust estimates of SSPMs, and the related variance-covariance and correlation matrices, using the method of Campbell (1980). This weights the units differentially so that those that are extreme, in a multivariate sense, contribute less to the calculated means and sums of squares and products. The extremeness of a unit is judged by its Mahalanobis distance from the estimated mean.
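To show why the Mahalanobis distance, rather than the ordinary Euclidean distance, is the natural measure of multivariate extremeness, here is a minimal plain-Python sketch restricted to two variates for brevity. It is not part of GenStat; the function name `mahalanobis_2d` is hypothetical. It evaluates d = √((x − m)′ V⁻¹ (x − m)) using the explicit inverse of a 2×2 variance-covariance matrix V.

```python
import math


def mahalanobis_2d(x, mean, vcov):
    """Mahalanobis distance of a unit x from mean, for two variates only.

    Uses the explicit inverse of the 2x2 variance-covariance matrix vcov.
    """
    (a, b), (c, d) = vcov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("vcov must be positive definite")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # (x - m)' V^{-1} (x - m), with V^{-1} = [[d, -b], [-c, a]] / det
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(q)


# Two units at the same Euclidean distance from the mean, with strongly
# positively correlated variates: the unit that goes against the correlation
# is far more extreme in the Mahalanobis sense.
v = [[1.0, 0.9], [0.9, 1.0]]
print(mahalanobis_2d([1.0, 1.0], [0.0, 0.0], v))   # about 1.03
print(mahalanobis_2d([1.0, -1.0], [0.0, 0.0], v))  # about 4.47
```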
The input variates are specified, in a pointer, by the DATA parameter. They may be restricted, or may contain missing values; units excluded by the restriction, or with a missing value for any of the variates, are ignored in the estimation.
Output is controlled by the PRINT option, with settings:
- sspm: prints the estimated sums of squares and products, the estimated means, and the sum of the weights;
- distances: prints the Mahalanobis distances for all the units, including any excluded by restrictions;
- weights: prints the weights for all the units;
- vcovariance: prints the estimated variance-covariance matrix;
- means: prints the estimated means;
- correlations: prints the correlations derived from the variance-covariance matrix;
- outliers: prints the unit numbers, weights and distances of the outliers.

By default there is no printed output.
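The correlations setting derives correlations from the variance-covariance matrix, presumably in the usual way, r_ij = v_ij / √(v_ii v_jj). A minimal plain-Python sketch of that conversion (the function name `correlations` is hypothetical, not a GenStat function):

```python
import math


def correlations(vcov):
    """Convert a variance-covariance matrix to a correlation matrix."""
    k = len(vcov)
    sd = [math.sqrt(vcov[i][i]) for i in range(k)]   # standard deviations
    return [[vcov[i][j] / (sd[i] * sd[j]) for j in range(k)] for i in range(k)]


# Variances 4 and 9, covariance 2: the correlation is 2 / (2 * 3) = 1/3.
print(correlations([[4.0, 2.0], [2.0, 9.0]]))
```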
If outliers, weights or distances are printed, a summary (the number of units, the number of outliers and so on) is printed too. The outlier information consists of the unit numbers, weights and Mahalanobis distances, printed across the page.
The weight given to each unit in forming the robust estimates is one if the unit's Mahalanobis distance from the mean is less than a threshold distance, and it decreases as the distance increases above that threshold. The threshold and the rate of decrease are controlled by the options B1 and B2, which correspond to the equivalent quantities in the weight function used by Campbell (1980), as explained in the Method section. By default, B1=2 and B2=1.25.
The estimation process is iterative, with the maximum number of iterations set by the MAXCYCLE option (default 100). It converges when the average change in the weights is less than a tolerance, which is 10⁻⁸ by default and can be reset by the TOLERANCE option. Lack of convergence usually indicates a problem with the data; perhaps the threshold has been set too low.
Parameters SSPM, DISTANCES, WEIGHTS, VCOVARIANCE and CORRELATIONS allow the various components of the output to be saved.
Options: PRINT, B1, B2, MAXCYCLE, TOLERANCE.
Parameters: DATA, SSPM, DISTANCES, WEIGHTS, VCOVARIANCE, CORRELATIONS.
Method
Initial (unweighted) estimates of the means and sums of squares and products are formed from all the units, subject to any restriction on the data and excluding any units with missing values for any of the variates. From these estimates, the Mahalanobis distances of the units from the mean are calculated and used to determine the weights for the units. The weights are then used to re-form the SSPM structure, new distances are calculated, and so on. Convergence occurs when the average change in the weights is less than the specified tolerance.
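The iteration can be sketched in plain Python as follows. This is an illustrative reimplementation, not GenStat's code, and all function names are hypothetical. It assumes Campbell's (1980) formulation, in which the means are weighted by w and the sums of squares and products by w², with divisor Σw² − 1 for the variance-covariance matrix; whether ROBSSPM uses exactly these divisors, and exactly this convergence measure (here the mean absolute change in the weights), are assumptions.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[p][col] == 0.0:
            raise ValueError("variance-covariance matrix is singular")
        m[col], m[p] = m[p], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def weight(d, t, b2):
    """1 up to the threshold t; (t/d) * exp(-0.5 (d-t)^2 / b2^2) beyond it."""
    if d <= t:
        return 1.0
    if b2 is None:                     # stands for an infinite (missing) B2
        return t / d
    return (t / d) * math.exp(-0.5 * (d - t) ** 2 / b2 ** 2)


def robust_estimates(units, b1=2.0, b2=1.25, maxcycle=100, tolerance=1e-8):
    """Iteratively reweighted mean and variance-covariance matrix (sketch).

    units: list of complete observations, each a list of v values.
    Returns (mean, vcov, distances, weights, cycles).
    """
    n, v = len(units), len(units[0])
    t = math.sqrt(v) + b1 / math.sqrt(2.0)
    w = [1.0] * n                      # first cycle: unweighted estimates
    for cycle in range(1, maxcycle + 1):
        sw = sum(w)
        mean = [sum(wi * x[j] for wi, x in zip(w, units)) / sw for j in range(v)]
        # Sums of squares and products weighted by w**2 (assumed, after Campbell).
        ssp = [[sum(wi * wi * (x[i] - mean[i]) * (x[j] - mean[j])
                    for wi, x in zip(w, units)) for j in range(v)]
               for i in range(v)]
        denom = sum(wi * wi for wi in w) - 1.0
        vcov = [[s / denom for s in row] for row in ssp]
        dist = []
        for x in units:
            diff = [x[j] - mean[j] for j in range(v)]
            z = solve(vcov, diff)      # z = vcov^{-1} (x - mean)
            dist.append(math.sqrt(max(0.0, sum(p * q for p, q in zip(diff, z)))))
        new_w = [weight(d, t, b2) for d in dist]
        change = sum(abs(p - q) for p, q in zip(new_w, w)) / n
        w = new_w
        if change < tolerance:         # assumed measure: mean absolute change
            return mean, vcov, dist, w, cycle
    raise RuntimeError("no convergence in maxcycle iterations")
```

For instance, with the four units [[1, 2], [2, 1], [3, 5], [4, 3]] every distance lies below the threshold (with n = 4 units, no squared distance can exceed (n − 1)²/n = 2.25, while t = 2√2 for two variates), so all weights stay at 1 and the sketch returns the ordinary mean and covariance after one cycle.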
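The weight w of each unit is given by

$$
w = \begin{cases}
1, & d \le t \\
\dfrac{t}{d}\,\exp\!\left(-\tfrac{1}{2}\,(d-t)^2 / B_2^2\right), & d > t
\end{cases}
$$

where d is the Mahalanobis distance of the unit from the mean and t, the threshold distance, is given by

$$
t = \sqrt{v} + B_1/\sqrt{2}
$$

and v is the number of means (that is, the number of variates).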
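The two formulas translate directly into code. A minimal plain-Python sketch (the names `threshold` and `weight` are hypothetical); passing `b2=None` stands for an infinite B2, mirroring the procedure's use of a missing value:

```python
import math


def threshold(v, b1=2.0):
    """Threshold distance t = sqrt(v) + B1 / sqrt(2) for v variates."""
    return math.sqrt(v) + b1 / math.sqrt(2.0)


def weight(d, t, b2=1.25):
    """Weight of a unit at Mahalanobis distance d, given threshold t."""
    if d <= t:
        return 1.0
    if b2 is None:                       # B2 treated as infinite: w = t/d
        return t / d
    return (t / d) * math.exp(-0.5 * (d - t) ** 2 / b2 ** 2)


t = threshold(4)                         # four variates, as in the example
print(round(t, 4))                       # 3.4142
for d in (3.0, 4.0, 5.0, 6.0):
    print(d, round(weight(d, t), 4))     # 1 below t, then decreasing
```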
As explained by Campbell (1980), under Fisher's square-root approximation B1 equates to a percentage point of the standard Gaussian distribution.
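A short derivation makes the connection explicit. Assume, for the sketch, multivariate normal data, so that the squared distance d² is approximately χ² on v degrees of freedom. Fisher's approximation treats √(2χ²_v) as approximately normal with mean √(2v − 1) and unit variance, so

$$
d = \sqrt{\chi^2_v} \approx \frac{\sqrt{2v-1} + z}{\sqrt{2}} = \sqrt{v - \tfrac{1}{2}} + \frac{z}{\sqrt{2}} \approx \sqrt{v} + \frac{z}{\sqrt{2}}, \qquad z \sim N(0,1).
$$

Comparing this with t = √v + B1/√2 identifies B1 with a standard Gaussian deviate z. For example, the default B1 = 2 corresponds roughly to the upper 2.3% point, since Φ(2) ≈ 0.977.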
Campbell (1980) regards three possibilities as potentially the most useful. If B1 is infinite, the usual (non-robust) estimates are obtained. With B1=2 and B2 infinite, the weight above the threshold decreases inversely with distance (w = t/d); this can be obtained in the procedure by setting B2 to a missing value. Finally, there is the combination used as the default by ROBSSPM, namely B1=2 and B2=1.25.
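The three settings can be compared on a single distance with a small plain-Python sketch. The function `weight` is hypothetical; `b1=math.inf` models an infinite B1 and `b2=None` models a missing (infinite) B2:

```python
import math


def weight(d, v, b1=2.0, b2=1.25):
    """Weight for distance d with v variates; b2=None stands for an infinite B2."""
    t = math.sqrt(v) + b1 / math.sqrt(2.0)   # b1=math.inf makes t infinite
    if d <= t:
        return 1.0
    if b2 is None:
        return t / d
    return (t / d) * math.exp(-0.5 * (d - t) ** 2 / b2 ** 2)


v, d = 4, 6.0
print("B1 infinite:      ", weight(d, v, b1=math.inf))  # 1.0: usual estimates
print("B1=2, B2 infinite:", weight(d, v, b2=None))      # t/d
print("B1=2, B2=1.25:    ", weight(d, v))               # default, smaller still
```

With B1 infinite every unit lies below the threshold, so all weights are one; the default B2 adds the exponential factor, which downweights distant units more sharply than t/d alone.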
Action with RESTRICT
If the DATA variates are restricted, only the units not excluded by the restriction are used in the estimation process. However, Mahalanobis distances are formed for all units except those with a missing value for any of the variates.
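The two rules can be made concrete with a small plain-Python sketch. The function `split_units` and its arguments are hypothetical; it only identifies which units take part in the estimation and which receive a distance:

```python
def split_units(units, included):
    """Apply the rules under 'Action with RESTRICT' (illustrative sketch).

    units:    list of observations; None marks a missing value
    included: list of booleans, False where the restriction excludes a unit
    Returns (indices used in the estimation, indices given a distance).
    """
    complete = [i for i, u in enumerate(units)
                if all(value is not None for value in u)]
    estimation = [i for i in complete if included[i]]
    return estimation, complete


units = [(1.0, 2.0), (2.0, None), (3.0, 5.0), (4.0, 3.0)]
included = [True, True, False, True]
used, with_distance = split_units(units, included)
print(used)           # [0, 3]
print(with_distance)  # [0, 2, 3]
```

Unit 1 has a missing value, so it gets neither role; unit 2 is excluded by the restriction but, being complete, still receives a distance.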
Reference
Campbell, N.A. (1980). Robust procedures in multivariate analysis I: robust covariance estimation. Applied Statistics, 29, 231-237.
See also
Directive: FSSPM.
Procedures: FVCOVARIANCE, MPOLISH, TUKEYBIWEIGHT.
Commands for: Calculations and manipulation, Multivariate and cluster analysis.
Example
```
CAPTION 'ROBSSPM example'; STYLE=meta
POINTER [NVALUES=4] X
VARIATE [NVALUES=18] X[]
READ    X[1...4]
    1   81    9   98
   78  102  116   78
   89   65  125  101
   93  100   30  244
  127   90  117  104
   87   74   75   77
   64   41   26    5
   75   56   92   72
  133   70  130   71
   85   35  108   57
   44   97   61  145
   35  153   52  141
   96   49  111   34
  131  108  132  115
  114   28  132   52
   95   89   78  121
  118   90  114   88
  123   10  197   25
:
ROBSSPM [PRINT=sspm,outliers] X
```