Estimates missing values for units in a multivariate data set (H.R. Simpson & R.P. White).
Option
MAXCYCLE = scalar | Defines the maximum allowed number of iterations; default 10 |
---|---|
Parameters
DATA = pointers | Each pointer contains a set of variates whose missing values are to be estimated; these will be overwritten by the estimates unless the OUT parameter is specified |
---|---|
OUT = pointers | Each pointer contains a set of variates to hold the results |
Description
MULTMISSING estimates missing values for units in a multivariate data set, using an iterative regression technique. The input for the procedure is a set of variates contained in a pointer specified by the DATA parameter. The output can be saved in a different set of variates by supplying a similar pointer with the parameter OUT; if this is absent, the output values will overwrite the values of the variates given by DATA. The maximum number of iterations is set by the option MAXCYCLE, with a default of 10. If MAXCYCLE is set to zero, missing values will be replaced by variate means calculated from the units that have no values missing for any of the variates.
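The zero-iteration case described above can be illustrated outside Genstat with a minimal Python sketch. It is not the procedure's own code: the function names are hypothetical, missing values are represented by None, and the data are the three variates from the example at the end of this page.

```python
def complete_case_means(variates):
    """Mean of each variate over the units with no missing value in any variate."""
    n_units = len(variates[0])
    complete = [u for u in range(n_units)
                if all(v[u] is not None for v in variates)]
    if not complete:
        raise ValueError("no unit is complete across all variates")
    return [sum(v[u] for u in complete) / len(complete) for v in variates]


def fill_with_means(variates):
    """Replace each missing value (None) by its variate's complete-case mean."""
    means = complete_case_means(variates)
    return [[m if x is None else x for x in v]
            for v, m in zip(variates, means)]


# Units 1, 3 and 5 are complete; units 2 and 4 each have one missing value.
data = [
    [1, 2, 5, 6, 4],
    [2, None, 6, 8, 6],
    [3, 4, 7, None, 8],
]
filled = fill_with_means(data)
```

Only the complete units contribute to each mean, so the observed values in units 2 and 4 are ignored when the replacement values are formed.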
Option: MAXCYCLE.
Parameters: DATA, OUT.
Method
Initial estimates of the missing values in each variate are formed from the variate means using the values for units that have no missing values for any variate. Estimates of the missing values for each variate are then recalculated as the fitted values from the multiple regression of that variate on all the other variates. When all the missing values have been estimated the variate means are recalculated. If any of the means differs from the previous mean by more than a tolerance (the initial standard error divided by 1000) the process is repeated, subject to a maximum number of repetitions defined by the MAXCYCLE option.
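The cycle described above can be sketched in Python to make the order of the steps concrete. This is an illustration under stated assumptions, not the code of the procedure: the names solve, fitted_values and mult_missing are hypothetical, each regression is assumed to be fitted over the units where the target variate is observed (using the current estimates for the other variates), and "initial standard error" is read as the standard error of the mean over the complete units. No adjustment for bias in the variance-covariance matrix is attempted.

```python
from statistics import stdev


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("regression is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fitted_values(y, xs, fit_units, predict_units):
    """Regress y on xs (plus a constant) over fit_units; predict at predict_units."""
    def row(u):
        return [1.0] + [x[u] for x in xs]
    p = len(xs) + 1
    xtx = [[sum(row(u)[i] * row(u)[j] for u in fit_units) for j in range(p)]
           for i in range(p)]
    xty = [sum(row(u)[i] * y[u] for u in fit_units) for i in range(p)]
    beta = solve(xtx, xty)
    return {u: sum(b * r for b, r in zip(beta, row(u))) for u in predict_units}


def mult_missing(variates, max_cycle=10):
    """Return (estimated variates, cycles used); None marks a missing value."""
    units = range(len(variates[0]))
    missing = [[u for u in units if v[u] is None] for v in variates]
    complete = [u for u in units if all(v[u] is not None for v in variates)]

    def avg(values):
        return sum(values) / len(values)

    # Initial estimates: means over complete units (needs at least two such units).
    start = [avg([v[u] for u in complete]) for v in variates]
    # Assumed tolerance: standard error of the complete-case mean, divided by 1000.
    tol = [stdev([v[u] for u in complete]) / len(complete) ** 0.5 / 1000
           for v in variates]
    est = [[start[j] if x is None else float(x) for x in v]
           for j, v in enumerate(variates)]
    means = [avg(v) for v in est]
    cycles = 0
    while cycles < max_cycle:
        cycles += 1
        for j, v in enumerate(est):
            if not missing[j]:
                continue  # nothing to estimate in this variate
            others = [est[k] for k in range(len(est)) if k != j]
            observed = [u for u in units if u not in missing[j]]
            for u, fit in fitted_values(v, others, observed, missing[j]).items():
                v[u] = fit
        new_means = [avg(v) for v in est]
        converged = all(abs(new - old) <= t
                        for new, old, t in zip(new_means, means, tol))
        means = new_means
        if converged:
            break
    return est, cycles


data = [
    [1, 2, 5, 6, 4],
    [2, None, 6, 8, 6],
    [3, 4, 7, None, 8],
]
estimates, cycles = mult_missing(data)
```

Calling mult_missing(data, max_cycle=0) skips the loop entirely and returns the mean-filled variates, which mirrors the documented behaviour when MAXCYCLE is zero. The sketch returns new lists rather than overwriting its input, loosely corresponding to supplying the OUT parameter.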
The default maximum number of iterations (10) is usually sufficient when there are only a few missing values, say two or three. When there are many more, around 20 for example, it may be necessary to increase the maximum number of iterations to about 30.
The method is similar to that of Orchard & Woodbury (1972), but does not adjust for bias in the variance-covariance matrix as suggested by Beale & Little (1975).
Action with RESTRICT
All the variates must be unrestricted, or they must all be restricted to the same set of units; otherwise a fault will occur in a CALCULATE statement within MULTMISSING.
References
Beale, E.M.L. & Little, R.J.A. (1975). Missing values in multivariate analysis. Journal of the Royal Statistical Society, Series B, 37, 129-145.
Orchard, T. & Woodbury, M.A. (1972). A missing information principle: theory and applications. In: Proceedings of the 6th Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, 697-715.
See also
Directive: INTERPOLATE.
Procedures: ANTMVESTIMATE, SVHOTDECK, QMVREPLACE.
Commands for: Multivariate and cluster analysis, Calculations and manipulation.
Example
CAPTION 'MULTMISSING example',\
        'There are three variates, two having one missing value each.';\
        STYLE=meta,plain
VARIATE V[1...3]; VALUES=!(1,2,5,6,4),!(2,*,6,8,6),!(3,4,7,*,8)
PRINT V[]; FIELDWIDTH=8; DECIMALS=2
MULTMISSING V
PRINT V[]; FIELDWIDTH=8; DECIMALS=2