MULTMISSING procedure

Estimates missing values for units in a multivariate data set (H.R. Simpson & R.P. White).

Option

MAXCYCLE = scalar Defines the maximum allowed number of iterations; default 10

Parameters

DATA = pointers Each pointer contains a set of variates whose missing values are to be estimated; these variates are overwritten by the estimates unless the OUT parameter is specified.
OUT = pointers Each pointer contains a set of variates to hold the results.

Description

MULTMISSING estimates missing values for units in a multivariate data set, using an iterative regression technique. The input is a set of variates contained in a pointer specified by the DATA parameter. The results can be saved in a different set of variates by supplying a similar pointer with the OUT parameter; if OUT is absent, the results overwrite the variates given by DATA. The maximum number of iterations is set by the MAXCYCLE option, with a default of 10. If MAXCYCLE is set to zero, missing values are simply replaced by variate means calculated from the units that have no values missing for any of the variates.
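The MAXCYCLE=0 rule can be illustrated with a short Python sketch. It is not the procedure's own code: the function names `complete_units` and `fill_with_complete_case_means` are hypothetical, and `None` stands in for Genstat's missing-value symbol `*`. Because the means use only units that are complete on every variate, the fill for V[2] in the example data below is 14/3, not 5.5 (the mean of its four observed values).

```python
# Hypothetical illustration: a data set is a list of variates (equal-length
# lists), with None marking a missing value (Genstat's '*').

def complete_units(variates):
    """Indices of units that have no missing value in any variate."""
    n = len(variates[0])
    return [i for i in range(n) if all(v[i] is not None for v in variates)]


def fill_with_complete_case_means(variates):
    """Replace each missing value by its variate's mean over the complete units."""
    units = complete_units(variates)
    if not units:
        raise ValueError("no unit has values present for every variate")
    filled = []
    for v in variates:
        mean = sum(v[i] for i in units) / len(units)
        filled.append([mean if x is None else x for x in v])
    return filled


# Same data as the Example section: units 1 and 3 (0-based) are incomplete.
V = [[1, 2, 5, 6, 4], [2, None, 6, 8, 6], [3, 4, 7, None, 8]]
print(fill_with_complete_case_means(V))
```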

Option: MAXCYCLE.

Parameters: DATA, OUT.

Method

Initial estimates of the missing values in each variate are the variate means, calculated from the units that have no missing values for any variate. The estimates for each variate are then recalculated as the fitted values from the multiple regression of that variate on all the other variates. When all the missing values have been estimated, the variate means are recalculated. If any mean differs from its previous value by more than a tolerance (the initial standard error divided by 1000), the process is repeated, up to the maximum number of iterations set by the MAXCYCLE option.
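The iteration can be sketched in Python as follows. This is a hypothetical re-implementation for illustration only, not the procedure's code, and several details are assumptions not stated above: missing values are `None`; each regression is an ordinary least-squares fit with a constant term, using the units where the response variate is observed and the current estimates for the other variates; variates are updated one after another within a cycle, so later regressions see the newest estimates; means are recalculated over all units; and the "initial standard error" is taken to be the standard error of each variate's complete-unit mean. The names `solve`, `fit_predict` and `multmissing_sketch` are illustrative.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_predict(y, xs, train, target):
    """Regress y on the variates in xs (with a constant) over the train units;
    return the fitted values for the target units."""
    rows = [[1.0] + [x[i] for x in xs] for i in train]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * y[i] for r, i in zip(rows, train)) for a in range(p)]
    beta = solve(xtx, xty)
    return {i: beta[0] + sum(bj * x[i] for bj, x in zip(beta[1:], xs)) for i in target}


def multmissing_sketch(variates, maxcycle=10):
    """Hypothetical iterative-regression estimation of missing values (None)."""
    n, nvar = len(variates[0]), len(variates)
    missing = [[i for i in range(n) if v[i] is None] for v in variates]
    complete = [i for i in range(n) if all(v[i] is not None for v in variates)]
    if len(complete) < 2:
        raise ValueError("need at least two units with no missing values")
    nc = len(complete)
    # Initial estimates: variate means over the complete units.
    cmeans = [sum(v[i] for i in complete) / nc for v in variates]
    # Assumed tolerance: standard error of each complete-unit mean, / 1000.
    tol = []
    for v, m in zip(variates, cmeans):
        var = sum((v[i] - m) ** 2 for i in complete) / (nc - 1)
        tol.append(math.sqrt(var / nc) / 1000)
    data = [[cmeans[j] if x is None else float(x) for x in v]
            for j, v in enumerate(variates)]
    means = [sum(v) / n for v in data]
    for _ in range(maxcycle):
        for j in range(nvar):
            if not missing[j]:
                continue
            others = [data[k] for k in range(nvar) if k != j]
            observed = [i for i in range(n) if variates[j][i] is not None]
            # Assumed sequential update: later variates use these new estimates.
            for i, fitted in fit_predict(data[j], others, observed, missing[j]).items():
                data[j][i] = fitted
        new_means = [sum(v) / n for v in data]
        converged = all(abs(a - b) <= t for a, b, t in zip(new_means, means, tol))
        means = new_means
        if converged:
            break
    return data


V = [[1, 2, 5, 6, 4], [2, None, 6, 8, 6], [3, 4, 7, None, 8]]
for v in multmissing_sketch(V, maxcycle=10):
    print([round(x, 2) for x in v])
```

With `maxcycle=0` the loop body never runs, so the sketch returns the complete-unit mean fill, in line with the MAXCYCLE=0 behaviour given in the Description.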

The default maximum of 10 iterations is usually sufficient when there are only a few missing values, say two or three. If there are many more, say 20 or so, it may be necessary to increase MAXCYCLE to around 30.

The method is similar to that of Orchard & Woodbury (1972), but does not adjust for bias in the variance-covariance matrix as suggested by Beale & Little (1975).

Action with RESTRICT

All the variates must be unrestricted, or they must all be restricted to the same set of units; otherwise a fault will occur in a CALCULATE statement within MULTMISSING.

References

Beale, E.M.L. & Little, R.J.A. (1975). Missing values in multivariate analysis. Journal of the Royal Statistical Society, Series B, 37, 129-145.

Orchard, T. & Woodbury, M.A. (1972). A missing information principle: theory and applications. In: Proceedings of the 6th Berkeley Symposium on Mathematical Statistics and Probability, Vol. I, 697-715.

See also

Directive: INTERPOLATE.

Procedures: ANTMVESTIMATE, SVHOTDECK, QMVREPLACE.

Commands for: Multivariate and cluster analysis, Calculations and manipulation.

Example

CAPTION     'MULTMISSING example',\ 
            'There are three variates, two having one missing value each.';\
            STYLE=meta,plain
VARIATE     V[1...3]; VALUES=!(1,2,5,6,4),!(2,*,6,8,6),!(3,4,7,*,8)
PRINT       V[]; FIELDWIDTH=8; DECIMALS=2
" Estimate the missing values; with no OUT parameter the estimates overwrite V[1...3] "
MULTMISSING V
PRINT       V[]; FIELDWIDTH=8; DECIMALS=2

Updated on March 7, 2019
