Estimates missing values for units in a multivariate data set (H.R. Simpson & R.P. White).

### Option

| Option | Description |
|---|---|
| `MAXCYCLE` = scalar | Defines the maximum allowed number of iterations; default 10 |

### Parameters

| Parameter | Description |
|---|---|
| `DATA` = pointers | Each pointer contains a set of variates whose missing values are to be estimated; these will be overwritten by the estimates unless the `OUT` parameter is specified |
| `OUT` = pointers | Each pointer contains a set of variates to hold the results |

### Description

`MULTMISSING` estimates missing values for units in a multivariate data set, using an iterative regression technique. The input for the procedure is a set of variates contained in a pointer specified by the `DATA` parameter. The output can be saved in a different set of variates by supplying a similar pointer with the parameter `OUT`; if this is absent, the output values will overwrite the values of the variates given by `DATA`. The maximum number of iterations is set by the option `MAXCYCLE`, with a default of 10. If `MAXCYCLE` is set to zero, missing values will be replaced by variate means calculated from the units that have no values missing for any of the variates.
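As a rough illustration of the option and parameters described above, the sketch below keeps the original variates intact by sending the estimates to a second pointer. The name `W` is hypothetical, and declaring its variates in advance is a precaution assumed here rather than a documented requirement; the data are those of the example at the end of this page.

```genstat
" Sketch only: W is an illustrative name, not part of the procedure. "
VARIATE [NVALUES=5] V[1...3]; VALUES=!(1,2,5,6,4),!(2,*,6,8,6),!(3,4,7,*,8)
" Variates to receive the completed data, so that V keeps its missing values. "
VARIATE [NVALUES=5] W[1...3]
" Allow up to 30 iterations instead of the default 10. "
MULTMISSING [MAXCYCLE=30] DATA=V; OUT=W
PRINT W[]; FIELDWIDTH=8; DECIMALS=2
" MAXCYCLE=0: no iteration, missing values replaced by complete-unit means. "
MULTMISSING [MAXCYCLE=0] DATA=V; OUT=W
PRINT W[]; FIELDWIDTH=8; DECIMALS=2
```

Going by the description above, the second call should use only units 1, 3 and 5 (the units with no missing values), so the missing value in `V[2]` would be replaced by (2+6+6)/3 = 4.67 and the one in `V[3]` by (3+7+8)/3 = 6.00.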

Option: `MAXCYCLE`.

Parameters: `DATA`, `OUT`.

### Method

Initial estimates of the missing values in each variate are formed from the variate means using the values for units that have no missing values for any variate. Estimates of the missing values for each variate are then recalculated as the fitted values from the multiple regression of that variate on all the other variates. When all the missing values have been estimated the variate means are recalculated. If any of the means differs from the previous mean by more than a tolerance (the initial standard error divided by 1000) the process is repeated, subject to a maximum number of repetitions defined by the `MAXCYCLE` option.
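The iteration just described can be written out as follows. This is only a sketch in notation introduced here, not taken from the procedure: $x_{ij}$ is the value of variate $j$ for unit $i$, $M_j$ is the set of units where variate $j$ is missing, $C$ is the set of units with no missing values, and $\tilde{x}_{ik}$ denotes the current value of variate $k$ for unit $i$ (observed, or the latest estimate). Which units enter each regression, and exactly which standard error $s_j^{(0)}$ defines the tolerance, are not stated in the text and are left open.

$$
\hat{x}_{ij}^{(0)} = \frac{1}{|C|}\sum_{i' \in C} x_{i'j}, \qquad i \in M_j
$$

$$
\hat{x}_{ij}^{(t+1)} = \hat{\beta}_{0j} + \sum_{k \neq j} \hat{\beta}_{kj}\,\tilde{x}_{ik}, \qquad i \in M_j
$$

$$
\text{stop when}\quad \left|\bar{x}_j^{(t+1)} - \bar{x}_j^{(t)}\right| \le \frac{s_j^{(0)}}{1000}\ \text{ for every } j, \quad\text{or when } t+1 = \texttt{MAXCYCLE}
$$

Here $\hat{\beta}_{0j}$ and $\hat{\beta}_{kj}$ are the coefficients from the regression of variate $j$ on all the other variates, refitted at each cycle, and $\bar{x}_j^{(t)}$ is the mean of variate $j$ after cycle $t$.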

The default maximum of 10 iterations is usually sufficient when there are only a few missing values, say two or three. With many more (20 or so), it may be necessary to raise `MAXCYCLE` to around 30.

The method is similar to that of Orchard & Woodbury (1972), but does not adjust for bias in the variance-covariance matrix as suggested by Beale & Little (1975).

### Action with `RESTRICT`

All the variates must be unrestricted, or they must all be restricted to the same set of units; otherwise a fault will occur in a `CALCULATE` statement within `MULTMISSING`.

### References

Beale, E.M.L. & Little, R.J.A. (1975). Missing values in multivariate analysis. *Journal of the Royal Statistical Society, Series B*, 37, 129-145.

Orchard, T. & Woodbury, M.A. (1972). A missing information principle: theory and applications. In: *Proceedings of the 6th Berkeley Symposium on Mathematical Statistics and Probability, Vol I*, 697-715.

### See also

Directive: `INTERPOLATE`.

Procedures: `ANTMVESTIMATE`, `SVHOTDECK`, `QMVREPLACE`.

Commands for: Multivariate and cluster analysis, Calculations and manipulation.

### Example

    CAPTION 'MULTMISSING example',\
      'There are three variates, two having one missing value each.';\
      STYLE=meta,plain
    VARIATE V[1...3]; VALUES=!(1,2,5,6,4),!(2,*,6,8,6),!(3,4,7,*,8)
    PRINT V[]; FIELDWIDTH=8; DECIMALS=2
    MULTMISSING V
    PRINT V[]; FIELDWIDTH=8; DECIMALS=2