# AUPREDICT procedure

Forms predictions from an unbalanced analysis of variance, performed by `AUNBALANCED` (R.W. Payne).

### Options

| Option | Description |
|---|---|
| `PRINT` = string tokens | What to print (`description`, `predictions`, `se`, `sed`, `sedsummary`, `ese`, `lsd`, `lsdsummary`, `vcovariance`); default `pred`, `sed` |
| `MODEL` | Model to use to calculate the predictions; default `*` i.e. full model fitted by `AUNBALANCED` |
| `FACTORIAL` | Limit on number of factors or variates in each term specified by `MODEL`; default 3 |
| `COMBINATIONS` | Factor combinations for which to form predicted means (`present`, `estimable`); default `esti` |
| `ADJUSTMENT` | Type of adjustment to be made when predicting means (`marginal`, `equal`, `observed`); default `marg` |
| `WEIGHTS` | Weights classified by some or all of the factors in the model |
| `PREDICTIONS` | Saves predictions; default `*` |
| `SE` | Saves standard errors of predictions; default `*` |
| `SED` | Saves matrices of standard errors of differences between predictions; default `*` |
| `ESE` | Saves effective standard errors; default `*` |
| `LSD` | Saves least significant differences between predictions; default `*` |
| `LSDLEVEL` | Significance level (%) for least significant differences; default 5 |
| `VCOVARIANCE` | Saves variance-covariance matrices of predictions; default `*` |
| `SAVE` | Save structure (from `AUNBALANCED`) containing details of the analysis for which predictions are required; if omitted, output is from the most recent use of `AUNBALANCED` |

### Parameters

| Parameter | Description |
|---|---|
| `CLASSIFY` = vectors | Variates and/or factors to classify table of predictions |
| `LEVELS` | To specify values of variates, levels of factors |

### Description

`AUPREDICT` can produce predicted means following an analysis of variance of an unbalanced design by `AUNBALANCED`. The predictions are calculated using the `PREDICT` directive. The first step (A) of the calculation forms a full table of predictions, classified by every factor in the model. The second step (B) averages the full table over the factors that do not occur in the table of means.

The `COMBINATIONS` option specifies which cells of the full table are to be formed in Step A. The default setting, `estimable`, fills in all the cells other than those that involve parameters that cannot be estimated, for example because of aliasing. Alternatively, setting `COMBINATIONS=present` excludes the cells for factor combinations that do not occur in the data.

The `ADJUSTMENT` and `WEIGHTS` options then define how the averaging is done in Step B. The `WEIGHTS` option allows you to specify your own table of weights to use in the averaging. Alternatively, if `WEIGHTS` is not set, the weights are formed automatically according to the setting of the `ADJUSTMENT` option. The default setting, `marginal`, of `ADJUSTMENT` forms a table of marginal weights for each factor, containing the proportion of observations with each of its levels; the full table of weights is then formed from the product of the marginal tables. The setting `equal` weights all the combinations equally. Finally, the setting `observed` uses the `WEIGHTS` option of `PREDICT` to weight each factor combination according to its own individual replication in the data.
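As a rough illustration of these averaging options, the calls below form the `Leachate` table of predictions in three different ways, using the factors from the example at the end of this page. This is only a sketch: it assumes that `AUNBALANCED` has already been run as in that example, and the table name `Dilwt` and its values are invented for illustration.

```
" Assumes AUNBALANCED has already been run, as in the example on this page "
" Equal weight for every factor combination averaged over in Step B "
AUPREDICT [ADJUSTMENT=equal] Leachate
" Weights from the observed replication, using only cells present in the data "
AUPREDICT [COMBINATIONS=present; ADJUSTMENT=observed] Leachate
" User-supplied weights for the levels of Dilution (illustrative values) "
TABLE     [CLASSIFICATION=Dilution; VALUES=0.4,0.3,0.2,0.1] Dilwt
AUPREDICT [WEIGHTS=Dilwt] Leachate
```

Comparing the three tables is one way to see how sensitive the predicted means are to the choice of weighting.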

Printed output, which extends the output available from `PREDICT`, is controlled by settings of the `PRINT` option:

| Setting | Output |
|---|---|
| `description` | standardization policies used when forming the predictions, |
| `predictions` | predictions, |
| `se` | predictions and standard errors, |
| `sed` | standard errors for differences between the predictions, |
| `sedsummary` | summary of the standard errors for differences between the predictions, |
| `lsd` | least significant differences between the predictions, |
| `lsdsummary` | summary of the least significant differences between the predictions, |
| `ese` | approximate effective standard errors; these are formed by procedure `SED2ESE` with the aim of allowing good approximations to the standard errors for differences to be calculated by the usual formula $\mathrm{sed}_{i,j} = \sqrt{\mathrm{ese}_i^2 + \mathrm{ese}_j^2}$, and |
| `vcovariance` | variance and covariances of the predictions. |

The default is to print predictions and a summary of the standard errors of differences. The standard errors (and sed’s) are relevant for the predictions when considered as means of those data that have been analysed, with the means formed according to the averaging policy defined by the options of `PREDICT`. The word prediction is used because these are predictions of what the means would have been if the factor levels had been replicated differently in the data; see Lane & Nelder (1982) for more details. The `LSDLEVEL` option specifies the significance level (%) to use in the calculation of least significant differences (default 5%).
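The following sketch shows how the `PRINT` and `LSDLEVEL` settings described above might be combined. It uses the factors from the example at the end of this page and assumes that `AUNBALANCED` has already been run; the choice of a 1% level is purely illustrative.

```
" Predictions with standard errors, plus least significant differences at 1% "
AUPREDICT [PRINT=se,lsd; LSDLEVEL=1] Leachate
" Effective standard errors and the full matrix of sed's for the two-way table "
AUPREDICT [PRINT=ese,sed] Leachate,Dilution
```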

Another extension in `AUPREDICT` is that you can produce predictions using a smaller model than the full model that has been fitted by `AUNBALANCED`. This can be useful if the full model contains many parameters. A substantial amount of time and computer workspace may then be needed to calculate the predictions and standard errors. Very large models may even exceed the capacity of some PCs.

You might choose to omit a term from the full model when forming a particular table of predictions if the term is orthogonal to all the terms involved in the table. For example, you might omit the term `blocks` when forming an `A`-by-`B` table of predictions if each combination of levels of the factors `A` and `B` is replicated the same number of times in every block. The justification is that an orthogonal term cannot affect the size of any of the differences between predictions. Different weighting of the levels of the orthogonal term may affect the overall mean of the predictions, but this is usually unimportant. If you omit the term, it is as though you had included it with weightings based on the observed replication of its levels in the data set, and in any well-designed data set these should provide a satisfactory outcome. You might also omit a term if it is nearly orthogonal to the terms involved in the table, and you are happy to ignore its effect on the predictions.

The model is specified by the `MODEL` option. The `FACTORIAL` option sets a limit on the number of factors or variates in each term specified by `MODEL`; default 3.
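A minimal sketch of predicting from a reduced model, using the factors from the example at the end of this page and assuming `AUNBALANCED` has already been run. It is shown only to illustrate the syntax: omitting `Block` is justified only if that term is orthogonal (or nearly so) to the terms in the table, which should be checked for the data in hand.

```
" Predict from the treatment terms only, omitting Block (illustrative) "
AUPREDICT [MODEL=Leachate*Dilution; FACTORIAL=2] Leachate,Dilution
```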

The `PREDICTIONS`, `SE`, `SED`, `ESE`, `LSD` and `VCOVARIANCE` options allow the results of the prediction to be saved in appropriate Genstat data structures.
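For instance, the results for the `Dilution` table from the example at the end of this page might be saved and then printed as sketched below. The identifiers `Dilpred`, `Dilse` and `Dilsed` are hypothetical names chosen for illustration, and the sketch assumes that `PRINT=*` suppresses the printed output, as is conventional in Genstat.

```
" Save the Dilution predictions and their accuracy measures without printing "
AUPREDICT [PRINT=*; PREDICTIONS=Dilpred; SE=Dilse; SED=Dilsed] Dilution
" Display the saved structures "
PRINT     Dilpred
PRINT     Dilse
PRINT     Dilsed
```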

The `SAVE` option allows you to specify the save structure from the analysis for which further output is required. If `SAVE` is not set, output will be produced for the most recent analysis from `AUNBALANCED`; however, this requires that none of the Genstat regression directives (`MODEL`, `TERMS`, `FIT`, `ADD`, `DROP` and so on) has been used in the interim.
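A sketch of how a save structure might be passed between the two procedures, using the variables from the example at the end of this page. It assumes that `AUNBALANCED` provides a `SAVE` option to store the structure; `Ausave` is a hypothetical identifier.

```
" Keep the save structure from the analysis (Ausave is an illustrative name) "
AUNBALANCED [PRINT=aovtable; SAVE=Ausave] Logit%h
" Other regression directives could be used here without losing the analysis "
AUPREDICT   [SAVE=Ausave] Leachate
```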

Options: `PRINT`, `MODEL`, `FACTORIAL`, `COMBINATIONS`, `ADJUSTMENT`, `WEIGHTS`, `PREDICTIONS`, `SE`, `SED`, `ESE`, `LSD`, `LSDLEVEL`, `VCOVARIANCE`, `SAVE`.

Parameters: `CLASSIFY`, `LEVELS`.

### Method

The predictions are produced using the `PREDICT` directive.

### Reference

Lane, P.W. & Nelder, J.A. (1982). Analysis of covariance and standardization as instances of prediction. Biometrics, 38, 613-621.

### See also

Directive: `PREDICT`.

Procedures: `AUNBALANCED`, `AUDISPLAY`, `AUGRAPH`, `AUMCOMPARISON`, `AUKEEP`.

Commands for: Analysis of variance.

### Example

```
CAPTION 'AUPREDICT example',\
'Data from Genstat 5 Release 1 Reference Manual, page 340.';\
STYLE=meta,plain
FACTOR  [NVALUES=36; LEVELS=3; VALUES=12(1...3)] Block
FACTOR  [NVALUES=36; LABELS=!t(baresoil,emerald,emergo)] Leachate
&       [LABELS=!t('1','1/4','1/16','1/64')] Dilution
VARIATE [NVALUES=36] Nhatch,Nnohatch
READ    Leachate,Dilution,Nhatch,Nnohatch
1           2         109         318
3           4          54         350
3           1           *         415
2           2         783         212
3           3         652        1375
2           4         490         816
1           3          95        1219
2           1        1012          66
1           4         166         943
3           2        1059         313
1           1         257        1006
2           3        1058         234
2           4         507        1119
1           2         194         840
1           3         175        1707
1           1         326         609
3           4         142         980
2           3         286         230
3           2         546         313
2           2           *         301
2           1        2471         112
3           3          76         489
1           4         208         503
3           1           *         325
1           1         322         913
1           2         255        2246
3           2        1774        1446
2           2         999         193
2           4         388        1836
3           4         221        1800
1           3         220        1902
2           1        2821         187
3           1        1486         463
3           3         717        1473
1           4         143         941
2           3         968         550 :
CALCULATE          Logit%h = LOG(Nhatch/Nnohatch)
BLOCKSTRUCTURE     Block
TREATMENTSTRUCTURE Leachate*Dilution
AUNBALANCED        [PRINT=aovtable] Logit%h
AUPREDICT          Leachate
&                  Dilution
&                  Leachate,Dilution
```
Updated on June 20, 2019