Forms predictions from a linear or generalized linear model.
|What to print (
||Channel number for output; default
||Which combinations of factors in the current model to include (
||Type of adjustment (
||Weights classified by some or all of the factors in the model; default
||Value of offset on which to base predictions; default mean of offset variate|
||Method of forming margin (
||How to deal with aliased parameters (
||What back-transformation to apply to the values on the linear scale, before calculating the predicted means (
||Controls whether the variance of predictions is calculated on the basis of forecasting new observations rather than summarizing the data to which the model has been fitted (
||Which warning messages to suppress (
||Value of dispersion parameter in calculation of s.e.s; default is as set in the
||Basis of estimate of dispersion, if not fixed by
||Supplies the total number of trials to be used for prediction with a binomial distribution (providing a value n greater than one allows predictions to be made of the number of “successes” out of n, whereas the value one predicts the proportion of successes); default 1|
||Saves predictions for each y variate; default
||Saves standard errors of predictions for each y variate; default
||Saves standard errors of differences between predictions for each y variate; default
||Saves least significant differences between predictions for each y variate (models with Normal errors only); default
||Significance level (%) to use in the calculation of least significant differences; default 5|
||Saves variance-covariance matrices of predictions for each y variate; default
||Specifies save structure of model from which to predict; default
||Variates and/or factors to classify table of predictions|
||To specify values of variates, levels of factors|
||For each vector in the
||Identifiers for new factors that are defined when
The PREDICT directive can be used after the FIT directive to summarize the results of the regression, by using the fitted relationship to predict the values of the response variate at particular values of the explanatory variables.
CLASSIFY, the first parameter of PREDICT, specifies those variates or factors in the current regression model whose effects you want to summarize. Any variate or factor in the current model that you do not include will be standardized in some way, as described below.
The LEVELS parameter specifies values at which the summaries are to be calculated, for each of the structures in the CLASSIFY list. For factors, you can select some or all of the levels, while for variates you can specify any set of values. A single level or value is represented by a scalar; several levels or values must be combined into a variate (which may of course be unnamed). Alternatively, if the factor has labels, you can use these to select the levels for the summaries by setting LEVELS to a text. A missing value in the LEVELS parameter is taken by Genstat to stand for all the levels of a factor, or for the mean value of a variate.
The PARALLEL parameter allows you to indicate that a factor or variate should change in parallel to another factor or variate. Both should have the same number of values specified by the LEVELS parameter of PREDICT. The predictions are then formed for each corresponding set of values rather than for every combination of these values. For example, suppose we had fitted a quadratic model with explanatory variates X and Xsquared. We could then put
PREDICT Xsquared,X; PARALLEL=X,*;\
The PARALLEL parameter specifies that Xsquared should change in parallel to X, so that we obtain predictions only for matching values.
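As a fuller sketch of the same idea, the statements below assume a hypothetical response variate Y, existing variates X and Xsquared, and illustrative values 0 to 5 for X. The names Y, Xval and Xsqval are invented for this illustration.

```genstat
" Fit the quadratic model (Y is a hypothetical response variate;
  X and Xsquared are assumed to exist already)."
MODEL Y
FIT X + Xsquared
" Illustrative values of X, with the matching squares."
VARIATE [VALUES=0,1...5] Xval
CALCULATE Xsqval = Xval**2
" Xsquared changes in parallel with X, so six predictions are formed
  (one per matching pair) rather than 36 (one per combination)."
PREDICT Xsquared,X; PARALLEL=X,*; LEVELS=Xsqval,Xval
```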
When you specify PARALLEL, PREDICT needs to define a new factor to classify that dimension of the table. By default this will be an unnamed factor, but you can use the NEWFACTOR parameter to give it an identifier. The EXTRA attribute of the factor is set to the name of the corresponding factor or variate in the CLASSIFY list; this will then be used to label that dimension of the table of predictions.
You can best understand how Genstat forms predictions by regarding its calculations as consisting of two steps. The first step, referred to below as Step A, is to calculate the full table of predictions, classified by every factor in the current model. For any variate in the model, the predictions are formed at its mean, unless you have specified some other values using the
LEVELS parameter; if so, these are then taken as a further classification of the table of predictions. The second step, referred to as Step B, is to average the full table of predictions over the classifications that do not appear in the
CLASSIFY parameter: you can control the type of averaging using the
WEIGHTS options. By default, the predictions are made at the mean of any offset variate, but option
OFFSET can be used to specify another value at which the predictions should be made instead.
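The two steps can be illustrated with a small sketch. It assumes a hypothetical model with a response variate Yield, factors Block and Variety, and a variate Height; none of these names come from the text above.

```genstat
" Hypothetical model with two factors and one variate."
MODEL Yield
FIT Block + Variety + Height
" Step A: full Block by Variety table, formed at the mean of Height.
  Step B: averaged over Block, leaving one prediction per Variety."
PREDICT Variety
" Supplying values for Height adds it as a further classification;
  the missing value (*) requests all the levels of Variety."
PREDICT Variety,Height; LEVELS=*,!(50,100)
```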
Printed output is controlled by settings of the
||describes the standardization policies used when forming the predictions,|
||prints the predictions,|
||produces predictions and standard errors,|
||prints standard errors for differences between the predictions,|
||prints least significant differences between the predictions (ordinary linear regression models or generalized linear models with the Normal distribution only), and|
||prints the variance and covariances of the predictions.|
By default, descriptions, predictions and standard errors are printed. The standard errors (and sed’s) are relevant for the predictions when considered as means of those data that have been analysed, with the means formed according to the averaging policy defined by the options of PREDICT. The word prediction is used because these are predictions of what the means would have been if the factor levels had been replicated differently in the data; see Lane & Nelder (1982) for more details. The LSDLEVEL option specifies the significance level (%) to use in the calculation of least significant differences (default 5%).
By default, the standard errors (and sed’s) are not augmented by any component corresponding to the estimated variability of a new observation. However, you can set option
SCOPE=new to request that the variance of predictions should be calculated on the basis of forecasting new observations rather than of summarizing the data to which the model has been fitted. This setting cannot be used if the predictions are to be standardized for the effects of any factors in the model; in other words, all factors in the current model must be listed in the
CLASSIFY parameter of the
PREDICT statement. In addition, it cannot be used when making predictions from generalized linear models with option
BACKTRANSFORMATION=none, nor with weighted regression. The effect of
SCOPE=new is to form variances for each predicted value by combining the variance of the estimated mean value of the prediction (as produced for
SCOPE=data) together with the estimated variance of a new observation with the same values of explanatory variates and factors:
“new” variance = “data” variance + (dispersion × variance function)
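For a Normal linear regression the variance function is 1, so the formula reduces to adding the estimated dispersion (the residual mean square) to the squared standard error of the predicted mean. As a sketch, assuming the boiling-point regression of Example PRED-1 has been fitted and that temp holds the temperatures of interest, the request might look like this:

```genstat
" Standard errors for future observations at each value in temp
  (Boiltemp and temp are the names used in Example PRED-1)."
PREDICT [SCOPE=new] Boiltemp; LEVELS=temp
```

Example PRED-1 forms what should be the corresponding quantity by hand, as SQRT(sepred**2 + rss/rdf), where rss/rdf is the residual mean square.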
DMETHOD options allow you to change the method by which the variance of the distribution of the response values is obtained for calculating the standard errors. These options operate like the corresponding options of
MODEL (except that they apply only to the current statement). The default is to use the method as originally defined by the
The NBINOMIAL parameter can be used to supply the total number of trials to be used for prediction with a binomial distribution when option BACKTRANSFORMATION is set to link. If you provide a value n greater than one, Genstat will predict the number of “successes” out of n. The default, NBINOMIAL=1, causes Genstat to predict the proportion of successes.
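A minimal sketch of the two settings follows. The dose-response data (Dead, Ntested, Logdose) are hypothetical, and NBINOMIAL is written here as an option of PREDICT, which is an assumption based on where it appears in the summary at the top of this page.

```genstat
" Hypothetical data: Dead out of Ntested at each value of Logdose."
MODEL [DISTRIBUTION=binomial; LINK=logit] Dead; NBINOMIAL=Ntested
FIT Logdose
" Default NBINOMIAL=1: predicted proportions of successes."
PREDICT Logdose; LEVELS=!(0.5,1,1.5)
" Predicted numbers of successes out of 20 trials."
PREDICT [NBINOMIAL=20] Logdose; LEVELS=!(0.5,1,1.5)
```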
You can send the output to another channel, or to a text structure, by setting the
The COMBINATIONS option specifies which cells of the full table in Step A are to be filled for averaging in Step B. The default, COMBINATIONS=estimable, uses all the cells other than those that involve parameters that cannot be estimated, for example because of aliasing. Alternatively, you can set COMBINATIONS=present to exclude cells for factor combinations that do not occur in the data, or
COMBINATIONS=full to use all the cells. When
LEVELS parameter is overruled. Any subsets of factor levels in the
LEVELS parameter are ignored, and predictions are formed for all the factor levels that occur in the data or are estimable. Likewise, the full table cannot then be classified by any sets of values of variates; the
LEVELS parameter must then supply only single values for variates.
The ADJUSTMENT and WEIGHTS options define how the averaging is done in Step B. Values in the full table produced in Step A are averaged with respect to all those factors that you have not included in the settings of the CLASSIFY parameter. By default, the levels of any such factor are combined with what we call marginal weights: that is, by the number of occurrences of each of its levels in the whole dataset. The ADJUSTMENT and WEIGHTS options allow you to change the weights. The setting
ADJUSTMENT=equal specifies that the levels are to be weighted equally. (This corresponds to the default weighting used by
The WEIGHTS option is more powerful than the ADJUSTMENT option, allowing you to specify an explicit table of weights. This table can be classified by any, or all, of the factors over whose levels the predictions are to be averaged; the levels of remaining factors will be weighted according to the ADJUSTMENT option. Moreover, you can classify the weights by the factors in the CLASSIFY parameter as well, to provide different weightings for different combinations of levels of these factors. If you supply explicit weights in the WEIGHTS option, any setting of the COMBINATIONS option is ignored. You will find explicit weights useful in particular when you have population estimates of the proportions of each level of a factor, proportions which may not be matched well in the available data.
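The three weighting choices might be requested as sketched below. The model (Yield, Block, Variety), the table Popwt and its values are hypothetical; the weights stand for assumed population proportions of a three-level Block factor.

```genstat
" Hypothetical model: factors Block (3 levels) and Variety."
MODEL Yield
FIT Block + Variety
" Default: Block levels weighted by their numbers of occurrences."
PREDICT Variety
" Weight the Block levels equally instead."
PREDICT [ADJUSTMENT=equal] Variety
" Explicit weights: assumed population proportions for the blocks."
TABLE [CLASSIFICATION=Block; VALUES=0.5,0.3,0.2] Popwt
PREDICT [WEIGHTS=Popwt] Variety
```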
If a model contains any aliased parameters, predicted values cannot be formed for some cells of the full table without assuming a value for the aliased parameters. With the default setting,
COMBINATIONS=estimable, no predictions are formed for these cells. When
COMBINATIONS=full, if the aliased parameters simply represent effects of variates that are correlated with other explanatory variables in the model, it may be sufficient just to ignore them. This can be done by setting the
ALIASING option to
ignore. The aliased parameters are then taken to be zero, and fitted values are calculated for all cells of the table from the remaining parameters in the model. Aliasing can also occur if there are some combinations of factors that do not occur in the data, and here it may be more sensible to set option
COMBINATIONS=present so that these cells are all excluded from the calculation of predictions. The final way to overcome aliasing is to supply explicit weights using the WEIGHTS option.
Averaging is usually the appropriate way of combining predicted values over levels of a factor. But sometimes summation is needed, for example in the analysis of counts by log-linear models. You can achieve this by setting the METHOD option to total. The rules about weights and so on still apply. In a generalized linear model, averaging is done by default on the scale of the original response variable, not on the scale transformed by the link function. In other words, linear predictors are formed for all the combinations of factor levels and variate values specified by PREDICT, and are then back-transformed, through the inverse of the link function, to the natural scale before averaging. This back-transformation may be useful when you are reporting results, since the tables from PREDICT can then be interpreted as natural averages of means predicted by the fitted model. You can set option BACKTRANSFORM=none if you want the averaging to be done on the scale of the linear predictor; PREDICT will then form averages and report predictions on the transformed scale.
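As an illustration, the sketch below assumes a hypothetical log-linear model for a variate of counts, Count, classified by factors Region and Age. These names are invented for the example.

```genstat
" Hypothetical log-linear model for counts."
MODEL [DISTRIBUTION=poisson; LINK=log] Count
FIT Region + Age
" Sum, rather than average, the predictions over the levels of Age."
PREDICT [METHOD=total] Region
" Average on the scale of the linear predictor (the log scale here)."
PREDICT [BACKTRANSFORM=none] Region
```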
PREDICT calculates the standard errors of predictions from iterative models by using first-order approximations that allow for the effect of the link function. Thus you should interpret them only as a rough guide to the variability of individual predictions.
VCOVARIANCE options let you save the results of
PREDICT as well as, or instead of, printing them.
The NOMESSAGE option controls printing of messages. The nonlinear setting suppresses messages about the approximate nature of standard errors of predictions in generalized linear models, and the dispersion setting prevents reminders appearing about the basis of the standard errors.
Lane, P.W. & Nelder, J.A. (1982). Analysis of covariance and standardization as instances of prediction. Biometrics, 38, 613-621.
" Example PRED-1: Prediction from simple linear regression
  Attempt to find a linear relationship between the boiling point of water
  and barometric pressure, to allow prediction of pressure and thus of altitude."

" Read and display the data."
READ Boiltemp,Pressure
194.5 20.79  194.3 20.79  197.9 22.40  198.4 22.67  199.4 23.15
199.9 23.35  200.9 23.89  201.1 23.99  201.4 24.02  201.3 24.01
203.6 25.14  204.6 26.57  209.5 28.49  208.6 27.76  210.7 29.04
211.9 29.88  212.2 30.06  :
DGRAPH Pressure; Boiltemp

" Regress pressure on boiling point."
MODEL Pressure
FIT Boiltemp

" Predict pressure when boiling point is 190."
PREDICT Boiltemp; LEVEL=190

" Print a chart of predictions for a range of temperatures including
  standard errors of the predicted means and standard errors for
  future observations."
VARIATE [VALUES=190,192...216] temp
PREDICT [PRINT=*; PREDICT=predict; SE=sepred] Boiltemp; LEVEL=temp
RKEEP DEVIANCE=rss; DF=rdf
CALCULATE sefuture = SQRT(sepred**2 + rss/rdf)
PRINT predict,sepred,sefuture; DECIMALS=2