ANOVA directive


Analyses y-variates by analysis of variance according to the model defined by earlier BLOCKSTRUCTURE, COVARIATE and TREATMENTSTRUCTURE statements.

Options

`PRINT` = string tokens	Output from the analyses of the y-variates, adjusted for any covariates (`aovtable`, `information`, `covariates`, `effects`, `residuals`, `contrasts`, `means`, `cbeffects`, `cbmeans`, `stratumvariances`, `%cv`, `missingvalues`); default `aovt`, `info`, `cova`, `mean`, `miss`
`UPRINT` = string tokens	Output from the unadjusted analyses of the y-variates (`aovtable`, `information`, `effects`, `residuals`, `contrasts`, `means`, `cbeffects`, `cbmeans`, `stratumvariances`, `%cv`, `missingvalues`); default `*` i.e. no printing
`CPRINT` = string tokens	Output from the analyses of the covariates, if any (`aovtable`, `information`, `effects`, `residuals`, `contrasts`, `means`, `%cv`, `missingvalues`); default `*` i.e. no printing
`FACTORIAL` = scalar	Limit on number of factors in a treatment term; default 3
`CONTRASTS` = scalar	Limit on the order of a contrast of a treatment term; default 4
`DEVIATIONS` = scalar	Limit on the number of factors in a treatment term for the deviations from its fitted contrasts to be retained in the model; default 9
`PFACTORIAL` = scalar	Limit on number of factors in printed tables of means or effects; default 9
`PCONTRASTS` = scalar	Limit on order of printed contrasts; default 9
`PDEVIATIONS` = scalar	Limit on number of factors in a treatment term whose deviations from the fitted contrasts are to be printed; default 9
`FPROBABILITY` = string token	Printing of probabilities for variance ratios (`yes`, `no`); default `no`
`PSE` = string token	Standard errors to be printed with tables of means; `PSE=*` requests that the standard errors be omitted (`differences`, `lsd`, `means`); default `diff`
`TWOLEVEL` = string token	Representation of effects in 2ⁿ experiments (`responses`, `Yates`, `effects`); default `resp`
`DESIGN` = pointer	Stores details of the design for use in subsequent analyses; default `*`
`WEIGHTS` = variate	Weights for each unit; default `*` i.e. all units with weight one
`ORTHOGONAL` = string token	Whether or not design to be assumed orthogonal (`notassumed`, `assumed`, `compulsory`); default `nota`
`SEED` = scalar	Seed for random numbers to generate dummy variate for determining the design; default 12345
`MAXCYCLE` = scalar	Maximum number of iterations for estimating missing values; default 20
`TOLERANCES` = variate	Allows you to redefine the tolerances for zero used by various parts of the algorithm
`NOMESSAGE` = string tokens	Which warning messages to suppress (`nonorthogonal`, `residual`); default `*`
`LSDLEVEL` = scalar	Significance level (%) to use in the calculation of least significant differences; default 5
`EXIT` = scalar	Saves an exit code indicating the properties of the design

Parameters

`Y` = variates	Variates to be analysed
`RESIDUALS` = variates	Variate to save residuals for each y-variate
`FITTEDVALUES` = variates	Variate to save fitted values
`SAVE` = identifiers	Save details of each analysis for use in subsequent `ADISPLAY` or `AKEEP` statements

Description

The ANOVA directive analyses balanced designs. These include most of the commonly occurring experimental designs such as randomized blocks, Latin squares, split plots and other orthogonal designs, as well as designs with balanced confounding, like balanced lattices and balanced incomplete blocks. Many partially balanced designs can also be handled, so a very wide range of designs can be analysed. The necessary condition of first-order balance is explained algorithmically by Wilkinson (1970) and Payne & Wilkinson (1977), and mathematically by James & Wilkinson (1971) and Payne & Tobias (1992). However, ANOVA can itself detect whether or not a design can be analysed, so if you are not sure whether or not a particular design is analysable, you can run it through ANOVA and see what happens! (If it is unbalanced, you can use the AUNBALANCED procedure for designs with a single error term, or the REML directive for those with several.)

Before you use ANOVA you must first define the model that is to be fitted in the analysis. Potentially this has three parts. The TREATMENTSTRUCTURE directive specifies the treatment (or systematic, or fixed) terms for the analysis. The BLOCKSTRUCTURE directive defines the “underlying structure” of the design or, equivalently, the error terms for the analysis; in the simple cases where there is only a single error term this can be omitted. The other directive, COVARIATE, lists the covariates if an analysis of covariance is required. At the start of a job all these model-definition directives have null settings. However, once any one of them has been used, the defined setting remains in force for all subsequent analyses in the same job until it is redefined.
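As a minimal sketch of the three model-definition directives, consider a hypothetical randomized block experiment with one covariate. The identifiers `Block`, `Treat`, `Yield` and `Initial` and all data values are invented for illustration. The final two statements assume that a `COVARIATE` statement with no parameter cancels the covariate setting.

```genstat
" Hypothetical experiment: 4 blocks x 3 treatments (invented data). "
FACTOR [LEVELS=4; VALUES=3(1...4)] Block
FACTOR [LEVELS=3; VALUES=1,2,3,1,2,3,1,2,3,1,2,3] Treat
VARIATE [VALUES=12.1,13.4,15.0,11.8,13.9,14.6,12.5,13.1,15.3,11.9,13.6,14.9] Yield
VARIATE [VALUES=3.1,2.9,3.3,2.8,3.0,3.2,3.4,2.7,3.1,3.0,3.2,2.9] Initial

" Underlying structure (error terms): units within blocks. "
BLOCKSTRUCTURE Block
" Treatment (fixed) terms. "
TREATMENTSTRUCTURE Treat
" Covariate, making this an analysis of covariance. "
COVARIATE Initial
ANOVA Yield

" The settings stay in force, so the covariate is cancelled here
  (assumed behaviour of an empty COVARIATE statement) before an
  analysis without it. "
COVARIATE
ANOVA Yield
```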

The first parameter of ANOVA, Y, lists the variates whose values are to be analysed. Genstat examines them all and forms a list of units for which any of the y-variates or any covariate has a missing value. These units are treated as missing in all the analyses. (This is necessary to avoid having to re-analyse covariates for each y-variate.) However, if your y-variates have different missing units, you may prefer to analyse them with separate ANOVA statements, while saving details of the model and design with the DESIGN option to improve efficiency. Genstat also checks whether any of the y-variates has a restriction. If several variates are restricted, they must all be restricted to the same set of units. Only these units are included in the analysis of each y-variate.

If a y-variate has no values, or if you specify a null entry in the Y list, Genstat produces a skeleton analysis-of-variance table, which excludes sums of squares, mean squares and variance ratios; the only other output available is the information summary. You can save a design structure, but no save structure is formed. This is a good way of checking that a design can be analysed, before the experiment is carried out.
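A skeleton analysis might be requested as in the sketch below, which declares a y-variate but gives it no values. The design (4 blocks by 3 treatments) and the identifiers are hypothetical.

```genstat
" Hypothetical design: 4 blocks x 3 treatments, no data collected yet. "
FACTOR [LEVELS=4; VALUES=3(1...4)] Block
FACTOR [LEVELS=3; VALUES=1,2,3,1,2,3,1,2,3,1,2,3] Treat
BLOCKSTRUCTURE Block
TREATMENTSTRUCTURE Treat

" Yield is declared but has no values, so only the skeleton table
  (terms and degrees of freedom) and the information summary appear. "
VARIATE Yield
ANOVA [PRINT=aovtable,information] Yield
```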

The RESIDUALS parameter allows you to specify a variate to save the estimated residuals from each analysis. Genstat will declare this variate for you if you have not done so already. In models where there are several error terms, only the final one is included. Others can be obtained using the AKEEP directive. The fitted values from the analysis are defined to be the data values minus the estimated residuals. These too can be saved, using the FITTEDVALUES parameter. In models where there are several error terms, only the final error term is subtracted. If this is not what you want, you can save the other error terms using AKEEP and subtract them by CALCULATE.
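The relationship between data, residuals and fitted values can be checked with a small sketch such as the following. The data and the identifiers `Treat`, `Y`, `Res`, `Fit` and `Check` are invented for illustration.

```genstat
" Hypothetical one-way layout: 3 treatments x 4 replicates. "
FACTOR [LEVELS=3; VALUES=4(1...3)] Treat
VARIATE [VALUES=10.2,9.8,10.5,10.1,11.4,11.9,11.1,11.6,12.8,12.3,12.9,12.6] Y
TREATMENTSTRUCTURE Treat

" Res and Fit are declared automatically by ANOVA. "
ANOVA [PRINT=aovtable] Y; RESIDUALS=Res; FITTEDVALUES=Fit

" Fitted values are defined as data minus residuals, so Check
  should reproduce Fit. "
CALCULATE Check = Y - Res
PRINT Y,Res,Fit,Check
```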

The last parameter, SAVE, allows you to save details of the analysis in an ANOVA save structure. The ADISPLAY directive can use a save structure to produce further output. You can also use it in the AKEEP directive to put quantities calculated from the analysis into data structures which you can then use elsewhere in Genstat. Note that the save structure stores neither the y-variate nor the block and treatment factors. ADISPLAY will be unaffected by any changes to their values. However, if fitted values or tables of bottom-stratum residuals are to be saved by AKEEP, these should be saved before any changes are made to the y-variate or the block factors. Save structures are special compound structures, and Genstat declares them automatically. The save structure for the last y-variate analysed is stored automatically, and forms the default for ADISPLAY and AKEEP if you do not provide one explicitly.
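A short sketch of using a save structure with ADISPLAY and AKEEP follows. The data and the identifiers `Ysave` and `Tmeans` are hypothetical. The `TERMS` and `MEANS` parameters belong to AKEEP rather than ANOVA.

```genstat
" Hypothetical one-way layout: 3 treatments x 4 replicates. "
FACTOR [LEVELS=3; VALUES=4(1...3)] Treat
VARIATE [VALUES=10.2,9.8,10.5,10.1,11.4,11.9,11.1,11.6,12.8,12.3,12.9,12.6] Y
TREATMENTSTRUCTURE Treat

" Save the details of the analysis in Ysave. "
ANOVA [PRINT=aovtable] Y; SAVE=Ysave

" Produce further output later without repeating the analysis. "
ADISPLAY [PRINT=means] SAVE=Ysave

" Copy the table of treatment means into an ordinary data structure. "
AKEEP [SAVE=Ysave] TERMS=Treat; MEANS=Tmeans
PRINT Tmeans
```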

The PRINT option selects which components of output are to be displayed.

`aovtable`	analysis-of-variance table
`information`	information summary, giving details of aliasing and non-orthogonality or of any large residuals
`covariates`	estimates of covariate regression coefficients
`effects`	tables of estimated treatment parameters
`residuals`	tables of estimated residuals
`contrasts`	estimated contrasts of treatment effects
`means`	tables of predicted means for treatment terms
`cbeffects`	estimated effects of treatment terms combining information from all the strata in which each term is estimated
`cbmeans`	predicted means for treatment terms combining information from all the strata in which each term is estimated
`stratumvariances`	estimated variances of the units in each stratum and stratum variance components
`%cv`	coefficients of variation and standard errors of individual units
`missingvalues`	estimates of missing values

The default is intended to give the output that you will require most often from a full analysis: aovtable, information, covariates, means and missingvalues. However, with ANOVA the settings information, covariates and missingvalues will not produce any output unless there is something definite to report.

In analysis of covariance, you can also print output from the analyses of the covariates and from the analysis of the y-variate ignoring the covariates. This is controlled by options CPRINT and UPRINT respectively. These are similar to the PRINT option except that they do not have the setting covariates, and their defaults are to print nothing.
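The three printing options might be combined as in this sketch of an analysis of covariance. The design, identifiers and data values are invented for illustration.

```genstat
" Hypothetical data: 4 blocks x 3 treatments with one covariate. "
FACTOR [LEVELS=4; VALUES=3(1...4)] Block
FACTOR [LEVELS=3; VALUES=1,2,3,1,2,3,1,2,3,1,2,3] Treat
VARIATE [VALUES=12.1,13.4,15.0,11.8,13.9,14.6,12.5,13.1,15.3,11.9,13.6,14.9] Yield
VARIATE [VALUES=3.1,2.9,3.3,2.8,3.0,3.2,3.4,2.7,3.1,3.0,3.2,2.9] Initial
BLOCKSTRUCTURE Block
TREATMENTSTRUCTURE Treat
COVARIATE Initial

" PRINT:  analysis of Yield adjusted for Initial;
  UPRINT: analysis of Yield ignoring Initial;
  CPRINT: analysis of the covariate Initial itself. "
ANOVA [PRINT=aovtable,covariates; UPRINT=aovtable; CPRINT=aovtable] Yield
```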

A table of means is produced by default for each term in the treatment model. By using the PFACTORIAL option you can exclude tables for terms containing more than a specified number of factors; Genstat does not allow tables to have more than nine factors, so the default value of nine gives all the available tables. PFACTORIAL also applies to tables of effects. These are estimates of treatment parameters in the linear model.

The PSE option controls the standard errors printed with the tables of means. The default setting is differences, which gives standard errors of differences of means. The setting means produces standard errors of means, lsd produces least significant differences, and by setting PSE=* the standard errors can be suppressed altogether. The significance level to use in the calculation of the least significant differences can be changed from the default of 5% using the LSDLEVEL option.
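The settings of PSE and LSDLEVEL can be compared with a sketch like the one below, using invented data and a hypothetical factor `Variety`.

```genstat
" Hypothetical one-way layout: 3 varieties x 4 replicates. "
FACTOR [LABELS=!T(A,B,C); VALUES=4(1...3)] Variety
VARIATE [VALUES=10.2,9.8,10.5,10.1,11.4,11.9,11.1,11.6,12.8,12.3,12.9,12.6] Y
TREATMENTSTRUCTURE Variety

" Default: standard errors of differences of means. "
ANOVA [PRINT=means] Y
" Least significant differences at the 1% level. "
ANOVA [PRINT=means; PSE=lsd; LSDLEVEL=1] Y
" Means printed with no standard errors at all. "
ANOVA [PRINT=means; PSE=*] Y
```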

When a factor has only two levels, Genstat usually prints the difference between the two main effects instead of the effects themselves. This difference is called a response. For interaction terms whose factors all have only two levels, there are two forms of response. The choice between them is controlled by the TWOLEVEL option. If you leave the default, TWOLEVEL=responses, Genstat calculates the response for an interaction between two factors as the difference between the two main-effect responses, and so on; this is the form described in most textbooks. By putting TWOLEVEL=Yates, you can obtain the form defined by Yates (1937) in which the responses all have equal standard errors. Alternatively, you can put TWOLEVEL=effects if you prefer not to have responses, but to have the effects themselves, as for factors with more than two levels.
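The three representations can be compared on a small replicated 2 x 2 factorial, as in this sketch. The factors `N` and `P` and the data are hypothetical.

```genstat
" Hypothetical 2 x 2 factorial with 2 replicates (invented data). "
FACTOR [LEVELS=2; VALUES=1,1,1,1,2,2,2,2] N
FACTOR [LEVELS=2; VALUES=1,1,2,2,1,1,2,2] P
VARIATE [VALUES=20.1,19.7,22.4,22.9,23.0,23.6,28.2,27.5] Y
TREATMENTSTRUCTURE N*P

" Default: responses in the usual textbook form. "
ANOVA [PRINT=effects] Y
" Responses in the form defined by Yates (1937). "
ANOVA [PRINT=effects; TWOLEVEL=Yates] Y
" The effects themselves, as for factors with more than two levels. "
ANOVA [PRINT=effects; TWOLEVEL=effects] Y
```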

The warnings about any large residuals printed in the information summary can be suppressed by setting the NOMESSAGE option to residual. The other setting of NOMESSAGE, nonorthogonal, suppresses the warning produced when there is non-orthogonality between treatment terms or covariates.

The treatment terms to be included in the model are controlled by the FACTORIAL option; this sets a limit (by default 3) on the number of factors in a treatment term: terms containing more than that number are deleted.
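One common use of FACTORIAL is in an unreplicated factorial, where deleting the highest-order interaction provides a residual. The sketch below uses a hypothetical 2 x 2 x 2 experiment with invented data.

```genstat
" Hypothetical unreplicated 2 x 2 x 2 factorial (invented data). "
FACTOR [LEVELS=2; VALUES=1,1,1,1,2,2,2,2] A
FACTOR [LEVELS=2; VALUES=1,1,2,2,1,1,2,2] B
FACTOR [LEVELS=2; VALUES=1,2,1,2,1,2,1,2] C
VARIATE [VALUES=14.2,15.1,16.8,17.0,18.3,19.6,21.1,22.7] Y
TREATMENTSTRUCTURE A*B*C

" Default FACTORIAL=3 keeps A.B.C, leaving no residual d.f. "
ANOVA [PRINT=aovtable] Y
" FACTORIAL=2 deletes A.B.C, so its single d.f. forms the residual. "
ANOVA [PRINT=aovtable; FACTORIAL=2] Y
```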

The CONTRASTS option places a limit on the order of contrast to be fitted. (Contrasts are defined by using the functions POL, REG, COMPARISON, POLND or REGND in the treatment formula.) For a term involving a single factor, the orders of the successive contrasts run from one upwards, with the deviations term (if any) numbered highest. In interactions between contrasts, the order is the sum of the orders of the component parts. The default value for CONTRASTS is 4. Option PCONTRASTS similarly sets a limit on the order of the contrasts that are printed; its default value is 9.

If your design has few or no degrees of freedom for the residual, you may wish to regard the deviations from some of the fitted contrasts as error components, and assign them to the residual of the stratum where they occur. You can do this by the DEVIATIONS option; its value sets a limit on the number of factors in the terms whose deviations are to be retained in the model. For example, by putting DEVIATIONS=1, the deviations from the contrasts fitted to all terms except main effects will be assigned to error. The PDEVIATIONS option similarly controls the printing of deviations: putting PDEVIATIONS=0, for example, ensures that no deviations are printed. When deviations have been assigned to error, they will not be included in the calculation of tables of means, which will then be labelled “smoothed”. However, the associated standard errors of the means are not adjusted for the smoothing.
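The effect of DEVIATIONS can be sketched with a single quantitative factor. The factor `Dose` and the data are invented, and the sketch assumes that DEVIATIONS=0 is accepted and, by analogy with PDEVIATIONS=0, assigns the deviations of every term (including main effects) to the residual.

```genstat
" Hypothetical dose-response data: 4 doses x 3 replicates. "
FACTOR [LEVELS=!(0,10,20,30); VALUES=3(0,10,20,30)] Dose
VARIATE [VALUES=5.1,4.8,5.3,6.9,7.2,7.0,9.1,8.8,9.4,10.9,11.3,11.0] Y

" Fit a linear contrast; by default its deviations stay in the model. "
TREATMENTSTRUCTURE POL(Dose; 1)
ANOVA [PRINT=aovtable,contrasts] Y

" Assumed usage: DEVIATIONS=0 moves the deviations into the residual,
  and the table of means is then calculated from the linear fit only. "
ANOVA [PRINT=aovtable,contrasts,means; DEVIATIONS=0] Y
```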

The WEIGHTS option allows you to specify a weight for each unit, to define a weighted analysis of variance. You might want to do this if, for example, different parts of the experiment have different variability; each weight would then be proportional to the reciprocal of the expected variance for the corresponding unit. However, unless the weights are fairly systematic, for example to give proportional weighted replication, the design is unlikely to be balanced.
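A weighted analysis might look like the following sketch. The data are invented, and the weights rest on a purely illustrative assumption that the third treatment group has twice the variance of the others, giving it weight 1/2.

```genstat
" Hypothetical one-way layout: 3 treatments x 4 replicates. "
FACTOR [LEVELS=3; VALUES=4(1...3)] Treat
VARIATE [VALUES=10.2,9.8,10.5,10.1,11.4,11.9,11.1,11.6,12.8,12.3,12.9,12.6] Y

" Weights proportional to the reciprocal of the assumed variances;
  they are constant within each treatment, so they are systematic. "
VARIATE [VALUES=1,1,1,1,1,1,1,1,0.5,0.5,0.5,0.5] W

TREATMENTSTRUCTURE Treat
ANOVA [PRINT=aovtable,means; WEIGHTS=W] Y
```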

Before Genstat does any calculations with the y-variates, it does an initial investigation known as the dummy analysis to acquire all the information that it needs for the analysis. You can use the DESIGN option to store this information so that Genstat need not recalculate it for future ANOVA statements. The structure in the option is automatically declared as a pointer if you have not declared it already. It points to several other structures which store information about different aspects of the analysis. The only other details that are required for future analyses are the values of the factors in the block and treatment formulae (which must remain unchanged). If you have not previously declared the design structure, or if it has no values, then the current statement derives and stores the necessary information. If the pointer does already have values, then these are used to do the analysis. In that case, of course, values of the factors in the block and treatment formulae must not have been changed since the design structure was formed. The current settings of options FACTORIAL, CONTRASTS, DEVIATIONS and WEIGHTS are then ignored, as is any change in the restrictions on the y-variates. The DESIGN option is particularly useful with designs where there are many model terms or where there is non-orthogonality, as the dummy analysis may then be time-consuming.
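Reusing a stored design across separate ANOVA statements could be sketched as follows, for two y-variates with different missing units. The identifiers `Dsave`, `Yield` and `Height` and all data values are hypothetical.

```genstat
" Hypothetical design: 4 blocks x 3 treatments, two responses. "
FACTOR [LEVELS=4; VALUES=3(1...4)] Block
FACTOR [LEVELS=3; VALUES=1,2,3,1,2,3,1,2,3,1,2,3] Treat
VARIATE [VALUES=12.1,13.4,15.0,11.8,13.9,14.6,12.5,13.1,15.3,11.9,13.6,14.9] Yield
" Height has one missing value (unit 5). "
VARIATE [VALUES=51,55,60,49,*,58,52,54,61,50,56,59] Height
BLOCKSTRUCTURE Block
TREATMENTSTRUCTURE Treat

" Dsave has no values yet, so this statement does the dummy
  analysis and stores the results. "
ANOVA [PRINT=aovtable; DESIGN=Dsave] Yield

" Dsave now has values, so the dummy analysis is not repeated;
  only Height's own missing unit is estimated. "
ANOVA [PRINT=aovtable,missingvalues; DESIGN=Dsave] Height
```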

Genstat has a simplified version of the dummy analysis which you can use to save computing time if all the model terms are orthogonal and if, for every term, all the combinations of its factors were applied to the same number of units. A check is incorporated which will detect non-orthogonality except in particularly complicated designs where terms are aliased. If you set option ORTHOGONAL=assumed, Genstat does the simple version unless non-orthogonality is detected, whereupon it gives a warning message and then switches to the full version. (Before Release 14, this was requested by setting ORTHOGONAL=yes, but the aim now is that options with settings yes and no do not have any other settings; however, yes is retained as a synonym for assumed, so that existing programs will still run.) The simplified version is done also if ORTHOGONAL=compulsory, but non-orthogonality now causes the analysis to stop altogether, with an error message; this is useful for checking for typing errors in the factor values when you know that the design should otherwise be orthogonal. The dummy analysis involves the analysis of a specially generated variate which contains random numbers from a Cauchy distribution. The starting value for their generation is set by the SEED option.
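The two non-default settings of ORTHOGONAL might be used as in this sketch for a design that is intended to be orthogonal. The design and data are invented for illustration.

```genstat
" Hypothetical orthogonal design: 4 blocks x 3 treatments. "
FACTOR [LEVELS=4; VALUES=3(1...4)] Block
FACTOR [LEVELS=3; VALUES=1,2,3,1,2,3,1,2,3,1,2,3] Treat
VARIATE [VALUES=12.1,13.4,15.0,11.8,13.9,14.6,12.5,13.1,15.3,11.9,13.6,14.9] Yield
BLOCKSTRUCTURE Block
TREATMENTSTRUCTURE Treat

" Simplified dummy analysis, falling back to the full version
  (with a warning) if non-orthogonality is detected. "
ANOVA [PRINT=aovtable; ORTHOGONAL=assumed] Yield

" Simplified dummy analysis, stopping with an error message if
  non-orthogonality is detected, for example after a mistyped
  factor value. "
ANOVA [PRINT=aovtable; ORTHOGONAL=compulsory] Yield
```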

The TOLERANCES option controls numerical aspects of analysis. Its setting is a variate with up to four values: the first is used to calculate the tolerance for the analysis of the y-variates (default 10^-7), the second is for the tolerance used in the dummy analysis (default 10^-9), the third is for the estimation of missing values (default 10^-5) and the fourth is for the estimation of stratum variances. The MAXCYCLE option sets a limit on the number of iterations for estimating missing values. The EXIT option can save an exit code summarizing the properties of the design:

0	design orthogonal;
1	design has general balance (block terms mutually orthogonal, treatment terms mutually orthogonal, some treatment terms non-orthogonal to the block terms);
2	block terms mutually orthogonal, treatment terms non-orthogonal;
3	block terms non-orthogonal, treatment terms orthogonal;
4	block terms non-orthogonal, treatment terms non-orthogonal;
*	design unbalanced (`ANOVA` failed to analyse it).
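The exit code can be saved and inspected as in the sketch below, where the scalar `Code` and the data are hypothetical. The design here is a balanced one-way layout.

```genstat
" Hypothetical one-way layout: 3 treatments x 4 replicates. "
FACTOR [LEVELS=3; VALUES=4(1...3)] Treat
VARIATE [VALUES=10.2,9.8,10.5,10.1,11.4,11.9,11.1,11.6,12.8,12.3,12.9,12.6] Y
TREATMENTSTRUCTURE Treat

" Save the exit code describing the properties of the design. "
ANOVA [PRINT=aovtable; EXIT=Code] Y
PRINT Code
```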

Options: PRINT, UPRINT, CPRINT, FACTORIAL, CONTRASTS, DEVIATIONS, PFACTORIAL, PCONTRASTS, PDEVIATIONS, FPROBABILITY, PSE, TWOLEVEL, DESIGN, WEIGHTS, ORTHOGONAL, SEED, MAXCYCLE, TOLERANCES, NOMESSAGE, LSDLEVEL, EXIT.

Parameters: Y, RESIDUALS, FITTEDVALUES, SAVE.

Action with `RESTRICT`

You can restrict the set of units used for the analysis by applying a restriction to any of the y-variates. If several are restricted, they must all be restricted to the same set of units. Only these units are included in the analysis of each y-variate.

References

James, A.T. & Wilkinson, G.N. (1971). Factorisation of the residual operator and canonical decomposition of non-orthogonal factors in analysis of variance. Biometrika, 58, 279-294.

Payne, R.W. & Wilkinson, G.N. (1977). A general algorithm for analysis of variance. Applied Statistics, 26, 251-260.

Payne, R.W. & Tobias, R.D. (1992). General balance, combination of information and the analysis of covariance. Scandinavian Journal of Statistics, 19, 3-23.

Wilkinson, G.N. (1970). A general recursive algorithm for analysis of variance. Biometrika, 57, 19-46.

Yates, F. (1937). The Design and Analysis of Factorial Experiments. Technical Communication No. 35 of the Commonwealth Bureau of Soils. Commonwealth Agricultural Bureaux, Farnham Royal.

Example

" Example ANOV-1: one-way analysis of variance"

" This experiment studied the effect of different additives on the octane
  level of gasoline. There were 5 additives and 4 observations on each
  (P.W.M. John, Statistical Design and Analysis of Experiments, page 46).

  Define number of units in the experiment (5 types x 4 observations)."
UNITS [NVALUES=20]
" Define a factor to indicate the type of gasoline for each observation."
FACTOR [LABELS=!T(A,B,C,D,E)] Gasoline
" Define variate to store the octane level recorded at each observation."
VARIATE Octane
" Read the data, representing Gasoline by its labels."
READ Gasoline,Octane; FREPRESENTATION=labels
 A  91.7   A  91.2   A  90.9   A  90.6
 B  91.7   B  91.9   B  90.9   B  90.9
 C  92.4   C  91.2   C  91.6   C  91.0
 D  91.8   D  92.2   D  92.0   D  91.4
 E  93.1   E  92.9   E  92.4   E  92.4 :
" Define the treatments to be fitted in the analysis."
TREATMENTS Gasoline
" Analyse the variate Octane, printing just the AOV table."
ANOVA [PRINT=aov] Octane
" Further output can be displayed without repeating the analysis:
  here we print tables of means."
ADISPLAY [PRINT=means]

" Example ANOV-1a: linear and quadratic contrasts"

" Suppose that gasolines A-E contain 0,1,2,3,4 cc/gallon of additive,
  respectively (but are otherwise identical). Plot the means against 
  the amount of additive."
AGRAPH Gasoline; NEWXLEVELS=!(0,1,2,3,4)

" Estimate the linear and quadratic effects of the additive."
TREATMENT POL(Gasoline; 2)
ANOVA [PRINT=aov,contrasts] Octane

Updated on March 27, 2024
