Analyses y-variates by analysis of variance according to the model defined by earlier `BLOCKSTRUCTURE`, `COVARIATE` and `TREATMENTSTRUCTURE` statements.

### Options

| Option | Description |
|---|---|
| `PRINT` = string tokens | Output from the analyses of the y-variates, adjusted for any covariates (`aovtable`, `information`, `covariates`, `effects`, `residuals`, `contrasts`, `means`, `cbeffects`, `cbmeans`, `stratumvariances`, `%cv`, `missingvalues`); default `aovt`, `info`, `cova`, `mean`, `miss` |
| `UPRINT` = string tokens | Output from the unadjusted analyses of the y-variates (`aovtable`, `information`, `effects`, `residuals`, `contrasts`, `means`, `cbeffects`, `cbmeans`, `stratumvariances`, `%cv`, `missingvalues`); default `*` i.e. no printing |
| `CPRINT` = string tokens | Output from the analyses of the covariates, if any (`aovtable`, `information`, `effects`, `residuals`, `contrasts`, `means`, `%cv`, `missingvalues`); default `*` i.e. no printing |
| `FACTORIAL` = scalar | Limit on number of factors in a treatment term; default 3 |
| `CONTRASTS` = scalar | Limit on the order of a contrast of a treatment term; default 4 |
| `DEVIATIONS` = scalar | Limit on the number of factors in a treatment term for the deviations from its fitted contrasts to be retained in the model; default 9 |
| `PFACTORIAL` = scalar | Limit on number of factors in printed tables of means or effects; default 9 |
| `PCONTRASTS` = scalar | Limit on order of printed contrasts; default 9 |
| `PDEVIATIONS` = scalar | Limit on number of factors in a treatment term whose deviations from the fitted contrasts are to be printed; default 9 |
| `FPROBABILITY` = string token | Printing of probabilities for variance ratios (`yes`, `no`); default `no` |
| `PSE` = string token | Standard errors to be printed with tables of means, `PSE=*` requests s.e.’s to be omitted (`differences`, `lsd`, `means`); default `diff` |
| `TWOLEVEL` = string token | Representation of effects in 2^{n} experiments (`responses`, `Yates`, `effects`); default `resp` |
| `DESIGN` = pointer | Stores details of the design for use in subsequent analyses; default `*` |
| `WEIGHTS` = variate | Weights for each unit; default `*` i.e. all units with weight one |
| `ORTHOGONAL` = string token | Whether or not design to be assumed orthogonal (`notassumed`, `assumed`, `compulsory`); default `nota` |
| `SEED` = scalar | Seed for random numbers to generate dummy variate for determining the design; default 12345 |
| `MAXCYCLE` = scalar | Maximum number of iterations for estimating missing values; default 20 |
| `TOLERANCES` = variate | Allows you to redefine the tolerances for zero used by various parts of the algorithm |
| `NOMESSAGE` = string tokens | Which warning messages to suppress (`nonorthogonal`, `residual`); default `*` |
| `LSDLEVEL` = scalar | Significance level (%) to use in the calculation of least significant differences; default 5 |
| `EXIT` = scalar | Saves an exit code indicating the properties of the design |

### Parameters

| Parameter | Description |
|---|---|
| `Y` = variates | Variates to be analysed |
| `RESIDUALS` = variates | Variate to save residuals for each y variate |
| `FITTEDVALUES` = variates | Variate to save fitted values |
| `SAVE` = identifiers | Save details of each analysis for use in subsequent `ADISPLAY` or `AKEEP` statements |

### Description

The `ANOVA` directive analyses balanced designs. These include most of the commonly occurring experimental designs such as randomized blocks, Latin squares, split plots and other orthogonal designs, as well as designs with balanced confounding, like balanced lattices and balanced incomplete blocks. Many partially balanced designs can also be handled, so a very wide range of designs can be analysed. The necessary condition of *first-order balance* is explained algorithmically by Wilkinson (1970) and Payne & Wilkinson (1977), and mathematically by James & Wilkinson (1971) and Payne & Tobias (1992). However, `ANOVA` can itself detect whether or not a design can be analysed, so if you are not sure whether or not a particular design is analysable, you can run it through `ANOVA` and see what happens! (If it is unbalanced, you can use the `AUNBALANCED` procedure for designs with a single error term, or the `REML` directive for those with several.)

Before you use `ANOVA` you must first define the model that is to be fitted in the analysis. Potentially this has three parts. The `TREATMENTSTRUCTURE` directive specifies the treatment (or *systematic*, or *fixed*) terms for the analysis. The `BLOCKSTRUCTURE` directive defines the “underlying structure” of the design or, equivalently, the *error* terms for the analysis; in the simple cases where there is only a single error term this can be omitted. The other directive, `COVARIATE`, lists the covariates if an analysis of covariance is required. At the start of a job all these model-definition directives have null settings. However, once any one of them has been used, the defined setting remains in force for all subsequent analyses in the same job until it is redefined.
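As a hedged illustration of the three model-definition directives, the sketch below sets up a split-plot analysis with one covariate. The identifiers `Blocks`, `Wplots`, `Subplots`, `Variety`, `Nitrogen`, `Initial` and `Yield` are hypothetical; they are assumed to be factors and variates that have already been declared and given values.

```genstat
" Hypothetical split-plot design: every identifier below is an assumption,
  taken to be a factor or variate that already holds values."

" Error terms: subplots nested within whole plots nested within blocks."
BLOCKSTRUCTURE Blocks/Wplots/Subplots

" Treatment terms: both main effects and their interaction."
TREATMENTSTRUCTURE Variety*Nitrogen

" One covariate, so ANOVA will do an analysis of covariance."
COVARIATE Initial

" The three settings stay in force for every later ANOVA in the job."
ANOVA Yield
```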

The first parameter of `ANOVA`, `Y`, lists the variates whose values are to be analysed. Genstat examines them all and forms a list of units for which any of the y-variates or any covariate has a missing value. These units are treated as missing in all the analyses. (This is necessary to avoid having to re-analyse covariates for each y-variate.) However, if your y-variates have different missing units, you may prefer to analyse them with separate `ANOVA` statements, while saving details of the model and design with the `DESIGN` option to improve efficiency. Genstat also checks whether any of the y-variates has a restriction. If several variates are restricted, they must all be restricted to the same set of units. Only these units are included in the analysis of each y-variate.

If a y-variate has no values, or if you specify a null entry in the `Y` list, Genstat produces a *skeleton* analysis-of-variance table, which excludes sums of squares, mean squares and variance ratios; the only other output available is the information summary. You can save a design structure, but no save structure is formed. This is a good way of checking that a design can be analysed, before the experiment is carried out.
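A skeleton analysis might be requested along the following lines. This is only a sketch: `Blocks`, `Plots` and `Variety` are hypothetical factors assumed to hold the planned layout, `Yield` is a hypothetical variate that is deliberately left without values, and `Dcheck` is a hypothetical name for the design structure.

```genstat
" Hypothetical randomized-block layout, set up before any data exist."
BLOCKSTRUCTURE Blocks/Plots
TREATMENTSTRUCTURE Variety

" Yield is declared but has no values, so only the skeleton table
  (sources of variation and degrees of freedom) can be produced."
VARIATE Yield
ANOVA [PRINT=aovtable; DESIGN=Dcheck] Yield
```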

The `RESIDUALS` parameter allows you to specify a variate to save the estimated residuals from each analysis. Genstat will declare this variate for you if you have not done so already. In models where there are several error terms, only the final one is included. Others can be obtained using the `AKEEP` directive. The fitted values from the analysis are defined to be the data values minus the estimated residuals. These too can be saved, using the `FITTEDVALUES` parameter. In models where there are several error terms, only the final error term is subtracted. If this is not what you want, you can save the other error terms using `AKEEP` and subtract them by `CALCULATE`.
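The relationship between data, residuals and fitted values described above can be checked with a short sketch. `Yield` is a hypothetical y-variate, and `Res`, `Fit` and `Check` are hypothetical names for the saved and calculated variates.

```genstat
" Res and Fit are declared automatically if they do not exist yet."
ANOVA [PRINT=aovtable] Y=Yield; RESIDUALS=Res; FITTEDVALUES=Fit

" Fitted values are the data minus the final-stratum residuals,
  so Check should reproduce Fit."
CALCULATE Check = Yield - Res
PRINT Yield,Res,Fit,Check
```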

The last parameter, `SAVE`, allows you to save the complete details of the analysis in an *ANOVA save structure*. The `ADISPLAY` directive allows you to use a save structure to produce further output. You can also use it in the `AKEEP` directive to put quantities calculated from the analysis into data structures which you can then use elsewhere in Genstat. Save structures are special compound structures, and Genstat declares them automatically. The save structure for the last y-variate analysed is stored automatically, and forms the default for `ADISPLAY` and `AKEEP` if you do not provide one explicitly.

The `PRINT` option selects which components of output are to be displayed. The default is intended to give the output that you will require most often from a full analysis: `aovtable`, `information`, `covariates`, `means` and `missingvalues`. However, with `ANOVA` the settings `information`, `covariates` and `missingvalues` will not produce any output unless there is something definite to report.
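The sketch below shows one way the `SAVE` parameter and the `PRINT` option might be combined: the analysis is run silently, and output is requested later from the save structure. `Yield`, `Variety`, `Ysave` and `Vmeans` are hypothetical identifiers. The `SAVE` parameter of `ADISPLAY`, and the `SAVE` option and `TERMS` and `MEANS` parameters of `AKEEP`, are taken from those directives' own documentation rather than from this page, so treat their use here as an assumption to check.

```genstat
" Analyse without printing, keeping the details in Ysave."
ANOVA [PRINT=*] Y=Yield; SAVE=Ysave

" Produce output later from the stored analysis."
ADISPLAY [PRINT=aovtable,means] SAVE=Ysave

" Copy the table of Variety means into Vmeans for use elsewhere."
AKEEP [SAVE=Ysave] TERMS=Variety; MEANS=Vmeans
PRINT Vmeans
```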

In analysis of covariance, you can also print output from the analyses of the covariates and from the analysis of the y-variate ignoring the covariates. This is controlled by options `CPRINT` and `UPRINT` respectively. These are similar to the `PRINT` option except that they do not have the setting `covariates`, and their defaults are to print nothing.
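For an analysis of covariance, the three printing options might be set together as in this sketch. `Initial` and `Yield` are hypothetical variates, and the block and treatment models are assumed to have been defined already.

```genstat
" Hypothetical covariate measured on each unit before treatment."
COVARIATE Initial

" PRINT : adjusted analysis of Yield, with covariate regression coefficients.
  UPRINT: analysis of Yield ignoring the covariate.
  CPRINT: analysis of the covariate itself."
ANOVA [PRINT=aovtable,covariates; UPRINT=aovtable; CPRINT=aovtable] Yield
```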

A table of means is produced by default for each term in the treatment model. By using the `PFACTORIAL` option you can exclude tables for terms containing more than a specified number of factors; Genstat does not allow tables to have more than nine factors, so the default value of nine gives all the available tables. `PFACTORIAL` also applies to tables of effects. These are estimates of treatment parameters in the linear model.

The `PSE` option controls the standard errors printed with the tables of means. The default setting is `differences`, which gives standard errors of differences of means. The setting `means` produces standard errors of means, `lsd` produces least significant differences, and setting `PSE=*` suppresses the standard errors altogether. The significance level to use in the calculation of the least significant differences can be changed from the default of 5% using the `LSDLEVEL` option.
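A minimal sketch of the `PSE` and `LSDLEVEL` settings, assuming a hypothetical y-variate `Yield` and a model that has already been defined:

```genstat
" Tables of means with least significant differences at the 1 percent level."
ANOVA [PRINT=means; PSE=lsd; LSDLEVEL=1] Yield

" Tables of means with no standard errors at all."
ANOVA [PRINT=means; PSE=*] Yield
```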

When a factor has only two levels, Genstat usually prints the difference between the two main effects instead of the effects themselves. This difference is called a *response*. For interaction terms whose factors all have only two levels, there are two forms of response. The choice between them is controlled by the `TWOLEVEL` option. If you leave the default, `TWOLEVEL=response`, Genstat calculates the response for an interaction between two factors as the difference between the two main-effect responses, and so on; this is the form described in most textbooks. By putting `TWOLEVEL=Yates`, you can obtain the form defined by Yates (1937) in which the responses all have equal standard errors. Alternatively, you can put `TWOLEVEL=effects` if you prefer not to have responses, but to have the effects themselves, as for factors with more than two levels.
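The alternative settings of `TWOLEVEL` could be compared on a factorial experiment with two-level factors, as sketched here. `N`, `P`, `K` and `Yield` are hypothetical identifiers, with each factor assumed to have exactly two levels.

```genstat
" Hypothetical 2x2x2 factorial."
TREATMENTSTRUCTURE N*P*K

" Responses in the form of Yates (1937), all with equal standard errors."
ANOVA [PRINT=effects; TWOLEVEL=Yates] Yield

" The effects themselves, as printed for factors with more than two levels."
ANOVA [PRINT=effects; TWOLEVEL=effects] Yield
```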

The warnings about any large residuals printed in the information summary can be suppressed by setting the `NOMESSAGE` option to `residual`. The other setting of `NOMESSAGE`, `nonorthogonal`, suppresses the warning produced when there is non-orthogonality between treatment terms or covariates.

The treatment terms to be included in the model are controlled by the `FACTORIAL` option; this sets a limit (by default 3) on the number of factors in a treatment term: terms containing more than that number are deleted.

The `CONTRASTS` option places a limit on the order of contrast to be fitted. (Contrasts are defined by using the functions `POL`, `REG`, `COMPARISON`, `POLND` or `REGND` in the treatment formula.) For a term involving a single factor, the orders of the successive contrasts run from one upwards, with the deviations term (if any) numbered highest. In interactions between contrasts, the order is the sum of the orders of the component parts. The default value for `CONTRASTS` is 4. Option `PCONTRASTS` similarly sets a limit on the order of the contrasts that are printed; its default value is 9.

If your design has few or no degrees of freedom for the residual, you may wish to regard the deviations from some of the fitted contrasts as error components, and assign them to the residual of the stratum where they occur. You can do this by the `DEVIATIONS` option; its value sets a limit on the number of factors in the terms whose deviations are to be retained in the model. For example, by putting `DEVIATIONS=1`, the deviations from the contrasts fitted to all terms except main effects will be assigned to error. The `PDEVIATIONS` option similarly controls the printing of deviations: putting `PDEVIATIONS=0`, for example, ensures that no deviations are printed. When deviations have been assigned to error, they will not be included in the calculation of tables of means, which will then be labelled “smoothed”. However, the associated standard errors of the means are not adjusted for the smoothing.
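The contrast and deviation limits might be used as in the following sketch. `Nitrogen`, `Variety` and `Yield` are hypothetical identifiers; `Nitrogen` is assumed to be a factor with at least four levels, so that deviations from the quadratic polynomial exist.

```genstat
" Linear and quadratic contrasts of Nitrogen, crossed with Variety."
TREATMENTSTRUCTURE POL(Nitrogen; 2) * Variety

" DEVIATIONS=1 : keep deviations only for terms with one factor, so the
                 deviations in the two-factor interaction go to error.
  PDEVIATIONS=0: do not print any deviations."
ANOVA [PRINT=aovtable,contrasts; DEVIATIONS=1; PDEVIATIONS=0] Yield
```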

The `WEIGHTS` option allows you to specify a weight for each unit, to define a weighted analysis of variance. You might want to do this if, for example, different parts of the experiment have different variability; each weight would then be proportional to the reciprocal of the expected variance for the corresponding unit. However, unless the weights are fairly systematic, for example to give proportional weighted replication, the design is unlikely to be balanced.
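A weighted analysis along these lines could be set up as follows. This is a sketch only: `Evar` is a hypothetical variate assumed to hold the expected variance of each unit, and `W` and `Yield` are likewise hypothetical.

```genstat
" Weight each unit by the reciprocal of its expected variance."
CALCULATE W = 1/Evar

" Weighted analysis of variance; ANOVA will report if the weights
  make the design unbalanced."
ANOVA [WEIGHTS=W] Yield
```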

Before Genstat does any calculations with the y-variates, it does an initial investigation known as the *dummy analysis* to acquire all the information that it needs for the analysis. You can use the `DESIGN` option to store this information so that Genstat need not recalculate it for future `ANOVA` statements. The structure in the option is automatically declared as a pointer if you have not declared it already. It points to several other structures which store information about different aspects of the analysis. The only other details that are required for future analyses are the values of the factors in the block and treatment formulae. If you have not previously declared the design structure, or if it has no values, then the current statement derives and stores the necessary information. If the pointer does already have values, then these are used to do the analysis. In that case, of course, values of the factors in the block and treatment formulae must not have been changed since the design structure was formed. The current settings of options `FACTORIAL`, `CONTRASTS`, `DEVIATIONS` and `WEIGHTS` are then ignored, as is any change in the restrictions on the y-variates. The `DESIGN` option is particularly useful with designs where there are many model terms or where there is non-orthogonality, as the dummy analysis may then be time-consuming.
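Reusing a stored design for several y-variates might look like the sketch below. `Dsave`, `Yield1` and `Yield2` are hypothetical identifiers, and the factors in the block and treatment formulae are assumed to stay unchanged between the two statements.

```genstat
" First call: Dsave has no values, so the dummy analysis is done and stored."
ANOVA [DESIGN=Dsave] Yield1

" Later call: the stored information is reused, and the dummy analysis
  is not repeated."
ANOVA [DESIGN=Dsave] Yield2
```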

Genstat has a simplified version of the dummy analysis which you can use to save computing time if all the model terms are orthogonal and if, for every term, all the combinations of its factors were applied to the same number of units. A check is incorporated which will detect non-orthogonality except in particularly complicated designs where terms are aliased. If you set option `ORTHOGONAL=assumed`, Genstat does the simple version unless non-orthogonality is detected, whereupon it gives a warning message and then switches to the full version. (Before Release 14, this was requested by setting `ORTHOGONAL=yes`, but the aim now is that options with settings `yes` and `no` do not have any other settings; however, `yes` is retained as a synonym for `assumed`, so that existing programs will still run.) The simplified version is done also if `ORTHOGONAL=compulsory`, but non-orthogonality now causes the analysis to stop altogether, with an error message; this is useful for checking for typing errors in the factor values when you know that the design should otherwise be orthogonal. The dummy analysis involves the analysis of a specially generated variate which contains random numbers from a Cauchy distribution. The starting value for their generation is set by the `SEED` option.
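The two non-default settings of `ORTHOGONAL` could be used as sketched here; `Yield` is a hypothetical y-variate and the seed value is arbitrary.

```genstat
" Use the quick dummy analysis, and stop with an error message if the
  design is not orthogonal (for example after a typing error in a factor)."
ANOVA [ORTHOGONAL=compulsory] Yield

" Use the quick dummy analysis, but switch to the full version with a
  warning if non-orthogonality is detected; the seed is an arbitrary choice."
ANOVA [ORTHOGONAL=assumed; SEED=271828] Yield
```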

The `TOLERANCES` option controls numerical aspects of analysis. Its setting is a variate with up to four values: the first is used to calculate the tolerance for the analysis of the y-variates (default 10^{-7}), the second is for the tolerance used in the dummy analysis (default 10^{-9}), the third is for the estimation of missing values (default 10^{-5}) and the fourth is for the estimation of stratum variances. The `MAXCYCLE` option sets a limit on the number of iterations for estimating missing values. The `EXIT` option can save an exit code summarizing the properties of the design:

| Code | Properties of the design |
|---|---|
| 0 | design orthogonal; |
| 1 | design has general balance (block terms mutually orthogonal, treatment terms mutually orthogonal, some treatment terms non-orthogonal to the block terms); |
| 2 | block terms mutually orthogonal, treatment terms non-orthogonal; |
| 3 | block terms non-orthogonal, treatment terms orthogonal; |
| 4 | block terms non-orthogonal, treatment terms non-orthogonal; |
| `*` | design unbalanced (`ANOVA` failed to analyse it). |
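The exit code can be saved and inspected as in this short sketch, where `Ecode` and `Yield` are hypothetical identifiers.

```genstat
" Run the analysis silently and keep the exit code in the scalar Ecode."
ANOVA [PRINT=*; EXIT=Ecode] Yield

" Ecode holds one of the codes listed above (missing if the design
  could not be analysed)."
PRINT Ecode
```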

Options: `PRINT`, `UPRINT`, `CPRINT`, `FACTORIAL`, `CONTRASTS`, `DEVIATIONS`, `PFACTORIAL`, `PCONTRASTS`, `PDEVIATIONS`, `FPROBABILITY`, `PSE`, `TWOLEVEL`, `DESIGN`, `WEIGHTS`, `ORTHOGONAL`, `SEED`, `MAXCYCLE`, `TOLERANCES`, `NOMESSAGE`, `LSDLEVEL`, `EXIT`.

Parameters: `Y`, `RESIDUALS`, `FITTEDVALUES`, `SAVE`.

### Action with `RESTRICT`

You can restrict the set of units used for the analysis by applying a restriction to any of the y-variates. If several are restricted, they must all be restricted to the same set of units. Only these units are included in the analysis of each y-variate.

### References

James, A.T. & Wilkinson, G.N. (1971). Factorisation of the residual operator and canonical decomposition of non-orthogonal factors in analysis of variance. *Biometrika*, 58, 279-294.

Payne, R.W. & Wilkinson, G.N. (1977). A general algorithm for analysis of variance. *Applied Statistics*, 26, 251-260.

Payne, R.W. & Tobias, R.D. (1992). General balance, combination of information and the analysis of covariance. *Scandinavian Journal of Statistics*, 19, 3-23.

Wilkinson, G.N. (1970). A general recursive algorithm for analysis of variance. *Biometrika*, 57, 19-46.

Yates, F. (1937). *The Design and Analysis of Factorial Experiments*. Technical Communication No. 35 of the Commonwealth Bureau of Soils. Commonwealth Agricultural Bureaux, Farnham Royal.

### See also

Directives: `BLOCKSTRUCTURE`, `COVARIATE`, `TREATMENTSTRUCTURE`, `ADISPLAY`, `AKEEP`, `FIT`, `REML`.

Procedures: `ABOXCOX`, `ACHECK`, `AFCOVARIATES`, `AFMEANS`, `AGRAPH`, `APLOT`, `AFIELDRESIDUALS`, `APERMTEST`, `APOWER`, `AMCOMPARISON`, `AMDUNNETT`, `AN1ADVICE`, `APOLYNOMIAL`, `ARESULTSUMMARY`, `ASPREADSHEET`, `ASTATUS`, `AOVANYHOW`, `A2RDA`, `A2WAY`, `AUNBALANCED`, `AREPMEASURES`, `ASCREEN`, `AYPARALLEL`, `FALIASTERMS`.

Functions: `COMPARISON`, `POL`, `POLND`, `REG`, `REGND`.

Commands for: Analysis of variance, Design of experiments, REML analysis of linear mixed models.

### Example

    " Example ANOV-1: one-way analysis of variance"

    " This experiment studied the effect of different additives on the
      octane level of gasoline. There were 5 additives and 4 observations
      on each (P.W.M. John, Statistical Design and Analysis of Experiments,
      page 46).

      Define number of units in the experiment (5 types x 4 observations)."
    UNITS [NVALUES=20]

    " Define a factor to indicate the type of gasoline for each observation."
    FACTOR [LABELS=!T(A,B,C,D,E)] Gasoline

    " Define variate to store the octane level recorded at each observation."
    VARIATE Octane

    " Read the data, representing Gasoline by its labels."
    READ Gasoline,Octane; FREPRESENTATION=labels
    A 91.7  A 91.2  A 90.9  A 90.6
    B 91.7  B 91.9  B 90.9  B 90.9
    C 92.4  C 91.2  C 91.6  C 91.0
    D 91.8  D 92.2  D 92.0  D 91.4
    E 93.1  E 92.9  E 92.4  E 92.4 :

    " Define the treatments to be fitted in the analysis."
    TREATMENTS Gasoline

    " Analyse the variate Octane, printing just the AOV table."
    ANOVA [PRINT=aov] Octane

    " Further output can be displayed without repeating the analysis:
      here we print tables of means."
    ADISPLAY [PRINT=means]

    " Example ANOV-1a: linear and quadratic contrasts"

    " Suppose that gasolines A-E contain 0,1,2,3,4 cc/gallon of additive,
      respectively (but are otherwise identical).
      Plot the means against the amount of additive."
    AGRAPH Gasoline; NEWXLEVELS=!(0,1,2,3,4)

    " Estimate the linear and quadratic effects of the additive."
    TREATMENT POL(Gasoline; 2)
    ANOVA [PRINT=aov,contrasts] Octane