|Output from the analyses of the y-variates, adjusted for any covariates (
||Output from the unadjusted analyses of the y-variates (
||Output from the analyses of the covariates, if any (
||Limit on number of factors in a treatment term; default 3|
||Limit on the order of a contrast of a treatment term; default 4|
||Limit on the number of factors in a treatment term for the deviations from its fitted contrasts to be retained in the model; default 9|
||Limit on number of factors in printed tables of means or effects; default 9|
||Limit on order of printed contrasts; default 9|
||Limit on number of factors in a treatment term whose deviations from the fitted contrasts are to be printed; default 9|
||Printing of probabilities for variance ratios (
||Standard errors to be printed with tables of means,
||Representation of effects in 2^n experiments (
||Stores details of the design for use in subsequent analyses; default
||Weights for each unit; default
||Whether or not design to be assumed orthogonal (
||Seed for random numbers to generate dummy variate for determining the design; default 12345|
||Maximum number of iterations for estimating missing values; default 20|
||Tolerances for zero in various contexts; default
||Which warning messages to suppress (
||Significance level (%) to use in the calculation of least significant differences; default 5|
||Saves an exit code indicating the properties of the design|
||Variates to be analysed|
||Variate to save residuals for each y-variate|
||Variate to save fitted values|
||Save details of each analysis for use in subsequent
The ANOVA directive analyses balanced designs. These include most of the commonly occurring experimental designs such as randomized blocks, Latin squares, split plots and other orthogonal designs, as well as designs with balanced confounding, like balanced lattices and balanced incomplete blocks. Many partially balanced designs can also be handled, so a very wide range of designs can be analysed. The necessary condition of first-order balance is explained algorithmically by Wilkinson (1970) and Payne & Wilkinson (1977), and mathematically by James & Wilkinson (1971) and Payne & Tobias (1992). However, ANOVA can itself detect whether or not a design can be analysed, so if you are not sure whether a particular design is analysable, you can run it through ANOVA and see what happens! (If it is unbalanced, you can use the AUNBALANCED procedure for designs with a single error term, or the REML directive for those with several.)
Before you use ANOVA you must first define the model that is to be fitted in the analysis. Potentially this has three parts. The TREATMENTSTRUCTURE directive specifies the treatment (or systematic, or fixed) terms for the analysis. The BLOCKSTRUCTURE directive defines the “underlying structure” of the design or, equivalently, the error terms for the analysis; in the simple cases where there is only a single error term this can be omitted. The other directive, COVARIATE, lists the covariates if an analysis of covariance is required. At the start of a job all these model-definition directives have null settings. However, once any one of them has been used, the defined setting remains in force for all subsequent analyses in the same job until it is redefined.
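As a minimal sketch of the three model-definition directives working together, consider an analysis of covariance for a randomized block design. The identifiers Blocks, Variety, Xpre and Yield are hypothetical; they are assumed to have been declared and given values earlier in the job.

```genstat
" Sketch only: Blocks and Variety are assumed to be factors, Xpre and
  Yield variates, all already holding values for every unit."
BLOCKSTRUCTURE Blocks
TREATMENTSTRUCTURE Variety
COVARIATE Xpre
" The three settings above stay in force until they are redefined,
  so every later ANOVA statement in the job uses the same model."
ANOVA [PRINT=aov,means] Yield
```

Because the settings persist, a later statement such as COVARIATE with no covariates listed would presumably be needed to return to an analysis without the covariate.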
The first parameter of ANOVA, Y, lists the variates whose values are to be analysed. Genstat examines them all and forms a list of units for which any of the y-variates or any covariate has a missing value. These units are treated as missing in all the analyses. (This is necessary to avoid having to re-analyse covariates for each y-variate.) However, if your y-variates have different missing units, you may prefer to analyse them with separate ANOVA statements, while saving details of the model and design with the DESIGN option to improve efficiency. Genstat also checks whether any of the y-variates has a restriction. If several variates are restricted, they must all be restricted to the same set of units. Only these units are included in the analysis of each y-variate.
If a y-variate has no values, or if you specify a null entry in the Y list, Genstat produces a skeleton analysis-of-variance table, which excludes sums of squares, mean squares and variance ratios; the only other output available is the information summary. You can save a design structure, but no save structure is formed. This is a good way of checking that a design can be analysed, before the experiment is carried out.
The RESIDUALS parameter lets you specify a variate to save the estimated residuals from each analysis. Genstat will declare this variate for you if you have not done so already. In models where there are several error terms, only the final one is included. Others can be obtained using the AKEEP directive. The fitted values from the analysis are defined to be the data values minus the estimated residuals. These too can be saved, using the FITTEDVALUES parameter. In models where there are several error terms, only the final error term is subtracted. If this is not what you want, you can save the other error terms using AKEEP and subtract them from the data values yourself.
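The following sketch shows the RESIDUALS and FITTEDVALUES parameters in use. It assumes that the factor Gasoline and the variate Octane of Example ANOV-1 have already been read; the names Res and Fit are hypothetical.

```genstat
" Sketch only: Gasoline and Octane are assumed to hold the data of
  Example ANOV-1. Res and Fit are declared automatically by ANOVA."
TREATMENTSTRUCTURE Gasoline
ANOVA [PRINT=aov] Octane; RESIDUALS=Res; FITTEDVALUES=Fit
" By definition each fitted value is the data value minus its residual."
PRINT Octane,Fit,Res
```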
The last parameter, SAVE, lets you save the complete details of the analysis in an ANOVA save structure. The ADISPLAY directive lets you use a save structure to produce further output. You can also use it in the AKEEP directive to put quantities calculated from the analysis into data structures which you can then use elsewhere in Genstat. Save structures are special compound structures, and Genstat declares them automatically. The save structure for the last y-variate analysed is stored automatically, and forms the default for ADISPLAY and AKEEP if you do not provide one explicitly.
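A short sketch of a named save structure being reused is given below. It assumes the Gasoline and Octane data of Example ANOV-1; the names Osave and Gmeans are hypothetical. The ADISPLAY and AKEEP syntax shown (SAVE as a parameter of ADISPLAY and as an option of AKEEP, with TERMS and MEANS parameters) is an assumption that is not confirmed by this page and should be checked against the descriptions of those directives.

```genstat
" Sketch only: Gasoline and Octane are assumed to hold the data of
  Example ANOV-1. Osave and Gmeans are hypothetical names."
TREATMENTSTRUCTURE Gasoline
ANOVA [PRINT=aov] Octane; SAVE=Osave
" Further output from the stored analysis, without re-analysing."
ADISPLAY [PRINT=means] SAVE=Osave
" Copy the table of Gasoline means into an ordinary data structure."
AKEEP [SAVE=Osave] TERMS=Gasoline; MEANS=Gmeans
PRINT Gmeans
```

Naming the save structure matters mainly when several analyses are kept at once; after a single ANOVA the automatically stored structure would serve as the default.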
The default setting of the PRINT option is intended to give the output that you will require most often from a full analysis; it includes the setting missingvalues. However, with ANOVA some of these settings, missingvalues among them, will not produce any output unless there is something definite to report.
In analysis of covariance, you can also print output from the analyses of the covariates and from the analysis of the y-variate ignoring the covariates. Each is controlled by its own option; the one for the analysis ignoring the covariates is UPRINT. These options are similar to PRINT, and their defaults are to print nothing.
A table of means is produced by default for each term in the treatment model. By using the PFACTORIAL option you can exclude tables for terms containing more than a specified number of factors; Genstat does not allow tables to have more than nine factors, so the default value of nine gives all the available tables. PFACTORIAL also applies to tables of effects. These are estimates of treatment parameters in the linear model.
The PSE option controls the standard errors printed with the tables of means. The default setting is differences, which gives standard errors of differences of means. The setting means produces standard errors of means, LSD produces least significant differences, and by setting PSE=* the standard errors can be suppressed altogether. The significance level used in the calculation of the least significant differences can be changed from its default of 5%.
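As an illustration of the PSE settings, the sketch below prints the table of means twice, first with least significant differences and then with no standard errors. It assumes the Gasoline and Octane data of Example ANOV-1.

```genstat
" Sketch only: Gasoline and Octane are assumed to hold the data of
  Example ANOV-1."
TREATMENTSTRUCTURE Gasoline
" Means accompanied by least significant differences (5% level by default)."
ANOVA [PRINT=means; PSE=lsd] Octane
" Means with the standard errors suppressed altogether."
ANOVA [PRINT=means; PSE=*] Octane
```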
When a factor has only two levels, Genstat usually prints the difference between the two main effects instead of the effects themselves. This difference is called a response. For interaction terms whose factors all have only two levels, there are two forms of response. The choice between them is controlled by the TWOLEVEL option. If you leave the default, TWOLEVEL=response, Genstat calculates the response for an interaction between two factors as the difference between the two main-effect responses, and so on; this is the form described in most textbooks. By putting TWOLEVEL=Yates, you can obtain the form defined by Yates (1937) in which the responses all have equal standard errors. Alternatively, you can put TWOLEVEL=effects if you prefer not to have responses, but to have the effects themselves, as for factors with more than two levels.
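The choice of response form might be exercised as in the sketch below, for a hypothetical 2 x 2 factorial. The factors N and P (two levels each) and the variate Yield are assumptions, taken to be declared and given values already.

```genstat
" Sketch only: N and P are assumed to be two-level factors and Yield
  a variate, all already holding values."
TREATMENTSTRUCTURE N*P
" Responses in the form defined by Yates (1937)."
ANOVA [PRINT=effects; TWOLEVEL=Yates] Yield
" The effects themselves, as for factors with more than two levels."
ANOVA [PRINT=effects; TWOLEVEL=effects] Yield
```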
The warnings about any large residuals printed in the information summary can be suppressed by setting the NOMESSAGES option to residuals. The other setting of NOMESSAGES suppresses the warning produced when there is non-orthogonality between treatment terms or covariates.
The treatment terms to be included in the model are controlled by the FACTORIAL option; this sets a limit (by default 3) on the number of factors in a treatment term: terms containing more than that number are deleted. The CONTRASTS option places a limit on the order of contrast to be fitted. (Contrasts are defined by using functions such as POL and REGND in the treatment formula.) For a term involving a single factor, the orders of the successive contrasts run from one upwards, with the deviations term (if any) numbered highest. In interactions between contrasts, the order is the sum of the orders of the component parts. The default value for CONTRASTS is 4. Option PCONTRASTS similarly sets a limit on the order of the contrasts that are printed; its default value is 9.
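The effect of FACTORIAL can be sketched with a hypothetical four-factor experiment. The factors A, B, C and D and the variate Yield are assumptions, taken to be declared and given values already.

```genstat
" Sketch only: A, B, C and D are assumed to be factors and Yield a
  variate, all already holding values."
TREATMENTSTRUCTURE A*B*C*D
" With the default FACTORIAL=3 the four-factor interaction A.B.C.D
  is deleted from the treatment model."
ANOVA [PRINT=aov] Yield
" FACTORIAL=2 also deletes every three-factor interaction, leaving
  main effects and two-factor interactions only."
ANOVA [PRINT=aov; FACTORIAL=2] Yield
```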
If your design has few or no degrees of freedom for the residual, you may wish to regard the deviations from some of the fitted contrasts as error components, and assign them to the residual of the stratum where they occur. You can do this by the DEVIATIONS option; its value sets a limit on the number of factors in the terms whose deviations are to be retained in the model. For example, by putting DEVIATIONS=1, the deviations from the contrasts fitted to all terms except main effects will be assigned to error. The PDEVIATIONS option similarly controls the printing of deviations: to put PDEVIATIONS=0, for example, would ensure that no deviations are printed. When deviations have been assigned to error, they will not be included in the calculation of tables of means, which will then be labelled “smoothed”. However, the associated standard errors of the means are not adjusted for the smoothing.
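A sketch of DEVIATIONS=1 for a hypothetical unreplicated two-factor experiment follows. The factors N and P are assumed to have numerical levels and the variate Yield to hold the data; all three names are hypothetical.

```genstat
" Sketch only: N and P are assumed to be factors with numerical
  levels and Yield a variate, all already holding values."
" Fit a linear contrast to each factor and to their interaction."
TREATMENTSTRUCTURE POL(N;1)*POL(P;1)
" DEVIATIONS=1 retains the deviations for the main effects N and P,
  but assigns the deviations within the N.P interaction to error."
ANOVA [PRINT=aov,contrasts; DEVIATIONS=1] Yield
```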
The WEIGHT option lets you specify a weight for each unit, to define a weighted analysis of variance. You might want to do this if, for example, different parts of the experiment have different variability; each weight would then be proportional to the reciprocal of the expected variance for the corresponding unit. However, unless the weights are fairly systematic, for example to give proportional weighted replication, the design is unlikely to be balanced.
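A weighted analysis along these lines might be set up as in the sketch below. The variate Evar, holding an expected variance for each unit, is a hypothetical input, as are the factor Variety and the variate Yield.

```genstat
" Sketch only: Variety is assumed to be a factor, and Yield and Evar
  variates with a value for every unit; Evar holds the variance
  expected for each unit and must contain no zeros."
CALCULATE Wt = 1/Evar
TREATMENTSTRUCTURE Variety
ANOVA [PRINT=aov,means; WEIGHT=Wt] Yield
```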
Before Genstat does any calculations with the y-variates, it does an initial investigation known as the dummy analysis to acquire all the information that it needs for the analysis. You can use the DESIGN option to store this information so that Genstat need not recalculate it for future ANOVA statements. The structure in the option is automatically declared as a pointer if you have not declared it already. It points to several other structures which store information about different aspects of the analysis. The only other details that are required for future analyses are the values of the factors in the block and treatment formulae. If you have not previously declared the design structure, or if it has no values, then the current statement derives and stores the necessary information. If the pointer does already have values, then these are used to do the analysis. In that case, of course, values of the factors in the block and treatment formulae must not have been changed since the design structure was formed. The current settings of the options that define the design, WEIGHT among them, are then ignored, as is any change in the restrictions on the y-variates. The DESIGN option is particularly useful with designs where there are many model terms or where there is non-orthogonality, as the dummy analysis may then be time-consuming.
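The reuse of a stored design can be sketched as follows, with two y-variates analysed in separate statements so that each keeps its own missing units. The identifiers Blocks, Variety, Y1, Y2 and Dsave are hypothetical.

```genstat
" Sketch only: Blocks and Variety are assumed to be factors, Y1 and
  Y2 variates, all already holding values."
BLOCKSTRUCTURE Blocks
TREATMENTSTRUCTURE Variety
" Dsave has no values yet, so this statement does the dummy analysis
  and stores the result in it."
ANOVA [DESIGN=Dsave] Y1
" Dsave now has values, so the dummy analysis is not repeated. The
  factor values must not have changed in the meantime."
ANOVA [DESIGN=Dsave] Y2
```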
Genstat has a simplified version of the dummy analysis which you can use to save computing time if all the model terms are orthogonal and if, for every term, all the combinations of its factors were applied to the same number of units. A check is incorporated which will detect non-orthogonality except in particularly complicated designs where terms are aliased. If you set option ORTHOGONAL=assumed, Genstat does the simple version unless non-orthogonality is detected, whereupon it gives a warning message and then switches to the full version. (Before Release 14, this was requested by setting ORTHOGONAL=yes, but the aim now is that options with settings yes and no do not have any other settings; however, yes is retained as a synonym for assumed, so that existing programs will still run.) The simplified version is done also if ORTHOGONAL=compulsory, but non-orthogonality now causes the analysis to stop altogether, with an error message; this is useful for checking for typing errors in the factor values when you know that the design should otherwise be orthogonal. The dummy analysis involves the analysis of a specially generated variate which contains random numbers from a Cauchy distribution. The starting value for their generation can be set by an option; its default is 12345.
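As a sketch of the checking use of ORTHOGONAL=compulsory, suppose a randomized block design is known to be orthogonal. The identifiers Blocks, Variety and Yield are hypothetical.

```genstat
" Sketch only: Blocks and Variety are assumed to be factors and
  Yield a variate, all already holding values."
BLOCKSTRUCTURE Blocks
TREATMENTSTRUCTURE Variety
" The design should be orthogonal, so any non-orthogonality (for
  example from a mistyped factor value) stops the analysis with an
  error instead of being analysed silently."
ANOVA [PRINT=aov; ORTHOGONAL=compulsory] Yield
```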
The TOLERANCES option controls numerical aspects of the analysis. Its setting is a variate with up to four values: the first is used to calculate the tolerance for the analysis of the y-variates (default 10^-7), the second is for the tolerance used in the dummy analysis (default 10^-9), the third is for the estimation of missing values (default 10^-5) and the fourth is for the estimation of stratum variances. The MAXCYCLE option sets a limit on the number of iterations for estimating missing values. The EXIT option can save an exit code summarizing the properties of the design:
|1||design has general balance (block terms mutually orthogonal, treatment terms mutually orthogonal, some treatment terms non-orthogonal to the block terms);|
|2||block terms mutually orthogonal, treatment terms non-orthogonal;|
|3||block terms non-orthogonal, treatment terms orthogonal;|
|4||block terms non-orthogonal, treatment terms non-orthogonal;|
|*||design unbalanced (
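The numerical controls and the exit code might be combined as in the sketch below. It assumes the Gasoline and Octane data of Example ANOV-1; Ecode is a hypothetical name, and the first three tolerances are simply restated at the default values quoted above.

```genstat
" Sketch only: Gasoline and Octane are assumed to hold the data of
  Example ANOV-1. Ecode is a hypothetical name for the exit code."
TREATMENTSTRUCTURE Gasoline
" Allow up to 50 iterations for missing values, restate the first
  three tolerances, and save the exit code describing the design."
ANOVA [PRINT=aov; MAXCYCLE=50; EXIT=Ecode;\
   TOLERANCES=!(0.0000001,0.000000001,0.00001)] Octane
PRINT Ecode
```

Saving the exit code in this way lets a program branch on the properties of the design, for example to fall back on another method when the design is unbalanced.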
You can restrict the set of units used for the analysis by applying a restriction to any of the y-variates. If several are restricted, they must all be restricted to the same set of units. Only these units are included in the analysis of each y-variate.
James, A.T. & Wilkinson, G.N. (1971). Factorisation of the residual operator and canonical decomposition of non-orthogonal factors in analysis of variance. Biometrika, 58, 279-294.
Payne, R.W. & Wilkinson, G.N. (1977). A general algorithm for analysis of variance. Applied Statistics, 26, 251-260.
Payne, R.W. & Tobias, R.D. (1992). General balance, combination of information and the analysis of covariance. Scandinavian Journal of Statistics, 19, 3-23.
Wilkinson, G.N. (1970). A general recursive algorithm for analysis of variance. Biometrika, 57, 19-46.
Yates, F. (1937). The Design and Analysis of Factorial Experiments. Technical Communication No. 35 of the Commonwealth Bureau of Soils. Commonwealth Agricultural Bureaux, Farnham Royal.
" Example ANOV-1: one-way analysis of variance"

" This experiment studied the effect of different additives on the
  octane level of gasoline. There were 5 additives and 4 observations
  on each (P.W.M. John, Statistical Design and Analysis of Experiments,
  page 46).
  Define number of units in the experiment (5 types x 4 observations)."
UNITS [NVALUES=20]
" Define a factor to indicate the type of gasoline for each observation."
FACTOR [LABELS=!T(A,B,C,D,E)] Gasoline
" Define variate to store the octane level recorded at each observation."
VARIATE Octane
" Read the data, representing Gasoline by its labels."
READ Gasoline,Octane; FREPRESENTATION=labels
A 91.7  A 91.2  A 90.9  A 90.6
B 91.7  B 91.9  B 90.9  B 90.9
C 92.4  C 91.2  C 91.6  C 91.0
D 91.8  D 92.2  D 92.0  D 91.4
E 93.1  E 92.9  E 92.4  E 92.4 :
" Define the treatments to be fitted in the analysis."
TREATMENTS Gasoline
" Analyse the variate Octane, printing just the AOV table."
ANOVA [PRINT=aov] Octane
" Further output can be displayed without repeating the analysis:
  here we print tables of means."
ADISPLAY [PRINT=means]

" Example ANOV-1a: linear and quadratic contrasts"

" Suppose that gasolines A-E contain 0,1,2,3,4 cc/gallon of additive,
  respectively (but are otherwise identical).
  Plot the means against the amount of additive."
AGRAPH Gasoline; NEWXLEVELS=!(0,1,2,3,4)
" Estimate the linear and quadratic effects of the additive."
TREATMENT POL(Gasoline; 2)
ANOVA [PRINT=aov,contrasts] Octane