Long page descriptions

Chapter 1   Simple Experiments

1.1   Basic concepts and notation

1.1.1   Structure of an experiment

This page describes notation for experimental data -- experimental units, controlled variables and a response.

1.1.2   Before you start

Decisions must be made about the experimental units to use, the response to measure from each, the controlled variables to vary and the values of these variables to use. These are generally chosen for non-statistical reasons.

1.1.3   Randomisation

Experimental design defines which experimental treatments are applied to which units. If this is done badly, the experiment can result in incorrect conclusions. Randomisation prevents biased results.

1.1.4   Blocks of experimental units

If the experimental units are not identical, grouping them into blocks of similar units improves the accuracy of comparisons between treatments.

1.2   Design and estimates

1.2.1   Completely randomised experiment

In the simplest type of experiment, there are no known differences between the experimental units and a single factor is varied. In a completely randomised design, the different levels of the factor are randomly allocated to the pool of experimental units.
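
As a rough illustration of random allocation, the sketch below shuffles a list of factor levels over a pool of units. The function name `randomise`, the unit labels 1 to 12, the levels "A", "B", "C" and the assumption of equal replication are all hypothetical choices made for this example, not part of the original description.

```python
import random

def randomise(units, levels, seed=None):
    """Randomly allocate factor levels to units with equal replication.

    Assumes (for this sketch only) that the number of units is a
    multiple of the number of factor levels.
    """
    if len(units) % len(levels) != 0:
        raise ValueError("number of units must be a multiple of the number of levels")
    reps = len(units) // len(levels)
    # One label per unit: each level repeated 'reps' times.
    labels = [level for level in levels for _ in range(reps)]
    rng = random.Random(seed)  # seeded only so the example is reproducible
    rng.shuffle(labels)        # the random allocation itself
    return dict(zip(units, labels))

# Hypothetical pool of 12 units and a factor with 3 levels.
allocation = randomise(units=list(range(1, 13)), levels=["A", "B", "C"], seed=1)
for unit, level in allocation.items():
    print(unit, level)
```

Each level is used the same number of times, and only chance decides which units receive which level.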

1.2.2   Experiment with one factor

This page gives a few data sets from completely randomised experiments for a single factor.

1.2.3   Explained and unexplained variation

Varying the controlled factor causes variability in the response -- explained variation. Other response variability remains unexplained.

1.2.4   Treatment means

The mean responses at the different factor levels summarise differences between the treatments -- the explained variation.

1.3   Model for response

1.3.1   Normal model

The response is usually modelled as the sum of two terms, a term for explained variation that depends on the factor level and a random term with a normal distribution describing unexplained variation.
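
One common way to write such a model is sketched below. The symbols are assumed notation for this illustration: y_ij is the response from the j-th unit at factor level i, mu_i is the mean response at level i, and sigma is the standard deviation of the unexplained variation.

```latex
y_{ij} = \mu_i + \varepsilon_{ij},
\qquad
\varepsilon_{ij} \sim \mathrm{normal}(0,\ \sigma^2)
```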

1.3.2   Categorical factors

If the relationship between the response and a numerical controlled variable, x, is nonlinear, the mean response can be modelled with a quadratic function of x. An even more general model uses a separate parameter for the mean response at each x that is used; it is also appropriate for a categorical explanatory variable.

1.3.3   Numerical factors

The simplest model for an experiment with one numerical controlled variable, x, is a linear model in which the mean response is a linear function of x.

1.3.4   Least squares

All models involve unknown parameters. The least squares estimates of the parameters minimise the sum of squared residuals.
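
For a linear model, the least squares estimates have a simple closed form. The sketch below is a minimal illustration under that assumption; the function names and the four data points are invented for the example.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def residual_ss(x, y, intercept, slope):
    """Sum of squared residuals for a given intercept and slope."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Made-up data for illustration only.
x = [1, 2, 3, 4]
y = [2.0, 4.1, 5.9, 8.0]

b0, b1 = least_squares_line(x, y)
best = residual_ss(x, y, b0, b1)
print(b0, b1, best)
```

Any other intercept or slope, for example `residual_ss(x, y, b0 + 0.1, b1)`, gives a larger sum of squared residuals than `best`.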

1.3.5   Numerical factors with 2 or 3 levels

If only 2 values of a numerical factor are used in an experiment, a linear model has identical fit to a model that treats the factor as categorical. If 3 values of the factor have been used, a quadratic model is equivalent to a model that treats the factor as categorical.
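
The two-level case can be checked numerically. In the sketch below, which uses invented data and a hypothetical helper `fit_line`, the least squares line passes through the mean response at each of the two x-values, so its fitted values match those of a model with a separate mean for each level.

```python
def fit_line(x, y):
    """Least squares (intercept, slope) for a straight line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up experiment: the numerical factor takes only the values 10 and 20.
x = [10, 10, 10, 20, 20, 20]
y = [3.0, 4.0, 5.0, 7.0, 9.0, 8.0]

intercept, slope = fit_line(x, y)

# Categorical model: one fitted mean per factor level.
level_means = {}
for level in sorted(set(x)):
    values = [yi for xi, yi in zip(x, y) if xi == level]
    level_means[level] = sum(values) / len(values)

# Linear model: fitted value at each level used in the experiment.
line_fits = {level: intercept + slope * level for level in level_means}
print(level_means)
print(line_fits)
```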

1.3.6   Coding for factors

Evenly spaced values of a numerical factor can be replaced by any other evenly spaced values, such as 1, 2, ... without changing the fit of the model. A numerical or categorical factor with 2 levels is often modelled as a numerical factor with values -1 and +1.

1.4   Inference

1.4.1   Variation between and within treatments

Assessing whether a categorical factor affects the response must take into account both variation between the treatment means and also variation within each factor level.

1.4.2   Explained and unexplained variation

For experiments with numerical factors, the ideas of between- and within-treatment variation must be generalised to explained and unexplained variation. Both types of variation affect our assessment of whether the factor affects the response.

1.4.3   Sums of squares

Explained and unexplained variation are summarised by quantities called explained and unexplained sums of squares.
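
For a categorical factor, the two sums of squares add up to the total sum of squares about the overall mean. The notation below is assumed for this sketch: y_ij is the j-th response at factor level i, n_i is the number of units at level i, \bar{y}_i is the mean at level i and \bar{y} is the overall mean.

```latex
\underbrace{\sum_{i}\sum_{j} (y_{ij} - \bar{y})^2}_{\text{total}}
=
\underbrace{\sum_{i} n_i\,(\bar{y}_i - \bar{y})^2}_{\text{explained}}
+
\underbrace{\sum_{i}\sum_{j} (y_{ij} - \bar{y}_i)^2}_{\text{unexplained}}
```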

1.4.4   Analysis of variance

The explained and unexplained sums of squares form the basis of an analysis of variance table that can be used to test whether the factor really does affect the response.
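
A minimal sketch of such a table for one categorical factor follows. The function name `one_way_anova`, the dictionary keys and the data are hypothetical. The sketch stops at the F ratio; finding a p-value would need F distribution tables or a statistics library.

```python
def one_way_anova(groups):
    """Build the entries of a one-factor analysis of variance table.

    'groups' holds one list of responses per factor level.
    """
    all_values = [v for g in groups for v in g]
    n = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]

    # Explained: variation of the treatment means about the overall mean.
    ss_explained = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Unexplained: variation of the responses about their own treatment mean.
    ss_unexplained = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    df_explained = k - 1
    df_unexplained = n - k
    ms_explained = ss_explained / df_explained
    ms_unexplained = ss_unexplained / df_unexplained
    return {
        "ss_explained": ss_explained,
        "ss_unexplained": ss_unexplained,
        "df_explained": df_explained,
        "df_unexplained": df_unexplained,
        "ms_explained": ms_explained,
        "ms_unexplained": ms_unexplained,
        "F": ms_explained / ms_unexplained,
    }

# Made-up responses at three factor levels.
groups = [[3.0, 4.0, 5.0], [7.0, 9.0, 8.0], [5.0, 6.0, 7.0]]
table = one_way_anova(groups)
for name, value in table.items():
    print(name, value)
```

A large F ratio means the variation between the treatment means is large relative to the variation within treatments, which is evidence that the factor affects the response.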

1.4.5   Hierarchy of models for numerical factor

A linear model is the simplest one for a numerical factor, but a quadratic model and one that treats the factor as categorical allow increasing degrees of curvature in the relationship. Models that allow more curvature have residual sums of squares that are no larger.

1.4.6   Testing linearity

The explained sum of squares for changing from a quadratic to a categorical model is the basis of an anova test of goodness-of-fit of a quadratic model. The explained sum of squares for changing from a linear to a quadratic model can be used to test for curvature.

1.4.7   Estimating the error variance

The mean residual sum of squares estimates the variance of the 'errors' in the model. This is also the variance of replicate observations within any factor level.

1.4.8   Confidence intervals for treatment means

Confidence intervals for the treatment means provide a good summary of the effect of a factor.
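
One standard form for these intervals is the treatment mean plus or minus a t-value times the standard error of the mean, with the error variance estimated by the mean residual sum of squares. The sketch below assumes that form. The function name, the data and the choice to pass the t-value in as `t_crit` (read from tables) are assumptions of this example.

```python
import math

def treatment_mean_intervals(groups, t_crit):
    """Confidence interval for the mean at each factor level.

    Uses a pooled estimate of the error variance (the mean residual
    sum of squares). 't_crit' is the t-value for the chosen confidence
    level on n - k degrees of freedom, supplied by the caller.
    """
    n = sum(len(g) for g in groups)
    k = len(groups)
    means = [sum(g) / len(g) for g in groups]
    ss_resid = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    s2 = ss_resid / (n - k)  # estimate of the error variance
    intervals = []
    for g, m in zip(groups, means):
        half_width = t_crit * math.sqrt(s2 / len(g))
        intervals.append((m - half_width, m + half_width))
    return intervals

# Made-up data: 9 units, 3 levels, so 6 residual degrees of freedom.
groups = [[3.0, 4.0, 5.0], [7.0, 9.0, 8.0], [5.0, 6.0, 7.0]]
# 2.447 is the tabulated 97.5% point of the t distribution with 6 degrees
# of freedom, which gives 95% intervals.
intervals = treatment_mean_intervals(groups, t_crit=2.447)
for low, high in intervals:
    print(round(low, 2), round(high, 2))
```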