A little theory

Both explained and unexplained variation must be taken into account when trying to understand experimental data. The more unexplained variation there is, the more plausible it becomes that the factor has no effect on the response and that the observed differences between the treatment means simply arose by chance.

To quantify explained and unexplained variation, we need to specify a model for the response that distinguishes them.

The model that we describe here is the basis of models that are critically important for analysing more complex experiments, so it is important that you fully understand this page.

Response measurements can be of many different types, and several different types of distribution may be appropriate for modelling them. However, most common responses can be modelled using normal distributions, so we concentrate here on normal models.

Model for a completely randomised experiment

One possible model for an experiment with a single controlled factor specifies that the responses have independent normal distributions whose parameters depend on the factor level,

Level i :     Y  ∼  normal (µi , σi)

In practice, we usually simplify this model with the assumption that the response standard deviation is the same for all factor levels,

Level i :     Y  ∼  normal (µi , σ)

This model is flexible enough to be useful for many data sets and reduces the number of unknown parameters from 2g to (g + 1): the g group means, µi, and the common standard deviation, σ.
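
To make the simplified model concrete, the following minimal sketch simulates responses from it using only the Python standard library. The function name simulate_crd, the three means, the standard deviation and the numbers of replicates are all hypothetical values chosen for illustration; they do not come from any real experiment.

```python
import random

def simulate_crd(means, sigma, n_per_level, seed=0):
    """Simulate responses for a completely randomised experiment.

    means[i] is the (hypothetical) mean for factor level i and sigma is
    the common standard deviation shared by every level.
    Returns one list of simulated responses per factor level.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [[rng.gauss(mu, sigma) for _ in range(n)]
            for mu, n in zip(means, n_per_level)]

means = [10.0, 12.0, 15.0]   # hypothetical µ1, µ2, µ3, so g = 3
sigma = 2.0                  # hypothetical common standard deviation
n_per_level = [5, 5, 5]      # five replicates at each level

data = simulate_crd(means, sigma, n_per_level)

# g means plus one common standard deviation
n_parameters = len(means) + 1
print(n_parameters)
```

With g = 3 levels, the model has four unknown parameters, whereas allowing a separate standard deviation at each level would need six.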

Alternative notation for model

This model is often expressed in a different way, with two subscripts representing the level of the factor and the replicate. If factor level i has been used on ni experimental units (i.e. there are ni replicates at factor level i), we use the notation yij to denote the jth of these ni response values. The above normal model can be written in the alternative form,

yij  =  µi   +   εij       for i = 1 to g and j = 1 to ni

The term εij is called the error term in the model. It corresponds to the unexplained variation and is the only random part of the model,

εij  ∼  normal (0, σ)

This notation clearly distinguishes explained and unexplained variation and is particularly important when analysing more complex experiments.
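
The alternative form can be sketched in code by generating the error terms separately and adding them to the level means. This is only an illustration under assumed values: the means, standard deviation and replicate counts are hypothetical, and Python lists are indexed from 0, so y[i][j] here corresponds to the (j + 1)th replicate at level (i + 1).

```python
import random

rng = random.Random(1)        # fixed seed so the sketch is repeatable

mu = [10.0, 12.0, 15.0]       # hypothetical µ1, µ2, µ3
sigma = 2.0                   # hypothetical common standard deviation
n = [4, 3, 5]                 # n1, n2, n3: replicates need not be equal

# errors[i][j] plays the role of the error term: normal with mean 0, sd sigma
errors = [[rng.gauss(0.0, sigma) for _ in range(n_i)] for n_i in n]

# Explained part (level mean) plus unexplained part (error term)
y = [[mu_i + e for e in errs] for mu_i, errs in zip(mu, errors)]

for i, group in enumerate(y, start=1):
    print(f"level {i}: {len(group)} responses")
```

Subtracting the level mean from each simulated response recovers its error term, which is the sense in which the two parts of the model are kept separate.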

Estimates of the treatment means

The g parameters, µ1, µ2, ..., µg, describe differences between the factor levels, so they are the most important parameters to estimate.

The best estimates are the g observed treatment means: the estimate of µi is the mean of the ni response values at factor level i.


(It is relatively easy to show that these are least squares estimates but we will not give the proof here.)
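
Although the proof is omitted, the least squares property can be checked numerically. The sketch below uses a small made-up data set (not taken from any real experiment) and two hypothetical helper functions, treatment_means and residual_ss. It compares the residual sum of squares at the treatment means with the value obtained after shifting every estimate by 0.5. This is a check on one example, not a proof.

```python
from statistics import mean

def treatment_means(data):
    """Return the observed mean of the responses at each factor level."""
    return [mean(group) for group in data]

def residual_ss(data, fitted):
    """Sum of squared differences between responses and their fitted value."""
    return sum((y - m) ** 2
               for group, m in zip(data, fitted)
               for y in group)

# Made-up responses for g = 3 levels with unequal numbers of replicates
data = [[4.0, 6.0, 5.0],
        [9.0, 11.0],
        [7.0, 8.0, 9.0, 8.0]]

estimates = treatment_means(data)
best = residual_ss(data, estimates)
shifted = residual_ss(data, [m + 0.5 for m in estimates])

print(estimates)       # [5.0, 10.0, 8.0]
print(best, shifted)   # 6.0 8.25
```

Moving the fitted values away from the treatment means increases the residual sum of squares, which is what the least squares property leads us to expect.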