Model for explained and unexplained variation

For most experimental data, the response, yi, for the i'th experimental unit is modelled using a statistical distribution, and the distributions for different units are assumed to be independent of each other.

Unexplained variation
The randomness of the statistical distribution used in the model corresponds to the unexplained variation. The distribution's spread summarises the amount of unexplained variation.
Explained variation
Variation caused by the controlled factors is modelled using the mean of the response distribution.

µi   =   f(xi, zi, ...)

This function describes the effect of the explanatory factors, xi, zi, ..., on the response. Variation caused by the known structure of the experimental units (e.g. blocks) is also modelled through the response mean.

Unknown parameters

The form of the function describing the explained variation depends on the characteristics of the response and factors, but it usually involves one or more unknown parameters. The following are examples of possible functions for the explained variation.

•  In an experiment with a single factor that has g levels, the response mean might be modelled as
  µi   =   βj      if the i'th experimental unit gets factor level j
  This model has one unknown parameter for each factor level, β1, β2, ..., βg.
 
•  In an experiment with a numerical factor, the response mean might be modelled as
  µi   =   exp( β0  +  β1 xi )    if the i'th experimental unit has factor value xi
  This model has two unknown parameters, β0 and β1.
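
The two example mean functions above can be written out as a minimal sketch. The function names and the parameter values below are hypothetical and chosen only for illustration; in a real experiment the parameters are unknown and must be estimated.

```python
import math

def mean_single_factor(level, betas):
    """Mean for a unit that gets factor level j (1-based): mu_i = beta_j."""
    return betas[level - 1]

def mean_numerical_factor(x, beta0, beta1):
    """Mean for a unit with factor value x_i: mu_i = exp(beta0 + beta1 * x_i)."""
    return math.exp(beta0 + beta1 * x)

# Hypothetical parameter values (assumptions, not taken from any data set).
betas = [10.0, 12.5, 9.0]            # beta_1, beta_2, beta_3 for g = 3 levels
levels = [1, 3, 2, 1]                # factor level given to each unit
means_a = [mean_single_factor(j, betas) for j in levels]

xs = [0.0, 1.0, 2.0]                 # numerical factor values
means_b = [mean_numerical_factor(x, 0.5, 0.2) for x in xs]

print(means_a)
print(means_b)
```

In the first model, units that get the same factor level share the same mean. In the second, the mean changes smoothly with the factor value and is always positive.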

Whatever the form of the function, the experimental data are used to estimate the unknown parameters. The parameter estimates reflect how strongly the controlled factors and known structure of the experimental units affect the response.
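
As an illustration of estimating parameters, the sketch below handles the single-factor model with one parameter per level. It assumes least squares estimation, which this section does not itself specify; under that assumption the estimate of each βj is the mean of the responses at level j. The function name and the data are made up for the example.

```python
from collections import defaultdict
from statistics import mean

def estimate_level_means(levels, responses):
    """Estimate beta_j as the average response of the units at level j."""
    groups = defaultdict(list)
    for level, y in zip(levels, responses):
        groups[level].append(y)
    return {level: mean(ys) for level, ys in sorted(groups.items())}

# Made-up responses for six units, two at each of three factor levels.
levels = [1, 1, 2, 2, 3, 3]
responses = [9.8, 10.4, 12.1, 12.7, 8.9, 9.3]

estimates = estimate_level_means(levels, responses)
for level, estimate in estimates.items():
    print(f"level {level}: estimated mean {estimate:.2f}")
```

Differences between the estimated level means indicate how strongly the factor affects the response, while the scatter of the responses within each level reflects the unexplained variation.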

Distribution describing unexplained variation

Unexplained variation in the data from the experiment is modelled with a standard statistical distribution. The distribution that should be used depends on the type of response measurement. The following examples indicate some possibilities.

Number of greenfly on rose bushes
Counts of events or items that have no upper limit are usually modelled with a Poisson distribution, though negative binomial distributions are also occasionally used.
Number of bruised apples in boxes of 40
Counts of 'successes' in a fixed number of 'trials' are usually modelled with a binomial distribution.
Time until a cow with a particular disease dies
Survival times usually have very skew distributions and may be modelled with exponential, gamma or Weibull distributions.
Weight of wheat harvested per acre
Many continuous response measurements have fairly symmetric distributions or can be transformed into symmetric distributions with a logarithmic or other transformation. A normal distribution is often used as a model.
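
To give a feel for the four kinds of response above, the following sketch draws a few random values from one candidate distribution for each, using only the Python standard library. All parameter values (means, probabilities, scale and shape) are arbitrary assumptions, and the two helper functions are hypothetical; the Poisson helper uses Knuth's multiplication method, which is only practical for small means.

```python
import math
import random

rng = random.Random(1)  # fixed seed so that the sketch is repeatable

def draw_poisson(mean):
    """One Poisson draw by Knuth's multiplication method (small means only)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def draw_binomial(trials, prob):
    """One binomial draw: number of successes in independent trials."""
    return sum(rng.random() < prob for _ in range(trials))

# Counts with no upper limit, e.g. greenfly on a rose bush (assumed mean 4).
greenfly = [draw_poisson(4.0) for _ in range(5)]

# Successes out of 40 trials, e.g. bruised apples (assumed probability 0.1).
bruised = [draw_binomial(40, 0.1) for _ in range(5)]

# Skew survival times: Weibull with assumed scale 200 and shape 1.5.
survival_days = [rng.weibullvariate(200.0, 1.5) for _ in range(5)]

# Fairly symmetric continuous response: normal with assumed mean 3.0, sd 0.4.
wheat_weight = [rng.gauss(3.0, 0.4) for _ in range(5)]

print(greenfly, bruised, survival_days, wheat_weight, sep="\n")
```

The counts are whole numbers, the binomial counts can never exceed 40, the survival times are positive with occasional large values, and the normal values scatter symmetrically around their mean.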

In practice, most experimental data are modelled with normal distributions, even when there is a better alternative. This is partly because normal distributions are a reasonable approximation for many types of data, but also because the analysis and interpretation of the models are much easier.

In this e-book, we will only deal with normal models.