Summarising variation between and within treatments
Evidence for a difference between the factor levels is strongest when the variation between observed treatment means is high relative to the variation within the treatments. On this page, we describe quantities that summarise these two types of variation.
We use the notation $y_{ij}$ to denote the $j$'th response measurement for treatment $i$. The mean response for the $i$'th treatment in the data is denoted by $\bar{y}_i$.
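As a minimal sketch of this notation, the data can be held as one list of measurements per treatment, so that `responses[i][j]` plays the role of $y_{ij}$ (with zero-based indices). The data values and function names below are hypothetical, chosen only for illustration.

```python
# Hypothetical data: responses[i] holds the measurements y_i1, y_i2, ... for treatment i.
responses = [
    [12.0, 14.0, 13.0],  # treatment 1
    [15.0, 17.0, 16.0],  # treatment 2
    [11.0, 10.0, 12.0],  # treatment 3
]

def treatment_means(groups):
    """Return the treatment means ybar_i, one per treatment."""
    return [sum(g) / len(g) for g in groups]

def overall_mean(groups):
    """Return the mean of all n response values, ignoring the factor."""
    all_values = [y for g in groups for y in g]
    return sum(all_values) / len(all_values)

print(treatment_means(responses))  # [13.0, 16.0, 11.0]
print(overall_mean(responses))
```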
Total variation
Before summarising variation within and between treatments, we first present a value that describes the overall variability in the response measurement, ignoring the existence of the factor.
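$$SS_{Total} = \sum_{i=1}^{g}\sum_{j=1}^{n_i}(y_{ij} - \bar{y})^2$$

Here $g$ is the number of treatments, treatment $i$ has $n_i$ measurements, and $\bar{y}$ is the mean of all $n = \sum_i n_i$ response values. The total sum of squares reflects the overall variability of the response.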
Note that the sample variance of all n response values is the total sum of squares divided by (n - 1).
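The relationship between the total sum of squares and the sample variance can be checked numerically. This is a minimal sketch using hypothetical data and a hypothetical helper name, `total_sum_of_squares`; the comparison uses Python's standard `statistics.variance`, which divides by $(n - 1)$.

```python
import statistics

def total_sum_of_squares(groups):
    """SS_Total: squared deviations of every y_ij from the overall mean ybar."""
    values = [y for g in groups for y in g]  # ignore the factor: pool all n values
    ybar = sum(values) / len(values)
    return sum((y - ybar) ** 2 for y in values)

# Hypothetical data: responses[i] holds the measurements for treatment i.
responses = [
    [12.0, 14.0, 13.0],
    [15.0, 17.0, 16.0],
    [11.0, 10.0, 12.0],
]

all_values = [y for g in responses for y in g]
n = len(all_values)
ss_total = total_sum_of_squares(responses)

# The sample variance of all n values should equal SS_Total / (n - 1).
print(ss_total, ss_total / (n - 1), statistics.variance(all_values))
```

For these values the total sum of squares is 44 (up to floating-point rounding), so the sample variance is $44 / 8 = 5.5$.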
Variation between treatment means (explained sum of squares)
A measure of variation between treatments describes how far apart the treatment means are. It is defined in terms of the distances between the treatment means and the overall mean, $\bar{y}$.
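$$SS_{Between} = \sum_{i=1}^{g}\sum_{j=1}^{n_i}(\bar{y}_i - \bar{y})^2 = \sum_{i=1}^{g} n_i(\bar{y}_i - \bar{y})^2$$

The sum of squares between levels measures the variability of the level means.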
Note that the summation here is over all observations in the data set: if $n_i$ experimental units get treatment $i$, each of them separately contributes the same amount, $(\bar{y}_i - \bar{y})^2$, to the between-treatments sum of squares.
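To make the point about summing over all observations concrete, the sketch below computes the between-treatments sum of squares in two ways: once by looping over every observation, and once by weighting each squared distance by $n_i$. The data (with deliberately unequal group sizes) and the function names are hypothetical.

```python
def between_ss_by_observation(groups):
    """SS_Between as a sum over all observations: each y_ij contributes (ybar_i - ybar)^2."""
    values = [y for g in groups for y in g]
    ybar = sum(values) / len(values)
    total = 0.0
    for g in groups:
        ybar_i = sum(g) / len(g)
        for _ in g:  # one equal contribution per experimental unit in treatment i
            total += (ybar_i - ybar) ** 2
    return total

def between_ss_weighted(groups):
    """Equivalent form: sum over treatments of n_i * (ybar_i - ybar)^2."""
    values = [y for g in groups for y in g]
    ybar = sum(values) / len(values)
    return sum(len(g) * (sum(g) / len(g) - ybar) ** 2 for g in groups)

# Hypothetical data with unequal replication: n_1 = 3, n_2 = 4, n_3 = 2.
responses = [
    [12.0, 14.0, 13.0],
    [15.0, 17.0, 16.0, 16.0],
    [11.0, 10.0],
]

print(between_ss_by_observation(responses), between_ss_weighted(responses))
```

Both forms agree (up to rounding), which is why a treatment applied to more units carries more weight in the between-treatments sum of squares.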
Variation within treatments (residual sum of squares)
We summarise unexplained variation using differences between the values and their treatment means.
$$SS_{Within} = \sum_{i=1}^{g}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2$$

The sum of squares within treatments quantifies the spread of values within each treatment.
This is also called the residual sum of squares since it describes variability that is not explained by differences between the treatments. Dividing it by $(n - g)$, where $g$ is the number of treatments, gives the pooled estimate of the unknown model parameter $\sigma^2$.
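A minimal sketch of the within-treatments sum of squares and the pooled estimate of $\sigma^2$, using hypothetical data and hypothetical function names. Each observation is compared with its own treatment mean, and the result is divided by $(n - g)$.

```python
def within_sum_of_squares(groups):
    """SS_Within: squared deviations of each y_ij from its own treatment mean ybar_i."""
    ss = 0.0
    for g in groups:
        ybar_i = sum(g) / len(g)
        ss += sum((y - ybar_i) ** 2 for y in g)
    return ss

def pooled_variance(groups):
    """Pooled estimate of sigma^2: SS_Within / (n - g)."""
    n = sum(len(g) for g in groups)
    g_count = len(groups)  # number of treatments
    return within_sum_of_squares(groups) / (n - g_count)

# Hypothetical data: g = 3 treatments, n = 9 observations.
responses = [
    [12.0, 14.0, 13.0],
    [15.0, 17.0, 16.0],
    [11.0, 10.0, 12.0],
]

print(within_sum_of_squares(responses), pooled_variance(responses))  # 6.0 1.0
```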
Relationship between sums of squares
The following relationship is not immediately obvious, but it is important:

$$SS_{Total} = SS_{Between} + SS_{Within}$$
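One standard argument for this identity splits each deviation from the overall mean into a between-treatments part and a within-treatment part, then expands the square:

```latex
\begin{aligned}
\sum_{i=1}^{g}\sum_{j=1}^{n_i}(y_{ij}-\bar{y})^2
&= \sum_{i=1}^{g}\sum_{j=1}^{n_i}\big[(\bar{y}_i-\bar{y}) + (y_{ij}-\bar{y}_i)\big]^2 \\
&= \sum_{i=1}^{g}\sum_{j=1}^{n_i}(\bar{y}_i-\bar{y})^2
 + \sum_{i=1}^{g}\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_i)^2
 + 2\sum_{i=1}^{g}(\bar{y}_i-\bar{y})\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_i)
\end{aligned}
```

The cross term vanishes because deviations from a treatment mean sum to zero: $\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_i) = n_i\bar{y}_i - n_i\bar{y}_i = 0$. The two remaining terms are the between-treatments and within-treatments sums of squares.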
Illustration of sums of squares
The display on the left below shows the results of a completely randomised experiment with 8 replicates at each of 4 factor levels.
The three jittered dot plots on the right show the values whose squares are summed to give the total, between-level and within-level sums of squares. Click on each of these three plots to display the quantities on the diagram on the left. The sums of squares summarise the size of the three components.
Use the slider to adjust the data values and observe how the relative size of the variation between and within treatments is reflected in the size of these sums of squares.
Evidence for a difference between the treatment means is strongest when the between-treatment sum of squares is much higher than the within-treatment sum of squares.
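The effect of the slider can be imitated with simulated data. This sketch assumes, as a hypothetical stand-in for the interactive display, that the slider pulls every value towards its own treatment mean by a common factor; the page does not say how its slider actually works. The level means, spread and random seed are all made up, but the design matches the one described: 4 levels with 8 replicates each.

```python
import random

def sums_of_squares(groups):
    """Return (SS_Between, SS_Within) for a list of treatment groups."""
    values = [y for g in groups for y in g]
    ybar = sum(values) / len(values)
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        ybar_i = sum(g) / len(g)
        ss_between += len(g) * (ybar_i - ybar) ** 2
        ss_within += sum((y - ybar_i) ** 2 for y in g)
    return ss_between, ss_within

def shrink_within(groups, factor):
    """Hypothetical 'slider': move each value towards its treatment mean by `factor`."""
    out = []
    for g in groups:
        ybar_i = sum(g) / len(g)
        out.append([ybar_i + factor * (y - ybar_i) for y in g])
    return out

random.seed(1)
# Simulated completely randomised design: 4 levels, 8 replicates each.
level_means = [10.0, 12.0, 11.0, 14.0]
groups = [[random.gauss(mu, 2.0) for _ in range(8)] for mu in level_means]

for factor in (1.0, 0.5, 0.25):
    b, w = sums_of_squares(shrink_within(groups, factor))
    print(f"factor={factor}: SS_Between={b:.2f}, SS_Within={w:.2f}")
```

Because shrinking leaves every treatment mean unchanged, the between-treatments sum of squares stays fixed while the within-treatments sum of squares scales by the square of the factor, so the between-treatment variation becomes relatively larger.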
Select Two replicates from the pop-up menu and repeat.