Summarising explained and unexplained variation

Evidence that a factor affects the response is strongest when the variation explained by the model is high relative to the unexplained variation. On this page, we describe quantities that summarise these two types of variation.

These summaries are based on three types of value.

Observed response
These are simply the raw response values, y_ij (the jth response at factor level i).
Overall mean
This is the mean of all n response values.
Fitted values
The fitted values are the best predictions of the response based on the factor levels. The formula depends on the model being used. For a categorical factor model, the fitted values are the treatment means; for linear or quadratic models, the fitted values are found from the least squares parameter estimates for the model.
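As a minimal sketch of the categorical case, the fitted value for every experimental unit is the mean of the responses at its factor level. The data values and the level names "A", "B" and "C" are hypothetical and chosen only to keep the arithmetic easy.

```python
from statistics import mean

# Hypothetical responses grouped by factor level (illustrative values only).
data = {
    "A": [4.0, 6.0, 5.0],
    "B": [8.0, 9.0, 10.0],
    "C": [6.0, 7.0, 8.0],
}

# Categorical factor model: every unit's fitted value is its treatment mean.
treatment_means = {level: mean(ys) for level, ys in data.items()}
fitted = {level: [treatment_means[level]] * len(ys) for level, ys in data.items()}

for level in data:
    print(level, data[level], "fitted:", fitted[level])
```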

Overall variation (total sum of squares)

We summarise response variation using differences between the response values and their mean.

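$$SS_{\text{total}} = \sum_{i}\sum_{j} (y_{ij} - \bar{y})^2$$

Here $\bar{y}$ is the mean of all the response values. The total sum of squares reflects the overall variability of the response.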

Note that the sample variance of all n response values is the total sum of squares divided by (n - 1).
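The link between the total sum of squares and the sample variance can be checked numerically. This is a small sketch on hypothetical response values; `statistics.variance` is the Python standard library's sample variance, which uses the (n - 1) divisor.

```python
from statistics import mean, variance

# Hypothetical responses from all treatments pooled together (illustrative values only).
responses = [4.0, 6.0, 5.0, 8.0, 9.0, 10.0, 6.0, 7.0, 8.0]

overall_mean = mean(responses)

# Total sum of squares: squared differences between each response and the overall mean.
ss_total = sum((y - overall_mean) ** 2 for y in responses)

n = len(responses)
print("SS total:", ss_total)
print("SS total / (n - 1):", ss_total / (n - 1))
print("sample variance:", variance(responses))
```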

Unexplained variation (residual sum of squares)

We summarise unexplained variation using differences between the observed and fitted values — the model residuals.

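$$SS_{\text{residual}} = \sum_{i}\sum_{j} (y_{ij} - \hat{y}_{ij})^2$$

Here $\hat{y}_{ij}$ is the fitted value for the jth unit at factor level i. This sum of squares summarises how much the response values vary around our best prediction of the response for their factor level.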

This is also called the residual sum of squares.
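For a categorical factor model the residuals are the differences between each response and its treatment mean. The sketch below sums their squares for a hypothetical data set; the values are illustrative only.

```python
from statistics import mean

# Hypothetical responses grouped by factor level (illustrative values only).
data = {
    "A": [4.0, 6.0, 5.0],
    "B": [8.0, 9.0, 10.0],
    "C": [6.0, 7.0, 8.0],
}

ss_residual = 0.0
for level, ys in data.items():
    fitted_value = mean(ys)  # treatment mean = fitted value for this level
    # Add the squared residuals (observed minus fitted) for this level.
    ss_residual += sum((y - fitted_value) ** 2 for y in ys)

print("SS residual:", ss_residual)
```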

Explained sum of squares

The explained sum of squares summarises the variability of the model predictions — the fitted values. If the fitted values are similar for all factor levels, there is little explained variation, whereas if the fitted values differ between factor levels, this implies systematic differences between the factor levels. This sum of squares therefore summarises explained variation.

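$$SS_{\text{explained}} = \sum_{i}\sum_{j} (\hat{y}_{ij} - \bar{y})^2$$

Here $\hat{y}_{ij}$ is the fitted value for the jth unit at factor level i and $\bar{y}$ is the mean of all the response values. The explained sum of squares measures the variability of the fitted values.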

Note that the summation here is over all observations in the data set — if ni experimental units get treatment i, each of them separately contributes an equal amount to the explained sum of squares.
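The point about summing over every observation can be illustrated with a short sketch for the categorical factor model. It computes the explained sum of squares twice, once observation by observation and once by weighting each treatment's squared deviation by its number of units; the data are hypothetical.

```python
from statistics import mean

# Hypothetical responses grouped by factor level (illustrative values only).
data = {
    "A": [4.0, 6.0, 5.0],
    "B": [8.0, 9.0, 10.0],
    "C": [6.0, 7.0, 8.0],
}

all_responses = [y for ys in data.values() for y in ys]
overall_mean = mean(all_responses)

# Sum over every observation: each unit contributes (fitted value - overall mean)^2.
ss_explained = sum(
    (mean(ys) - overall_mean) ** 2
    for ys in data.values()
    for _ in ys
)

# Equivalent form: weight each treatment's squared deviation by its unit count n_i.
ss_explained_weighted = sum(
    len(ys) * (mean(ys) - overall_mean) ** 2 for ys in data.values()
)

print("SS explained (per observation):", ss_explained)
print("SS explained (weighted by n_i):", ss_explained_weighted)
```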

Relationship between sums of squares

The following relationship is not obvious but is important. It holds for all of the models above, whose fitted values are least squares estimates.

SStotal = SSexplained + SSresidual
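For the categorical factor model, where the fitted value at level i is the treatment mean, the relationship can be sketched in a few lines. The notation $\bar{y}_i$ for the treatment mean and $\bar{y}$ for the overall mean is an assumption of this sketch; the linear and quadratic cases need a similar argument based on the least squares equations and are not shown.

```latex
\begin{align*}
y_{ij} - \bar{y} &= (y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{y}) \\
\sum_i \sum_j (y_{ij} - \bar{y})^2
  &= \sum_i \sum_j (y_{ij} - \bar{y}_i)^2
   + \sum_i \sum_j (\bar{y}_i - \bar{y})^2
   + 2 \sum_i (\bar{y}_i - \bar{y}) \sum_j (y_{ij} - \bar{y}_i) \\
  &= SS_{\text{residual}} + SS_{\text{explained}} + 0
\end{align*}
```

The cross-product term vanishes because the residuals within each treatment sum to zero.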


Illustration: categorical factor

The display on the left below shows the results of a completely randomised experiment with 8 replicates at each of 4 factor levels.

The three jittered dot plots on the right show the values whose squares are summed to give the total, explained and unexplained (residual) sums of squares. Click on each of these three plots to display the quantities on the diagram on the left. The sums of squares summarise the size of the three components.

Use the slider to adjust the data values and observe how the relative size of the variation between and within treatments is reflected in the size of these sums of squares.

Evidence for a difference between the factor levels is strongest when the explained sum of squares is much higher than the residual sum of squares.

Select Two replicates from the pop-up menu and repeat.
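The effect of moving the treatments apart can also be explored with simulated data. The sketch below is not the data behind the diagram: the function names `sums_of_squares` and `simulate`, the `spread` parameter, the seed and the normal errors are all assumptions made for illustration.

```python
import random
from statistics import mean


def sums_of_squares(groups):
    """Return (total, explained, residual) for a categorical factor model."""
    all_y = [y for ys in groups for y in ys]
    overall = mean(all_y)
    total = sum((y - overall) ** 2 for y in all_y)
    explained = sum(len(ys) * (mean(ys) - overall) ** 2 for ys in groups)
    residual = sum((y - mean(ys)) ** 2 for ys in groups for y in ys)
    return total, explained, residual


def simulate(spread, replicates=8, seed=1):
    """Hypothetical data: 4 factor levels whose true means are `spread` units apart."""
    rng = random.Random(seed)  # same seed gives the same random errors for every spread
    return [
        [level * spread + rng.gauss(0.0, 1.0) for _ in range(replicates)]
        for level in range(4)
    ]


for spread in (0.0, 1.0, 3.0):
    total, explained, residual = sums_of_squares(simulate(spread))
    print(f"spread={spread}: total={total:.1f} "
          f"explained={explained:.1f} residual={residual:.1f}")
```

Because the same random errors are reused, the residual sum of squares stays the same while the explained sum of squares grows as the treatments move apart. Calling `simulate(spread, replicates=2)` gives a two-replicate version of the same experiment.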

Illustration: linear model

The diagram below is similar but describes variation in an experiment with a numerical factor. In order to illustrate the explained and residual sums of squares better, we have chosen an experiment with a single replicate at each of 9 equally-spaced values for the factor.

The three jittered dot plots on the right again show the values whose squares are summed to give the total, explained and residual sums of squares. Click on each of these three plots to display the quantities whose squares are being summed on the diagram on the left.

Adjust the slider and again observe how the evidence for a relationship is described by the relative size of the explained and residual sums of squares.

Evidence for a relationship between the factor and the response is strongest when the explained sum of squares is much higher than the residual sum of squares.
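The same three sums of squares can be computed for a straight-line model. This sketch fits the line by least squares using the usual closed-form slope and intercept; the 9 equally spaced factor values and the responses are hypothetical, not the data shown in the diagram.

```python
from statistics import mean

# Hypothetical experiment: one response at each of 9 equally spaced factor values.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 17.9]

x_bar, y_bar = mean(x), mean(y)

# Least squares slope and intercept for the straight-line model.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Fitted values are the points on the least squares line.
fitted = [intercept + slope * xi for xi in x]

ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_explained = sum((fi - y_bar) ** 2 for fi in fitted)
ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

print(f"total={ss_total:.3f} explained={ss_explained:.3f} residual={ss_residual:.3f}")
print("explained + residual:", round(ss_explained + ss_residual, 3))
```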