Explained and residual sums of squares

The relative sizes of the explained and residual components reflect the proportion of response variation that is explained by a linear model. Their sizes can be summarised by their sums of squares.

SStotal The total sum of squares reflects the total variability of the response. The response standard deviation is the square root of SStotal / (n - 1), where n is the number of observations.
SSexplained The explained sum of squares measures the variability of the fitted values from the least squares fit around the mean response. This is the variability that is explained by the model.
SSresidual The residual sum of squares quantifies the spread of values around the least squares line. This is a measure of the unexplained variability in the response.
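
Written as formulas, the three descriptions above might look like the following. The notation is an assumption introduced here and does not come from the original text: y_i is the i-th response value, \bar{y} is the mean response, \hat{y}_i is the fitted value from the least squares line, and n is the number of observations.

```latex
\begin{align*}
SS_{\text{total}}     &= \sum_{i=1}^{n} (y_i - \bar{y})^2 \\
SS_{\text{explained}} &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \\
SS_{\text{residual}}  &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
\end{align*}
```

With this notation, the response standard deviation is s = \sqrt{SS_{\text{total}} / (n - 1)}.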

The explained and residual sums of squares add to give the total sum of squares,

SStotal = SSexplained + SSresidual

(This relationship requires a fair bit of algebra to prove!)
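
A sketch of the main step, assuming the model is a least squares line with an intercept. The symbols are assumed notation: y_i is a response value, \bar{y} the mean response, and \hat{y}_i the fitted value. Each deviation from the mean splits into an explained part and a residual, and squaring and summing gives

```latex
\begin{align*}
y_i - \bar{y} &= (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i) \\
\sum_{i=1}^{n} (y_i - \bar{y})^2
  &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2
   + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
   + 2 \sum_{i=1}^{n} (\hat{y}_i - \bar{y})(y_i - \hat{y}_i)
\end{align*}
```

The algebra lies in showing that the final cross-product term is zero. For a least squares line this follows because the residuals sum to zero and are uncorrelated with the explanatory variable.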

The relative sizes of the explained and residual sums of squares describe how much of the variability is explained by the model. In particular, SSexplained / SStotal is the proportion of the total variability that is explained.
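
As a minimal sketch, the sums of squares can be computed numerically and the additive relationship checked. The data values and the function name `sums_of_squares` are made up for illustration, and NumPy's `polyfit` stands in for whatever software produces the least squares line.

```python
import numpy as np


def sums_of_squares(x, y):
    """Return (ss_total, ss_explained, ss_residual) for a least squares line."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Degree-1 polyfit returns the coefficients as [slope, intercept].
    slope, intercept = np.polyfit(x, y, 1)
    fitted = intercept + slope * x
    ss_total = np.sum((y - y.mean()) ** 2)
    ss_explained = np.sum((fitted - y.mean()) ** 2)
    ss_residual = np.sum((y - fitted) ** 2)
    return ss_total, ss_explained, ss_residual


# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

ss_total, ss_explained, ss_residual = sums_of_squares(x, y)
proportion_explained = ss_explained / ss_total

# The two components should add to the total, up to rounding error.
print(np.isclose(ss_total, ss_explained + ss_residual))
print(proportion_explained)
```

Because these made-up points lie close to a straight line, most of the total sum of squares ends up in the explained component.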


Simulation: Impurities in plastic

The next diagram shows simulated data that might describe the impurities recorded from batches of plastic produced at different temperatures (degrees Fahrenheit).

Click on the jittered dot plots on the right to display the different components as coloured vertical lines on the scatterplot.

Drag the slider to change the strength of the relationship between the impurities and temperature. Observe that:

When the relationship is strong, the residuals are much smaller than the explained components, and the residual sum of squares is a small part of the total sum of squares.
When the relationship is weak, the residuals are larger than the explained components, and the residual sum of squares is a large part of the total sum of squares.

The relative sizes of the sums of squares therefore hold information about the strength and significance of the relationship.
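
The effect of the slider can be imitated with a rough simulation. Everything specific in this sketch is an assumption made for illustration and is not taken from the diagram: the temperature range, the line `20 - 0.05 * temperature`, the noise levels, and the function name `residual_share`. A small noise standard deviation plays the role of a strong relationship and a large one plays the role of a weak relationship.

```python
import numpy as np


def residual_share(noise_sd, rng, n=50):
    """Simulate points about a line and return SSresidual / SStotal."""
    # Hypothetical temperatures and impurity levels, not the diagram's data.
    temperature = rng.uniform(100, 200, size=n)
    impurities = 20 - 0.05 * temperature + rng.normal(0, noise_sd, size=n)
    # Fit the least squares line and compute the sums of squares.
    slope, intercept = np.polyfit(temperature, impurities, 1)
    fitted = intercept + slope * temperature
    ss_total = np.sum((impurities - impurities.mean()) ** 2)
    ss_residual = np.sum((impurities - fitted) ** 2)
    return ss_residual / ss_total


rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
strong_share = residual_share(noise_sd=0.2, rng=rng)  # little scatter about the line
weak_share = residual_share(noise_sd=5.0, rng=rng)    # a lot of scatter about the line

print(strong_share, weak_share)
```

With these settings the residual share should be close to zero for the low-noise data and close to one for the high-noise data, matching the pattern described above.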