Components

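The difference between a single sample value and the population mean, µ, can be written as the sum of two components: the difference between the value and the sample mean, ȳ, and the difference between the sample mean and the population mean.

$$y_i - \mu = (y_i - \bar{y}) + (\bar{y} - \mu)$$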

Sums of squared components

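Although it is not obvious without some algebraic manipulation, a similar result holds for the corresponding sums of squares:

$$\sum_{i=1}^{n} (y_i - \mu)^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 + \sum_{i=1}^{n} (\bar{y} - \mu)^2$$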

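The algebra needed to prove this result is not particularly difficult, but we leave it as an exercise for the mathematically inclined reader. Since the red component, ȳ − µ (the difference between the sample mean and the population mean), is the same for each of the n values in the data set, the equation can also be written as:

$$\sum_{i=1}^{n} (y_i - \mu)^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 + n(\bar{y} - \mu)^2$$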

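Note that, since the red sum of squares, n(ȳ − µ)², cannot be negative, the sum of squares about the sample mean can never exceed the sum of squares about the population mean:

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 \le \sum_{i=1}^{n} (y_i - \mu)^2$$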
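For readers who would like to check the exercise, one possible route is sketched below. It assumes the notation $y_i$ for the n data values and $\bar{y}$ for their mean (symbols chosen here for illustration), splits $y_i - \mu$ into its two components, squares, and sums over the sample.

```latex
\begin{align*}
\sum_{i=1}^{n} (y_i - \mu)^2
  &= \sum_{i=1}^{n} \bigl[(y_i - \bar{y}) + (\bar{y} - \mu)\bigr]^2 \\
  &= \sum_{i=1}^{n} (y_i - \bar{y})^2
     + 2(\bar{y} - \mu)\sum_{i=1}^{n} (y_i - \bar{y})
     + n(\bar{y} - \mu)^2 \\
  &= \sum_{i=1}^{n} (y_i - \bar{y})^2 + n(\bar{y} - \mu)^2
\end{align*}
```

The cross-product term drops out because the differences between the data values and their own mean always sum to zero. The last term is n times a square, so it cannot be negative, which is why the sum of squares about the sample mean cannot exceed the sum of squares about µ.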


Displaying the components graphically

The left of the diagram below shows a jittered dot plot of a sample of n = 12 values from a normal distribution with mean µ = 10.


Sum of squares round population mean
The green vertical lines show the differences between the data values and µ. Their sum of squares is the total sum of squares about µ.
Sum of squares round sample mean
Select Round sample mean from the pop-up menu on the right. The blue vertical lines are the differences between the data values and the sample mean, ȳ. Their sum of squares is the sum of squares round the sample mean.
Sum of squares of mean
Finally select Sample mean from the pop-up menu. The red vertical lines are the differences between the sample mean and the population mean, ȳ − µ. Because this difference is the same for every data value, summing its square over all n data values gives n(ȳ − µ)², the sum of squares associated with the mean.
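The three sums of squares described above can also be computed directly. The following is a minimal Python sketch, not the code behind the diagram: the function name `sums_of_squares`, the seed, and the standard deviation of 2 are assumptions made for illustration (the text specifies only n = 12 and µ = 10).

```python
import random


def sums_of_squares(values, mu):
    """Return (about_mu, about_sample_mean, of_mean) for one sample."""
    n = len(values)
    ybar = sum(values) / n
    about_mu = sum((y - mu) ** 2 for y in values)       # green lines
    about_ybar = sum((y - ybar) ** 2 for y in values)   # blue lines
    of_mean = n * (ybar - mu) ** 2                      # red lines
    return about_mu, about_ybar, of_mean


rng = random.Random(1)            # fixed seed so the sketch is repeatable
mu, sigma, n = 10.0, 2.0, 12      # sigma = 2 is an assumed value
sample = [rng.gauss(mu, sigma) for _ in range(n)]

about_mu, about_ybar, of_mean = sums_of_squares(sample, mu)
print(f"about population mean: {about_mu:.3f}")
print(f"about sample mean:     {about_ybar:.3f}")
print(f"of mean:               {of_mean:.3f}")
print(f"blue + red:            {about_ybar + of_mean:.3f}")
```

Apart from floating-point rounding, the last printed value should agree with the first, whatever sample is drawn.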

Click on crosses to see how the three components are related for individual values.

Finally, click Take sample a few times and observe that the sum of squares about the sample mean is always less than the sum of squares about the population mean. (The two would be equal only if the sample mean were exactly equal to µ.)
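Repeated sampling can be imitated in code as well. The sketch below is a rough stand-in for clicking Take sample many times; the helper names `take_sample` and `compare`, the seed, the number of repetitions, and the standard deviation of 2 are all assumptions for illustration. It counts how often the sum of squares about the sample mean exceeds the sum of squares about µ.

```python
import random


def take_sample(rng, n=12, mu=10.0, sigma=2.0):
    """Draw n values from a normal distribution (sigma is an assumed value)."""
    return [rng.gauss(mu, sigma) for _ in range(n)]


def compare(values, mu):
    """Return (sum of squares about sample mean, sum of squares about mu)."""
    ybar = sum(values) / len(values)
    about_ybar = sum((y - ybar) ** 2 for y in values)
    about_mu = sum((y - mu) ** 2 for y in values)
    return about_ybar, about_mu


rng = random.Random(2024)         # fixed seed so the sketch is repeatable
results = [compare(take_sample(rng), 10.0) for _ in range(1000)]

# Small tolerance guards against floating-point rounding.
violations = sum(1 for about_ybar, about_mu in results
                 if about_ybar > about_mu + 1e-9)
print("samples where the sample-mean sum of squares was larger:", violations)
```

Because the two sums differ by n(ȳ − µ)², which is never negative, no violations should be counted however many samples are taken.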