Within-group and overall standard deviation
In some data sets, the 'individuals' can be split into groups. For example,
For grouped data, we can:
When the groups are combined, all information about the differences between the groups is lost. Worse, the differences between the group means add to the spread of the combined data, so the overall variability is larger than the variability within the groups.
The standard deviation of the combined data set is often considerably higher than that of the separate groups.
It is therefore better to separately describe the distributions within the groups than to describe the overall distribution with a single mean and standard deviation.
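The effect can be checked numerically. The following is a minimal sketch in Python using only the standard library; the three groups and their values are made up for illustration and do not come from any data set in the text. It compares the standard deviation within each group with that of the combined data, and also splits the total sum of squares about the overall mean into a within-group part and a between-group part.

```python
import statistics

# Hypothetical measurements for three groups (made-up numbers for illustration).
groups = {
    "A": [10.0, 11.0, 12.0, 13.0],
    "B": [20.0, 21.0, 22.0, 23.0],
    "C": [30.0, 31.0, 32.0, 33.0],
}

# Sample standard deviation within each group.
within_sd = {name: statistics.stdev(values) for name, values in groups.items()}

# Sample standard deviation of the combined data, ignoring group membership.
combined = [x for values in groups.values() for x in values]
overall_sd = statistics.stdev(combined)

# Split the total sum of squares into within-group and between-group parts.
grand_mean = statistics.mean(combined)
group_means = {name: statistics.mean(values) for name, values in groups.items()}
ss_total = sum((x - grand_mean) ** 2 for x in combined)
ss_within = sum(
    (x - group_means[name]) ** 2 for name, values in groups.items() for x in values
)
ss_between = sum(
    len(values) * (group_means[name] - grand_mean) ** 2
    for name, values in groups.items()
)

for name, sd in within_sd.items():
    print(f"group {name}: sd = {sd:.2f}")
print(f"combined: sd = {overall_sd:.2f}")
print(f"SS total = {ss_total:.1f}, within = {ss_within:.1f}, between = {ss_between:.1f}")
```

In this sketch the group means are far apart relative to the spread inside each group, so the between-group part accounts for most of the total sum of squares and the combined standard deviation is several times larger than any within-group value.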
Maximum temperatures in Bulawayo
The data set below shows the maximum temperatures in Bulawayo each month from July 1951 to April 2001. The jittered dot plot at the top gives the complete data set (598 monthly values).
The maximum temperatures in each January are shown at the bottom. Drag the slider to show the distributions of maximum temperature within the other months.
Observe that the standard deviation within each month is much lower than the overall standard deviation.