Within-group and overall standard deviation
In some data sets, the 'individuals' can be split into groups. For example,
For grouped data, we can:
When the groups are combined, all information about the differences between the groups is lost. Moreover, the differences between the group means add to the spread of the combined data, making the overall variability larger than the variability within the groups.
The standard deviation of the combined data set is often considerably higher than that of the separate groups.
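The reason can be made precise with a standard identity. As a sketch, using notation introduced here rather than taken from the original text, suppose group g has n_g values with mean \bar{x}_g and standard deviation s_g, and the combined data set has n values with mean \bar{x} and standard deviation s, where every standard deviation uses the n - 1 divisor:

```latex
(n - 1)\, s^2 \;=\; \sum_{g} (n_g - 1)\, s_g^2 \;+\; \sum_{g} n_g \,(\bar{x}_g - \bar{x})^2
```

The first term on the right comes from the spread within the groups and the second from the differences between the group means. The second term is zero only when all the group means are equal, so the further apart the group means are, the more the overall standard deviation exceeds the within-group standard deviations.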
It is therefore better to describe the distribution within each group separately than to summarise the overall distribution with a single mean and standard deviation.
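The effect is easy to check numerically. The following is a minimal sketch in Python, assuming three hypothetical groups with made-up values (they are not taken from any data set in this text). It compares the standard deviation within each group with that of the combined data.

```python
from statistics import mean, stdev

# Hypothetical groups with made-up values, chosen only for illustration.
groups = {
    "A": [10.0, 11.0, 12.0],
    "B": [20.0, 21.0, 22.0],
    "C": [30.0, 31.0, 32.0],
}

# Sample standard deviation (n - 1 divisor) within each group.
within_sd = {name: stdev(values) for name, values in groups.items()}

# Pool all values into one data set, ignoring the group labels.
combined = [x for values in groups.values() for x in values]
overall_sd = stdev(combined)

for name, values in groups.items():
    print(f"group {name}: mean = {mean(values):.1f}, sd = {within_sd[name]:.2f}")
print(f"combined: mean = {mean(combined):.1f}, sd = {overall_sd:.2f}")
```

In this made-up example each group has a standard deviation of 1, but the combined data have a standard deviation of about 8.7, because the group means are far apart.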
Maximum temperatures in Boston
Weather is often summarised by the maximum temperature each day. The data set below gives the monthly average of the daily maximum temperatures in Boston, USA, for each month from January 1950 to April 2014. The jittered dot plot at the top shows the complete data set (772 monthly values).
The maximum temperatures in each January are shown at the bottom. Drag the slider to show the distributions of maximum temperature within the other months.
Observe that the standard deviation within each month is much lower than the overall standard deviation.