Graphical displays highlight density
We have shown that sorting a group of marks into order can highlight which ranges of marks are most and least common; in other words, the ranges with the highest and lowest density of values. Density is the key to understanding the distribution of numbers in a batch, but there are better ways to display density than a sorted list.
Dot plots
The simplest graphical display of a batch of numbers is a dot plot. This shows each value as a cross (or dot) against a numerical axis.
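The idea can be sketched in a few lines of Python as a one-line character display. This is an illustrative sketch only: the function name `dot_plot` and its parameters are hypothetical, and it assumes every value lies between `lo` and `hi`. Each value becomes an 'x' at the column matching its position on the axis.

```python
def dot_plot(values, lo, hi, width=60):
    """Return a one-line character dot plot of `values`.

    Each value is drawn as an 'x' at the character column corresponding to
    its position on an axis from `lo` to `hi` (values assumed to lie in that
    range). Values that map to the same column overlap in a single 'x'.
    """
    row = [" "] * width
    for v in values:
        # Linear map from [lo, hi] onto columns 0 .. width-1.
        col = round((v - lo) / (hi - lo) * (width - 1))
        row[col] = "x"
    return "".join(row)


print(dot_plot([3, 5, 5, 9], lo=0, hi=10, width=11))
```

With these made-up values, the two 5s land in the same column and are drawn as a single cross, so the plot shows only three crosses for four values.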
Maths marks example
On the previous page, we sorted a set of maths test marks into order. The sorted marks are listed on the left below, and the same marks are displayed in a dot plot on the right.
Observe that when successive marks in the list are similar, the corresponding crosses are close together. High density is therefore shown in the dot plot by closely grouped crosses.
Note that the 'gap' in the list between 54 and 69 is clearer in the dot plot.
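The link between a sorted list and density can be made explicit by looking at the differences between successive sorted values: small differences mean closely grouped crosses, and one large difference marks a gap. The sketch below uses hypothetical marks, not the maths marks from the page, and the function names are illustrative.

```python
def gaps(sorted_marks):
    """Differences between successive values of a sorted list.

    Small differences correspond to crosses that are close together
    (high density); a large difference is an empty stretch of the axis.
    """
    return [b - a for a, b in zip(sorted_marks, sorted_marks[1:])]


def largest_gap(sorted_marks):
    """Return the pair (a, b) of successive values with the biggest gap."""
    pairs = list(zip(sorted_marks, sorted_marks[1:]))
    return max(pairs, key=lambda p: p[1] - p[0])


# Hypothetical sorted marks, for illustration only.
marks = [41, 45, 47, 48, 50, 54, 69, 70, 72]
print(gaps(marks))
print(largest_gap(marks))
```

For these made-up marks, the largest difference (15) is between 54 and 69, the kind of gap that stands out much more clearly as empty space in a dot plot.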
In large data sets, jittering the crosses helps show density
A simple dot plot is often adequate for small data sets. However, in larger data sets the crosses often overlap. Indeed, if several crosses coincide, they become indistinguishable from a single cross, so high density may be obscured.
One solution is to randomly move the crosses perpendicularly to the axis in order to separate them somewhat. This is called jittering the points.
(You should rely on a computer to do the jittering for you, but it could be done by hand by rolling a 6- or 10-sided die for each cross to determine how many millimetres it is moved.)
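A minimal sketch of jittering, assuming positions are represented as (value, vertical offset) pairs that some plotting tool would then draw. The function names and the `max_offset` and `sides` parameters are hypothetical. The horizontal position is always the value itself; only the meaningless vertical offset is random. The second function mimics the by-hand die method.

```python
import random


def jitter(values, max_offset=1.0, seed=None):
    """Pair each value with a random vertical offset between 0 and max_offset.

    The value (horizontal position) is unchanged; the offset only
    separates crosses that would otherwise overlap.
    """
    rng = random.Random(seed)
    return [(v, rng.uniform(0, max_offset)) for v in values]


def die_jitter(values, sides=6, seed=None):
    """By-hand style jittering: 'roll' a die with `sides` faces for each
    cross and use the result as its vertical offset in millimetres."""
    rng = random.Random(seed)
    return [(v, rng.randint(1, sides)) for v in values]


# Three coincident values become three separate points.
print(jitter([50, 50, 50], max_offset=1.0, seed=1))
print(die_jitter([50, 50, 50], sides=10, seed=2))
```

Passing a `seed` makes the jittering reproducible; changing it gives a different random arrangement without changing anything meaningful in the plot.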
Essay marks
The classes in an intermediate school are grouped into 'teams' of three or four classes. All students in one team are asked to write an essay on the same topic. The dot plot below shows the essay marks (out of 100).
Drag the slider to jitter the points. Click the button on the right to change the jittering, i.e. to generate new random vertical positions for the crosses.
Use only enough jittering to separate the crosses in regions of high density; moving the slider about halfway works best for the data above. Without jittering, too many crosses overlap for us to assess the distribution of marks.
(Note that the vertical positions of the crosses have no meaning: the vertical movement of the crosses is random and is only intended to separate overlapping crosses.)
Stacking the crosses shows density best
Jittering large batches of values can provide an effective display of ranges of high and low density. However, the randomness of the jittering can be disconcerting.
A stacked dot plot uses the perpendicular axis more directly to show density. A stacked dot plot is obtained by:
- grouping values into 'bins', then
- stacking the crosses in each bin on top of each other.
(Statisticians normally use the term 'classes' rather than 'bins', but this can lead to confusion when talking about class marks.) The diagram below illustrates this stacking.
Click the Animate Stacking button. Blue vertical lines are first drawn to define the bins. The crosses are then moved horizontally to the centre of their bin, and finally the crosses are stacked.
The slider can be used to replay the animation more slowly.
The shape of the stacked dot plot depends on the width of the bins that are used to group the crosses.
Select different cross sizes from the pull-down menu and replay the animation to observe the effect of different groupings: larger crosses require a coarser grouping of values, so the stacks tend to be taller.
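The two steps of building a stacked dot plot (group values into bins, then stack each bin) can be sketched as follows. This is a hypothetical illustration, not the page's own implementation: it assumes equal-width bins starting at `lo`, values between `lo` and `hi`, and draws stacks with 'x' characters. Empty bins are kept so that gaps remain visible.

```python
import math


def stack_heights(values, lo, hi, bin_width):
    """Group values into equal-width bins from lo to hi and count each bin.

    Returns a list of stack heights, one per bin (empty bins give 0).
    Values are assumed to satisfy lo <= v <= hi.
    """
    n_bins = math.ceil((hi - lo) / bin_width)
    heights = [0] * n_bins
    for v in values:
        # Bin index; a value exactly equal to hi goes into the last bin.
        k = min(int((v - lo) // bin_width), n_bins - 1)
        heights[k] += 1
    return heights


def render(heights):
    """Draw each bin as a vertical stack of 'x', top row first."""
    rows = []
    for level in range(max(heights), 0, -1):
        rows.append("".join("x" if h >= level else " " for h in heights))
    return "\n".join(rows)


# Hypothetical values, for illustration only.
values = [41, 42, 42, 44, 47, 48, 53]
print(render(stack_heights(values, lo=40, hi=54, bin_width=2)))
print()
print(render(stack_heights(values, lo=40, hi=54, bin_width=5)))
```

With bins 2 units wide the tallest stack has 2 crosses; with bins 5 units wide the same values form only three stacks, the tallest holding 4 crosses. Coarser grouping gives taller stacks.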
In character-based displays, the grouping is based on the width of a character, and periods and colons are often used instead of crosses.
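One way periods and colons can pack stacks into fewer text rows is sketched below. This is an assumption made for illustration, not a description of any particular program's output: each character row is taken to hold two stacked values, with ':' standing for two values and '.' for one (the top of an odd-height stack). The input is a list of stack heights, one per character position; the function name is hypothetical.

```python
def char_dot_plot(heights):
    """Render stack heights with two values per character row.

    ':' represents two stacked values and '.' a single value
    (assumed convention, for illustration only).
    """
    n_rows = (max(heights) + 1) // 2        # rows needed for the tallest stack
    rows = []
    for r in range(n_rows, 0, -1):          # r = 1 is the bottom row
        line = ""
        for h in heights:
            left = h - 2 * (r - 1)          # values still to draw at this row
            if left >= 2:
                line += ":"
            elif left == 1:
                line += "."
            else:
                line += " "
        rows.append(line)
    return "\n".join(rows)


print(char_dot_plot([4, 2, 1, 3]))
```

Here a stack of 3 is drawn as a colon with a period above it, so the tallest stack of 4 needs only two text rows.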
The stacked dot plot below shows the set of essay marks that were examined in a jittered dot plot earlier. The diagram was produced by the statistical program Minitab.
[Minitab character-based stacked dot plot of the essay marks, drawn with periods and colons; horizontal axis 'Mark', labelled from 40 to 100 in steps of 10.]
In some data sets, grouping the data into stacks results in some loss of information. However, this level of grouping rarely affects our interpretation of the data.
Each horizontal character position in the above dot plot covers a range of 1 mark. Since only whole numbers of marks were awarded, there has been no loss of information here. Note the peaks in the dot plot corresponding to marks ending in '0' and '5': there has been a tendency to 'round' marks to a multiple of 5!
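A tendency to round marks to multiples of 5 can also be checked numerically by counting final digits. The sketch below uses hypothetical marks (not the essay data) and illustrative function names. If final digits were equally likely, only about 2 marks in 10 would end in 0 or 5; a much larger share suggests rounding.

```python
from collections import Counter


def last_digit_counts(marks):
    """Count how many whole-number marks end in each digit 0-9."""
    counts = Counter(m % 10 for m in marks)
    return [counts[d] for d in range(10)]


def share_ending_0_or_5(marks):
    """Fraction of marks that are multiples of 5."""
    return sum(1 for m in marks if m % 5 == 0) / len(marks)


# Hypothetical marks, for illustration only.
marks = [55, 60, 62, 65, 65, 70, 71, 75, 80, 80, 83]
print(last_digit_counts(marks))
print(round(share_ending_0_or_5(marks), 2))
```

For these made-up marks, 8 of the 11 values end in 0 or 5, far more than the roughly 20% expected without rounding.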
Height is an effective way to display density in a stacked dot plot, despite some loss of detail in the individual values through the grouping of crosses.