Frequency and relative frequency

The number of values in any range is called the frequency of that range. In a similar way, the proportion of the values that lie in the range is called its relative frequency.
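
The two definitions can be illustrated with a short sketch. The marks below are invented for illustration and are not the data set used later on this page; the function names are also just illustrative.

```python
# Hypothetical marks, invented purely for illustration.
marks = [12, 27, 31, 8, 27, 45, 19, 33, 27, 30]

def frequency(values, low, high):
    """Count the values v with low <= v <= high."""
    return sum(1 for v in values if low <= v <= high)

def relative_frequency(values, low, high):
    """Proportion of the values that lie between low and high inclusive."""
    return frequency(values, low, high) / len(values)

freq = frequency(marks, 25, 34)
rel_freq = relative_frequency(marks, 25, 34)
print(freq, rel_freq)  # 6 0.6
```

Six of the ten invented marks lie between 25 and 34, so the frequency of that range is 6 and its relative frequency is 6/10 = 0.6.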

The key to understanding histograms is the relationship between the area of the rectangles and the relative frequency of the corresponding bins.

Area equals relative frequency

A stem and leaf plot can be changed into a histogram by replacing each leaf digit by a rectangle of the same size. In a histogram, each value therefore corresponds to a rectangle of the same area.

As a consequence, each value contributes the same proportion of the total histogram area, 1/n, where n is the number of values in the data set. Therefore,

The histogram area above any bins equals the proportion of values in these bins.
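
This relationship can be checked numerically. The sketch below assumes the bars are scaled so that the total histogram area is 1, and it uses a small invented data set with bins of width 10; neither the data nor the function names come from the example on this page.

```python
def bar_heights(values, edges):
    """Bar heights scaled so that the total histogram area is 1.

    Each value adds a rectangle of area 1/n to the bin that contains it.
    """
    n = len(values)
    heights = []
    for left, right in zip(edges[:-1], edges[1:]):
        count = sum(1 for v in values if left < v < right)
        heights.append(count / (n * (right - left)))
    return heights

def area_over_bins(heights, edges, first, last):
    """Total bar area for bins first..last (0-based, inclusive)."""
    return sum(heights[i] * (edges[i + 1] - edges[i])
               for i in range(first, last + 1))

# Hypothetical whole-number marks and offset bin edges.
marks = [3, 7, 12, 12, 18, 21, 26, 26, 29, 33]
edges = [-0.5, 9.5, 19.5, 29.5, 39.5]

heights = bar_heights(marks, edges)
area = area_over_bins(heights, edges, 1, 2)  # bins 9.5-19.5 and 19.5-29.5
proportion = sum(1 for m in marks if 10 <= m <= 29) / len(marks)
print(area, proportion)
```

Seven of the ten invented marks lie between 10 and 29, so the area over those two bins and the proportion of values in them both come out as 0.7, up to floating-point rounding.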
 

In a high school, all 120 year 9 students take an English grammar test. The histogram below summarises the marks of the students (out of 50).

Each of the 120 values in the data set is represented by a rectangle.

Click on the histogram at the value 12 on the axis and drag to the right, highlighting marks from 10 to 19. Twelve students out of 120 got marks in this range, so a proportion 12/120 = 0.10 of the values are in these two histogram bins. This is also the proportion of the histogram area that is highlighted.

In the same way, drag over the histogram bins with marks from 25 to 34. Half of the students got marks in this range, so the highlighted area is half of the total histogram area.

Two aspects of the above histogram are worth stressing.

The histogram bins are offset by 0.5

Marks are usually whole numbers. If the histogram bins are 0 to 10, 10 to 20, etc., then it is ambiguous whether a mark of 10 belongs to the first or the second of these bins. (Even if you follow a strict rule when drawing the histogram, there will still be a visual uncertainty for the reader.)

It is best to offset the histogram bins by 0.5 to remove this ambiguity. In the above histogram, the bins are -0.5 to 4.5, 4.5 to 9.5, etc.
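
As a minimal sketch of this offset, the code below builds the bin edges -0.5, 4.5, ..., 49.5 and finds the bin for a whole-number mark. The names `BIN_WIDTH`, `OFFSET` and `bin_index` are illustrative assumptions, not part of the original page.

```python
BIN_WIDTH = 5
OFFSET = 0.5

# Edges -0.5, 4.5, 9.5, ..., 49.5 give ten bins of width 5.
edges = [k * BIN_WIDTH - OFFSET for k in range(11)]

def bin_index(mark):
    """0-based index of the bin that holds a whole-number mark."""
    return int((mark + OFFSET) // BIN_WIDTH)

i = bin_index(10)
print(i, edges[i], edges[i + 1])  # 2 9.5 14.5
```

A mark of 10 lies strictly inside the bin 9.5 to 14.5, so no whole-number mark can fall on a bin boundary.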

All bin widths are the same

All bins in the histogram are the same width, 5. If any students had a mark of 50, we would therefore have needed to add an extra bin 49.5 to 54.5 at the end of the histogram.

It is possible to draw histograms with unequal bin widths, but the rectangle heights can then no longer be the bin frequencies; this is explained further on the next page, More about histogram bin width. Note, however, that it would be incorrect to extend the final bin to 44.5 to 50.5 to include a mark of 50 without that modification.
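
To see why a wider bin needs different treatment, the sketch below uses the standard adjustment of dividing each bin's frequency by its width, so that the area of the bar, rather than its height, stays proportional to the frequency. The counts are invented, and the next page may present the adjustment differently.

```python
def bar_height(count, left, right):
    """Height that makes the bar's area (height * width) equal to the count."""
    return count / (right - left)

# Hypothetical counts: 5 marks in a regular bin, 6 in a widened final bin.
regular_height = bar_height(5, 39.5, 44.5)  # width 5
wide_height = bar_height(6, 44.5, 50.5)     # width 6

regular_area = regular_height * (44.5 - 39.5)
wide_area = wide_height * (50.5 - 44.5)
print(regular_height, wide_height, regular_area, wide_area)
```

Both bars have the same height here, and their areas are in the ratio 5 to 6, matching the frequencies. Had the frequencies themselves been used as heights, the areas would have been 25 and 36, overstating the share of the wider bin.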