Signal and noise
Any graphical or tabular display of data should be designed to highlight the important features of the data. The useful information in a display is called its signal. Aspects of the display that carry no information that can be usefully interpreted are called its noise.
Edward Tufte, in an excellent book about data presentation (The Visual Display of Quantitative Information, 1983), distinguished two kinds of noise in displays.
Both kinds of noise make it harder to detect the signal in a display, so noise should be avoided.
Significant digits
One type of data noise is very common but easily removed. Many tables report values with more significant digits than necessary. The pattern of values in a table can usually be understood from only their first 2 or 3 digits; the remaining digits are data noise.
(If the complete data may be needed by others for further analysis, the full data can be included in an appendix or made available on a web site, but not in the body of a report.)
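As a minimal sketch of this idea, the helper below rounds a value to a chosen number of significant digits. The function name `round_sig` is hypothetical, and the two counts are borrowed from the licensed-vehicles table later in this section purely as illustrations.

```python
import math

def round_sig(value, digits=3):
    """Round value to the given number of significant digits."""
    if value == 0:
        return 0
    # Power of ten of the leading digit: 6 for 2,232,915 and 4 for 21,754.
    exponent = math.floor(math.log10(abs(value)))
    # A negative ndigits makes round() work on tens, hundreds, thousands, ...
    return round(value, digits - 1 - exponent)

print(round_sig(2_232_915, 3))  # keep 3 significant digits
print(round_sig(21_754, 2))     # keep 2 significant digits
```

The rounded values are for display only; the unrounded data should still be kept for anyone who needs them for further analysis.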
Car colours in New Zealand
The table below describes the colours of all cars registered in New Zealand in 2006.
Nobody reading the table would be interested in the final few digits of the values. Use the '-' button under the frequencies to reduce the number of significant digits displayed.
Showing the frequencies to the nearest thousand removes data noise from the table but retains all useful information.
In a similar way, round the proportions to 3 decimals; further digits do not help you to understand the data.
Finally, click the Percentage checkbox to display percentages instead of proportions. This simply multiplies the proportions by 100, but it removes some of the leading zeros and therefore makes the values stand out better.
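The same three steps can be sketched in code. The counts below are invented for illustration and are not the New Zealand car-colour figures; the function name `summarise` is also hypothetical.

```python
def summarise(count, total):
    """Return (count in thousands, proportion, percentage) as display strings."""
    proportion = count / total
    return (
        f"{count / 1000:.0f}",  # frequency to the nearest thousand
        f"{proportion:.3f}",    # proportion to 3 decimals
        f"{proportion:.1%}",    # the same proportion shown as a percentage
    )

# Hypothetical counts, made up for this example only.
counts = {"Silver": 412_387, "White": 383_702, "Pink": 3_911}
total = sum(counts.values())

for colour, n in counts.items():
    thousands, proportion, percentage = summarise(n, total)
    print(f"{colour:<8}{thousands:>5}{proportion:>8}{percentage:>8}")
```

For the smallest category the proportion prints as 0.005 and the percentage as 0.5%, which shows how the percentage form drops the leading zeros.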
Licensed vehicles in New Zealand
The next table was also published on the Land Transport New Zealand web site. It describes the types of vehicles licensed in June 2006 and the changes during the previous two years.
Vehicle type | June 2006: total | June 2006: % variation from prev year | June 2005: total | June 2005: % variation from prev year | June 2004: total |
---|---|---|---|---|---|
Cars | 2,232,915 | 2.00 | 2,189,187 | 3.35 | 2,118,240 |
Rental cars | 21,754 | -3.76 | 22,604 | 2.15 | 22,128 |
Taxis | 8,011 | -1.97 | 8,172 | 1.03 | 8,089 |
Trucks | 408,757 | 2.23 | 399,843 | 3.51 | 386,295 |
Buses/coaches | 16,486 | 5.20 | 15,671 | 4.95 | 14,932 |
Trailers/caravans | 420,289 | 2.76 | 408,982 | 2.99 | 397,113 |
Motorcycles | 43,513 | 15.37 | 37,717 | 8.16 | 34,873 |
Mopeds | 14,171 | 37.82 | 10,282 | 19.32 | 8,617 |
Tractors | 27,124 | 2.27 | 26,521 | 4.91 | 25,279 |
Exempt vehicles | 11,130 | 7.77 | 10,328 | 6.39 | 9,708 |
Miscellaneous | 22,464 | 7.25 | 20,946 | 9.06 | 19,206 |
Total | 3,226,614 | 2.42 | 3,150,253 | 3.47 | 3,044,480 |
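The "% variation from prev year" columns appear to be the ordinary percentage change between consecutive June counts. The sketch below checks that assumption against three rows of the table; the function name `pct_variation` is hypothetical.

```python
def pct_variation(current, previous):
    """Percentage change from the previous year's count to the current one."""
    return (current - previous) / previous * 100

# June 2006 and June 2005 counts copied from the table above.
rows = {
    "Cars": (2_232_915, 2_189_187),
    "Rental cars": (21_754, 22_604),
    "Mopeds": (14_171, 10_282),
}

for name, (y2006, y2005) in rows.items():
    print(f"{name:<12}{pct_variation(y2006, y2005):>7.2f}")
```

To two decimals these agree with the published values of 2.00, -3.76 and 37.82.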
The last 2 or 3 digits of the counts are of little relevance to most policy makers or other readers of the table. These values could be made available in a separate appendix (or as a linked file in spreadsheet format), but most users would get the same information more clearly if the vehicle counts were given to the nearest thousand and the percentage changes were shown with a single decimal digit.
The table below also rearranges the columns to separate the columns of vehicle counts from the columns of percentage change. This makes it easier to compare related values.
Vehicle type | June 2006 (thousand) | June 2005 (thousand) | June 2004 (thousand) | % change 2005-6 | % change 2004-5 |
---|---|---|---|---|---|
Cars | 2,233 | 2,189 | 2,118 | 2.0 | 3.4 |
Rental cars | 22 | 23 | 22 | -3.8 | 2.2 |
Taxis | 8 | 8 | 8 | -2.0 | 1.0 |
Trucks | 409 | 400 | 386 | 2.2 | 3.5 |
Buses/coaches | 16 | 16 | 15 | 5.2 | 5.0 |
Trailers/caravans | 420 | 409 | 397 | 2.8 | 3.0 |
Motorcycles | 44 | 38 | 35 | 15.4 | 8.2 |
Mopeds | 14 | 10 | 9 | 37.8 | 19.3 |
Tractors | 27 | 27 | 25 | 2.3 | 4.9 |
Exempt vehicles | 11 | 10 | 10 | 7.8 | 6.4 |
Miscellaneous | 22 | 21 | 19 | 7.3 | 9.1 |
All licensed vehicles | 3,227 | 3,150 | 3,044 | 2.4 | 3.5 |
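One way to produce rounded values like those in the table above is sketched below. It assumes that ties are rounded up, which the source does not state; the function names are hypothetical. Python's `decimal` module is used because the built-in `round()` works on binary floating-point values and rounds ties to even, so it may not give the half-up result for a value such as 2.15.

```python
from decimal import Decimal, ROUND_HALF_UP

def to_thousands(count):
    """Vehicle count rounded to the nearest thousand, with halves rounded up."""
    thousands = (Decimal(count) / 1000).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(thousands)

def to_one_decimal(pct_text):
    """Percentage given as text (e.g. "2.15") rounded to one decimal digit."""
    return str(Decimal(pct_text).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))

# Values copied from the original table of licensed vehicles.
print(to_thousands(2_232_915), to_one_decimal("2.00"))  # Cars, June 2006
print(to_thousands(21_754), to_one_decimal("-3.76"))    # Rental cars, June 2006
print(to_thousands(22_604), to_one_decimal("2.15"))     # Rental cars, June 2005
```

Passing the percentages as text keeps them exact, which is why `to_one_decimal` takes a string and not a float.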
It could be argued that one decimal digit should be shown for the category Taxis, since its counts are so small that they do not change when rounded to thousands. However, the columns of percentage change adequately describe the differences between the years for this category.
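A quick check of this point, as a sketch: the code below looks for categories whose three yearly counts become indistinguishable after rounding. Only a few rows of the first table are included, and the function names are hypothetical.

```python
# June counts for 2006, 2005 and 2004, copied from the first table above.
counts = {
    "Rental cars": (21_754, 22_604, 22_128),
    "Taxis": (8_011, 8_172, 8_089),
    "Mopeds": (14_171, 10_282, 8_617),
    "Exempt vehicles": (11_130, 10_328, 9_708),
}

def thousands(count, decimals=0):
    """Format a count in thousands with the given number of decimal digits."""
    return f"{count / 1000:.{decimals}f}"

def hidden_by_rounding(table, decimals=0):
    """Categories whose yearly values all look the same after rounding."""
    return [
        name
        for name, years in table.items()
        if len({thousands(c, decimals) for c in years}) == 1
    ]

print(hidden_by_rounding(counts))                  # whole thousands
print(hidden_by_rounding(counts, decimals=1))      # one decimal digit
print([thousands(c, 1) for c in counts["Taxis"]])  # Taxis with one decimal
```

Among these rows only Taxis loses all year-to-year variation at whole thousands, and a single decimal digit is enough to restore it.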