Information from the variation in data
Variation in data is not simply an annoyance — the variation itself can hold important information. An important role of statistics is to display and describe this variation in ways that highlight the information in it.
Steel Works Slag data
In steel works, iron ore is smelted to extract as much iron as possible, but some iron remains in the waste from the process (slag) in the form of iron oxide (FeO). The table below shows the percentage of FeO in slag sampled from 20 batches of iron ore. (The data are presented by column in the order in which the data were collected, but this ordering is not thought to be important.)
6.1 | 5.2 | 7.9 | 2.3 | 3.4 |
1.4 | 5.3 | 7.1 | 3.2 | 2.8 |
5.1 | 6.9 | 6.1 | 3.4 | 5.2 |
5.5 | 2.0 | 1.3 | 4.9 | 6.4 |
What can you see?
There is clearly variability between samples and a quick scan shows that all values are between 0 and 10%. But what else can be easily learned from the table?
Sorting the data can help
It is not easy to obtain further useful information from a table of raw data. Different displays of the data may, however, highlight meaningful patterns. Graphical displays are usually the most effective, but even sorting the data into order gives some insight into the values.
Firstly, examine the unordered values in the slag data table above. It is difficult to see any unusual features in the raw data.
Now sort the data into increasing order, then look for features in the sorted list of values: 1.3, 1.4, 2.0, 2.3, 2.8, 3.2, 3.4, 3.4, 4.9, 5.1, 5.2, 5.2, 5.3, 5.5, 6.1, 6.1, 6.4, 6.9, 7.1, 7.9.
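The sorting step can also be reproduced in a few lines of code. The following is a minimal Python sketch, not part of the original lesson. The variable names (`feo`, `sorted_feo`, `gaps`) are illustrative assumptions, and looking for the largest gap between neighbouring sorted values is just one simple way to spot a possible split in the data.

```python
# FeO percentages from the table above, read row by row.
# (The reading order does not matter once the values are sorted.)
feo = [
    6.1, 5.2, 7.9, 2.3, 3.4,
    1.4, 5.3, 7.1, 3.2, 2.8,
    5.1, 6.9, 6.1, 3.4, 5.2,
    5.5, 2.0, 1.3, 4.9, 6.4,
]

# Sort into increasing order, as the lesson suggests.
sorted_feo = sorted(feo)
print(sorted_feo)

# Gaps between neighbouring sorted values; an unusually large gap
# hints that the values may fall into separate groups.
gaps = [b - a for a, b in zip(sorted_feo, sorted_feo[1:])]
i = max(range(len(gaps)), key=gaps.__getitem__)  # position of the largest gap

# Split the sorted values at the largest gap.
low, high = sorted_feo[: i + 1], sorted_feo[i + 1 :]
print(f"largest gap: {gaps[i]:.1f}, between {low[-1]} and {high[0]}")
print(f"{len(low)} values below the gap, {len(high)} values above it")
```

A dot plot or histogram of the same values would show the same feature graphically.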
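The sorted values fall into two clusters: eight values between 1.3 and 3.4, and twelve values between 4.9 and 7.9, with none in between. Perhaps the two clusters correspond to slag produced by two different smelters in the steel works, or to iron ore from two different sources? This analysis suggests further investigation by the steel works.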