Information from the variation in data

Variation in data is not simply an annoyance: the variation itself can hold important information. An important role of statistics is to display and describe this variation in ways that highlight the information it contains.

Rain days

The table below shows the number of rainy days in an African village each year between 1994 and 2013.

Rain days from 1994 to 2013

Year       1994  1995  1996  1997  1998  1999  2000
Rain days   101    92   119    63    74    54    93

Year       2001  2002  2003  2004  2005  2006  2007
Rain days   111    72    68    91   109   101    74

Year       2008  2009  2010  2011  2012  2013
Rain days    92    95    60    53    89   104

Since any systematic change in climate (if it exists) is much smaller than the random year-to-year variation in the data, we will ignore the fact that the data are a time series and treat them as an unordered set of values.

Rain days in year
101 92 119 63 74
54 93 111 72 68
91 109 101 74 92
95 60 53 89 104

What can you see?

There is clearly variability between years and a quick scan shows that all values are between 53 and 119 days. But what else can be easily learned from the table?
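The quick scan above can be checked with a few lines of Python. This is a minimal sketch: the list name `rain_days` is our own choice, and its values are copied from the table in the order shown.

```python
# Rain days per year, 1994-2013, copied from the table above
rain_days = [101, 92, 119, 63, 74,
             54, 93, 111, 72, 68,
             91, 109, 101, 74, 92,
             95, 60, 53, 89, 104]

# Smallest and largest values, and the spread between them
lowest = min(rain_days)
highest = max(rain_days)
print(f"{len(rain_days)} years, from {lowest} to {highest} rain days "
      f"(range {highest - lowest})")
# prints "20 years, from 53 to 119 rain days (range 66)"
```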

Sorting the data can help

It is not easy to obtain further useful information from a table of raw data. Different displays of the data, however, may highlight meaningful patterns. Graphical displays are usually the most effective, but even sorting the data into order gives some insight into the values.

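In time order it is difficult to see any unusual features in the raw data. Sorting the rain days into increasing order makes them easier to see:

Sorted rain days
53 54 60 63 68
72 74 74 89 91
92 92 93 95 101
101 104 109 111 119

The sorted values fall into two clusters separated by a gap between 74 and 89 days: eight years had between 53 and 74 rain days, and twelve years had between 89 and 119.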
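The sorting step, and the gap that separates the two groups, can also be reproduced in code. The sketch below is one simple approach, assuming that the widest gap between consecutive sorted values marks the boundary between groups. This is a rough heuristic rather than a formal clustering method, and `largest_gap` is a hypothetical helper written only for illustration.

```python
# Rain days per year, 1994-2013 (time order is ignored from here on)
rain_days = [101, 92, 119, 63, 74,
             54, 93, 111, 72, 68,
             91, 109, 101, 74, 92,
             95, 60, 53, 89, 104]


def largest_gap(values):
    """Return (lower, upper): the neighbouring sorted values with the widest gap.

    Hypothetical helper; treating the widest gap as a cluster boundary
    is an assumption made for this sketch.
    """
    ordered = sorted(values)
    pairs = zip(ordered, ordered[1:])  # consecutive (smaller, larger) pairs
    return max(pairs, key=lambda pair: pair[1] - pair[0])


ordered = sorted(rain_days)
print(ordered)

low, high = largest_gap(rain_days)
lower_group = [v for v in ordered if v <= low]
upper_group = [v for v in ordered if v >= high]
print(f"Largest gap: {low} to {high} days")
print(f"{len(lower_group)} years at or below {low}, "
      f"{len(upper_group)} years at or above {high}")
```

With these data the widest gap is 15 days, between 74 and 89, which splits the years into groups of 8 and 12. A real analysis would look at a dot plot or histogram as well, since the largest gap alone says nothing about whether the split is meaningful.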

Perhaps the two clusters correspond to different types of year, such as El Niño and La Niña years? This analysis suggests a direction for further investigation by the researcher.