Ordering of the 'individuals'

Many basic statistical methods assume that the 'individuals' in a data matrix are unordered — any rearrangement of the rows would give the same information. For example, the weights of 20 loaves of bread sampled from a supermarket would form an unordered data set.

However, the rows of the data matrix are sometimes ordered, usually by time. For example, temperature measurements may be recorded at 10-minute intervals between 9am and 9pm. The resulting temperatures form a continuous numerical variable whose values are time-ordered: the ordering of the values holds useful information that helps us understand the data. Data of this kind are called time series.

For a preliminary exploration of ordered data, it is often useful to examine them as though they were unordered, but a full analysis should take account of the ordering.

Examples

Both of the data sets below are time-ordered. In the chain testing example, the chains were tested in the order shown, whereas in the weather example, the rainfall totals were recorded annually from July 2001 to June 2013.

Testing of chains
Alloy   Breaking strain
A       62
A       46
C       51
C       79
A       44
B       60
C       72
B       55
C       78
A       63
C       81
A       53

Weather
Year type   Rainfall (July to June)
Ordinary    62
El Nino     46
Ordinary    51
El Nino     79
La Nina     44
El Nino     60
La Nina     72
Ordinary    55
El Nino     78
La Nina     63
La Nina     81
Ordinary    53

If the chain-testing experiment was conducted in an identical way for each chain, it would be possible to ignore the ordering of the data and analyse the data as though they were unordered.
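
As a minimal sketch of what "ignoring the ordering" means, the Python snippet below rearranges the breaking strains listed above at random and checks that the usual summaries for unordered data are unchanged. The variable names and the fixed random seed are illustrative assumptions, not part of the original example.

```python
import random
import statistics

# Breaking strains from the chain-testing table, in the order tested.
strain = [62, 46, 51, 79, 44, 60, 72, 55, 78, 63, 81, 53]

# Rearrange the rows at random (the seed is fixed only for reproducibility).
rng = random.Random(0)
shuffled = strain[:]
rng.shuffle(shuffled)

# Summaries used for unordered data do not depend on the row order.
same_mean = statistics.mean(strain) == statistics.mean(shuffled)
same_median = statistics.median(strain) == statistics.median(shuffled)
same_values = sorted(strain) == sorted(shuffled)
print(same_mean, same_median, same_values)
```

Any method whose result survives this kind of shuffling treats the data as unordered; a method that looks for a trend over time would not.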

This would also be a reasonable initial analysis of these weather data, but if the data had been collected over a longer period when there may have been a trend in rainfall (e.g. from global warming), it would be more important to take account of the ordering of the values.

To include the time-ordering of the data in the data matrix, a new variable can be added, as shown below.

Testing of chains
Index   Alloy   Breaking strain
1       A       62
2       A       46
3       C       51
4       C       79
5       A       44
6       B       60
7       C       72
8       B       55
9       C       78
10      A       63
11      C       81
12      A       53

Weather
Year starting July   Year type   Rainfall (July to June)
2001                 Ordinary    62
2002                 El Nino     46
2003                 Ordinary    51
2004                 El Nino     79
2005                 La Nina     44
2006                 El Nino     60
2007                 La Nina     72
2008                 Ordinary    55
2009                 El Nino     78
2010                 La Nina     63
2011                 La Nina     81
2012                 Ordinary    53

The 'time' variable (Index or Year starting July) holds all the information about the ordering of the data, so the data matrix can otherwise be treated as 'unordered'.
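
The idea can be illustrated with a short Python sketch using the chain-testing data. It stores each row as an (index, alloy, strain) tuple, shuffles the rows, and then recovers the original order by sorting on the index. The tuple layout, the names, and the random seed are assumptions made for illustration only.

```python
import random

# Chain-testing data in the order the chains were tested.
alloy = ["A", "A", "C", "C", "A", "B", "C", "B", "C", "A", "C", "A"]
strain = [62, 46, 51, 79, 44, 60, 72, 55, 78, 63, 81, 53]

# Add an index variable (1 to 12) that records the testing order.
rows = [(i, a, s) for i, (a, s) in enumerate(zip(alloy, strain), start=1)]

# Rearrange the rows at random; the index travels with each row.
shuffled = rows[:]
random.Random(1).shuffle(shuffled)

# Sorting on the index variable restores the original ordering.
restored = sorted(shuffled, key=lambda row: row[0])
print(restored == rows)
```

Because the index is stored as an ordinary variable, no information is lost when the rows are rearranged.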