Patterns in the structure of data

There are many different contexts in which data are recorded, and we have stressed the importance of context to a statistician. However, the statistical methods that can be used on the data depend less on the context than on the internal structure of the data.

Testing of grapes
Variety   Sugar content
A         62
A         46
A         51
A         79
A         44
A         60
B         72
B         55
B         78
B         63
B         81
B         53

Bird weight
Sex       Weight
Male      62
Male      46
Male      51
Male      79
Male      44
Male      60
Female    72
Female    55
Female    78
Female    63
Female    81
Female    53

Yield of wheat
Fertiliser?   Yield
No fert       62
No fert       46
No fert       51
No fert       79
No fert       44
No fert       60
Fert          72
Fert          55
Fert          78
Fert          63
Fert          81
Fert          53

All three data sets above have the same basic structure: 12 numerical measurements have been made on 12 different 'individuals' (bunches of grapes, birds or fields of wheat), and these individuals have been classified into one of two groups. The same statistical methods can therefore be applied to all three data sets.
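Because only the structure matters, a single procedure can summarise all three data sets. The Python sketch below is illustrative only: it assumes each data set is stored as a list of (group, value) pairs, and the helper names `two_group_data` and `group_means` are hypothetical. It computes the mean measurement in each group, and the same function works unchanged whether the groups are grape varieties, sexes of birds or fertiliser treatments.

```python
from statistics import mean

# The 12 measurements listed in all three tables above (the same numbers in each).
VALUES = [62, 46, 51, 79, 44, 60, 72, 55, 78, 63, 81, 53]


def two_group_data(label_1, label_2, values):
    """Hypothetical helper: label the first half of the values with label_1
    and the second half with label_2, as in the tables above."""
    half = len(values) // 2
    labels = [label_1] * half + [label_2] * (len(values) - half)
    return list(zip(labels, values))


def group_means(data):
    """Hypothetical helper: mean of the measurements within each group."""
    groups = {}
    for group, value in data:
        groups.setdefault(group, []).append(value)
    return {group: mean(vals) for group, vals in groups.items()}


grapes = two_group_data("A", "B", VALUES)
birds = two_group_data("Male", "Female", VALUES)
wheat = two_group_data("No fert", "Fert", VALUES)

# The same function is applied, unchanged, to all three contexts.
for name, data in [("grapes", grapes), ("birds", birds), ("wheat", wheat)]:
    print(name, group_means(data))
```

For every data set the two group means come out as 57 and 67; only the group labels differ, which is the point of the paragraph above.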

Variables and individuals

Although data can have a very rich structure, most data structures are fairly simple.

Most commonly, one or more measurements are recorded from each of a collection of 'individuals' (also called 'cases' or 'units'). These individuals may be people, but could equally be plants, animals, houses, containers of milk, countries, days or any other units from which measurements can be made. The data can therefore be presented in a rectangular array called a data matrix, with one row for each individual and one column for each measurement.
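As a rough illustration, a data matrix can be held in Python as a list of rows, one per individual, with a separate list of column (variable) names. This is a minimal sketch rather than a recommended storage format; it reuses the grape data from the start of this section, and the helpers `row` and `column` are hypothetical names chosen for this example.

```python
# Column names: one per variable.
columns = ["Variety", "Sugar content"]

# One row per individual (bunch of grapes), one entry per variable.
rows = [
    ["A", 62], ["A", 46], ["A", 51], ["A", 79], ["A", 44], ["A", 60],
    ["B", 72], ["B", 55], ["B", 78], ["B", 63], ["B", 81], ["B", 53],
]


def row(matrix, i):
    """Hypothetical helper: all measurements made on individual i (a row)."""
    return matrix[i]


def column(matrix, names, variable):
    """Hypothetical helper: every individual's value of one variable (a column)."""
    j = names.index(variable)
    return [r[j] for r in matrix]


print(len(rows), "individuals x", len(columns), "variables")
print(row(rows, 0))
print(column(rows, columns, "Sugar content"))
```

Reading across a row gives everything recorded about one individual; reading down a column gives one variable for every individual.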

The different measurements are called variables. The variables may describe closely related characteristics of the individuals (e.g. a student's marks in the three assignments of a statistics course), but they can also be quite distinct and may even have different units (e.g. the age in years, weight in kg and weekly milk production in litres of each cow in a herd).

Body fat and body shape

A person's percentage body fat is an important indicator of their health. To determine how body fat is related to other physical characteristics, scientists accurately measured the body fat of a group of 252 men (using an underwater weighing technique), together with several other body measurements.

The diagram below shows some of these measurements for the first 50 subjects in the study. (Note that the heights were recorded in inches, whereas the chest and other body circumference measurements were recorded in centimetres!)

Each individual in the study corresponds to a row in the data array, and each type of measurement (variable) corresponds to a column.