Patterns in the structure of data

Data are recorded in many different contexts, and we have stressed how important context is to a statistician. However, the statistical methods that can be used on the data depend less on the context than on the internal structure of the data.

Schools in districts
District type   Schools
Urban           62
Urban           46
Urban           51
Urban           79
Urban           44
Urban           60
Rural           72
Rural           55
Rural           78
Rural           63
Rural           81
Rural           53

Climate in a year
Region   Rain days
North    62
North    46
North    51
North    79
North    44
North    60
South    72
South    55
South    78
South    63
South    81
South    53

Testing of maize
Variety   Water content
A         62
A         46
A         51
A         79
A         44
A         60
B         72
B         55
B         78
B         63
B         81
B         53

All three data sets above have the same basic structure: 12 numerical measurements have been made on 12 different 'individuals' (districts, villages or heads of maize), and each individual has been classified into one of two groups. The same statistical methods can therefore be applied to all three data sets.
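As a rough illustration of this point, the sketch below applies one summary function to all three data sets. The function name group_means and the choice of the group mean as the summary are assumptions made for this example only; any other two-group method could be substituted.

```python
from statistics import mean

def group_means(groups, values):
    """Return the mean of `values` within each label found in `groups`.

    `group_means` is a hypothetical helper written for this illustration.
    """
    by_group = {}
    for label, value in zip(groups, values):
        by_group.setdefault(label, []).append(value)
    return {label: mean(vals) for label, vals in by_group.items()}

# The 12 measurements are identical in the three tables above.
values = [62, 46, 51, 79, 44, 60, 72, 55, 78, 63, 81, 53]

# Only the group labels (the context) differ between the data sets.
schools = group_means(["Urban"] * 6 + ["Rural"] * 6, values)
climate = group_means(["North"] * 6 + ["South"] * 6, values)
maize = group_means(["A"] * 6 + ["B"] * 6, values)

print(schools)
print(climate)
print(maize)
```

The function never looks at what the labels or numbers mean, only at how the data are arranged, which is why the same code serves all three contexts.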

Variables and individuals

Although data can have a very rich structure, most data structures are fairly simple.

Most commonly, one or more measurements are recorded from each of a collection of 'individuals' (also called 'cases' or 'units'). These 'individuals' may be people, but could equally be plants, animals, houses, containers of milk, countries, days or any other units from which measurements can be made. The data can therefore be presented in a rectangular array called a data matrix, with one row for each individual and one column for each measurement.

The different measurements are called variables. The variables may describe closely related characteristics of the individuals (e.g. student marks in the three assignments in a statistics course), but they can also be quite distinct and may even have different units (e.g. age in years, weight in kg and weekly milk production in litres for cows in a herd).
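One possible way to hold a data matrix in code is as a list of rows, one per individual, with one named entry per variable. The sketch below uses four of the districts from the 'Schools in districts' table; the names data_matrix, column, district_type and schools are chosen for this example and are not part of the original data.

```python
# Each row is one individual (a district); each key is a variable.
# Values are the first two urban and first two rural districts
# from the 'Schools in districts' table.
data_matrix = [
    {"district_type": "Urban", "schools": 62},
    {"district_type": "Urban", "schools": 46},
    {"district_type": "Rural", "schools": 72},
    {"district_type": "Rural", "schools": 55},
]

def column(rows, variable):
    """Collect one variable (a column) across all individuals (rows)."""
    return [row[variable] for row in rows]

# A row holds every measurement made on a single individual.
first_individual = data_matrix[0]

# A column holds a single variable measured on every individual.
schools_column = column(data_matrix, "schools")

print(first_individual)
print(schools_column)
```

Here one variable is a category and the other is a count, which shows that the columns of a data matrix do not have to share a type or a unit.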

Body fat and body shape

Percentage body fat of individuals is an important measure of their health. In order to determine how body fat is related to other physical characteristics, scientists accurately determined body fat (using an underwater weighing technique) and several other body measurements from a group of 252 men.

The diagram below shows some of these measurements for the first 50 subjects in the study. (Note that the heights were recorded in inches, whereas the chest and other body circumference measurements were recorded in centimeters!)

Each individual corresponds to a row in the data array, and each type of measurement (each variable) corresponds to a column.