Patterns in the structure of data
There are many different contexts in which data are recorded, and we have stressed the importance of context to a statistician. However, the statistical methods that can be used on the data depend less on the context than on the internal structure of the data.
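[Three example data sets: measurements from bunches of grapes, birds and fields of wheat, each classified into two groups.]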
All three data sets above have the same basic structure: 12 numerical measurements have been made from 12 different 'individuals' (bunches of grapes, birds or fields of wheat), and each individual has been classified into one of two groups. Because the structure is the same, the same statistical methods can be applied to all three data sets.
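As a rough illustration of this point, the sketch below applies one function to two data sets that share the structure of 12 measurements split into two groups. The function name `group_mean_difference`, the group labels and all of the numbers are invented for illustration; they are not the values from the data sets above, and a difference in group means is only one possible method.

```python
from statistics import mean


def group_mean_difference(values, groups):
    """Difference in means between the two groups (labels taken in sorted order)."""
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("expected exactly two groups")
    first = [v for v, g in zip(values, groups) if g == labels[0]]
    second = [v for v, g in zip(values, groups) if g == labels[1]]
    return mean(first) - mean(second)


# Hypothetical measurements: 12 individuals, 6 in each of two groups.
grape_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
grape_groups = ["A"] * 6 + ["B"] * 6

bird_values = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 11.0, 13.0, 15.0, 17.0, 19.0, 21.0]
bird_groups = ["X"] * 6 + ["Y"] * 6

# The same code handles both data sets because only the structure matters.
grape_difference = group_mean_difference(grape_values, grape_groups)
bird_difference = group_mean_difference(bird_values, bird_groups)
```

The function never looks at what the numbers mean; it needs only a list of values and a matching list of group labels.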
Variables and individuals
Although data can have a very rich structure, most data structures are fairly simple.
Most commonly, one or more measurements are recorded from each of a collection of 'individuals' (also called 'cases' or 'units'). These 'individuals' may be people, but could equally be plants, animals, houses, containers of milk, countries, days or any other units from which measurements can be made. The data can therefore be presented in a rectangular array called a data matrix, with one row for each individual.
The different measurements are called variables. The variables may describe closely related characteristics of the individuals (e.g. student marks in the three assignments in a statistics course) but the measurements can be more distinct and may even have different units (e.g. age in years, weight in kg and weekly milk production in litres of cows in a herd).
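A data matrix of this kind can be sketched in a few lines of code. The example below uses the cow-herd variables mentioned above; the variable names, the helper functions `get_row` and `get_column`, and all of the values are assumptions made up for illustration.

```python
# Variables (columns). Note that they have different units.
variables = ["age_years", "weight_kg", "milk_litres_per_week"]

# One row per individual (cow); the values are invented for illustration.
data_matrix = [
    [4, 520.0, 150.0],
    [6, 610.0, 175.0],
    [3, 480.0, 140.0],
]


def get_row(matrix, i):
    """All measurements made on individual i."""
    return matrix[i]


def get_column(matrix, names, name):
    """The values of one variable across all individuals."""
    j = names.index(name)
    return [row[j] for row in matrix]


n_individuals = len(data_matrix)   # number of rows
n_variables = len(variables)       # number of columns
second_cow = get_row(data_matrix, 1)
weights = get_column(data_matrix, variables, "weight_kg")
```

Reading across a row gives everything recorded for one individual, while reading down a column gives one variable for the whole collection.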
Body fat and body shape
Percentage body fat of individuals is an important measure of their health. In order to determine how body fat is related to other physical characteristics, scientists accurately determined body fat (using an underwater weighing technique) and several other body measurements from a group of 252 men.
The diagram below shows some of these measurements for the first 50 subjects in the study. (Note that the heights were recorded in inches, whereas the chest and other body circumference measurements were recorded in centimetres!)
In the data array, each individual corresponds to a row and each type of measurement (variable) corresponds to a column.