Small groups or clusters of 'individuals'

In some data sets, the basic 'individuals' are arranged in fairly small groups or clusters.

Measurements of plants
In a study investigating plant growth, the researcher might record the nutrient content of a few leaves from each of a sample of plants. The leaves are the natural 'individuals' in the data matrix since measurements are made from each individual leaf, but they are grouped by plant.
Household income
Surveys often select households in an area then obtain information about every member of the household. (Sampling clusters of people is cheaper than sampling individuals.) The people are the natural 'individuals' in the data matrix, but they are grouped by household.

In data sets of these kinds, a categorical variable could distinguish between the groups, but it would have a large number of possible values, and these values (the plant or household names) are of little direct interest.

Data at different levels

Furthermore, some measurements are usually recorded at group level rather than individual level. For example, the age and height of a plant, or the dwelling type and distance from shops of a household, are recorded at group level. These values could be stored in a separate group-level data matrix.

 

Each data matrix can be separately analysed.


Household survey

The data below are of a type commonly collected in surveys of consumer purchases and attitudes. A sample of houses is selected and information is collected from all individuals in these houses. For illustrative purposes, we show data from only 22 households, with one variable at household level and three at person level.

The data matrix on the left shows household-level information — the house address and its distance from the nearest supermarket (km). The data matrix on the right shows data that were collected from the individuals in the households — age, gender and income.

Note that the households have also been given a unique number and each individual has a 'household ID' variable that links it to its household. Click on any household or individual to see the corresponding entries in the two tables.

Moving information between the data matrices

Information can be exchanged between the two data matrices in order to analyse both sets of data together.

Group level → individual level
Each row of the individual-level data matrix could be augmented with values copied from the corresponding row (group) of the group-level data matrix.
Individual level → group level
It is also possible to summarise individual-level data and add it to the group-level data matrix. For example, we could add a 'maximum age' or 'household size' variable to the household-level data matrix. In the plant-and-leaf scenario described at the top of this page, an 'average leaf length' variable could be added to the plant-level data matrix.
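
Both directions of exchange can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the households, people and values below are invented (they are not the survey data), and the field names `household_id`, `distance_km`, `hh_size`, `max_age` and `total_income` are hypothetical labels chosen for the sketch.

```python
# Hypothetical toy data: values are invented for illustration only.
# Group-level data matrix, keyed by household ID.
households = {
    1: {"distance_km": 2.0},
    2: {"distance_km": 1.0},
}

# Individual-level data matrix; 'household_id' links each person to a household.
people = [
    {"household_id": 1, "age": 34, "income": 52000},
    {"household_id": 2, "age": 41, "income": 38000},
    {"household_id": 2, "age": 39, "income": 0},
    {"household_id": 2, "age": 7, "income": 0},
]

# Group level -> individual level:
# copy the household's distance onto every person in that household.
people_augmented = [
    {**p, "distance_km": households[p["household_id"]]["distance_km"]}
    for p in people
]

# Individual level -> group level:
# summarise the people in each household and attach the summaries.
households_augmented = {}
for hh_id, hh in households.items():
    members = [p for p in people if p["household_id"] == hh_id]
    households_augmented[hh_id] = {
        **hh,
        "hh_size": len(members),
        "max_age": max(p["age"] for p in members),
        "total_income": sum(p["income"] for p in members),
    }
```

The household ID does all the linking here: it is the lookup key when copying values down to individuals and the grouping key when summarising values up to households.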

Information can be obtained from multi-level data by examining both the group-level and individual-level data matrices.


Household survey

In the household survey above, the 'Distance to shops' data from household level can be copied to the individual-level data matrix to be analysed at individual level.

In a similar way, information from the individuals in a household can be used to create new variables at household level. Select Hh size (household size), Min age (minimum age) and Total income from the pop-up menu. Their values for any household are determined from variables in the individual-level data matrix for the individuals in the household. (Again, click any household's row to highlight the individuals in it.)

Analysis of the two data matrices

The data-analysis methods that will be described in CAST can be used with both the group-level and individual-level data matrices. It is, however, important to recognise the difference between analysing the data at these two levels.

Consider a household with 1 individual that is 2 km from shops and another household with 9 individuals that is 1 km from shops.

Household level
The average household distance from shops is 1.5 km — the average of the two household distances.
Individual level
The average individual distance from shops is 1.1 km — the average of the ten individual distances 1, 1, 1, 1, 1, 1, 1, 1, 1, 2.
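
The two averages above can be checked with a short Python sketch. It uses only the two households from this example; the field names `distance_km` and `size` are hypothetical labels introduced for the illustration.

```python
from statistics import mean

# The two households from the example: distance from shops and number of people.
households = [
    {"distance_km": 2.0, "size": 1},
    {"distance_km": 1.0, "size": 9},
]

# Household level: each household counts once.
household_mean = mean(h["distance_km"] for h in households)

# Individual level: copy each household's distance to every person in it,
# so a household with 9 people contributes 9 values.
individual_distances = [
    h["distance_km"] for h in households for _ in range(h["size"])
]
individual_mean = mean(individual_distances)

print(household_mean)   # 1.5
print(individual_mean)  # 1.1
```

The individual-level average is the household distances weighted by household size, (1 × 2 + 9 × 1) / 10 = 1.1, which is why the large household dominates it.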

Properly analysing multi-level data and interpreting the results of the analysis requires a lot of careful thought!

Although you can analyse both the group-level and individual-level data matrices using the methods that will be described in CAST, advanced statistical methods are needed to fully analyse multi-level data.