Categorical variables and groups
A categorical variable can be used to split the individuals in a data set into groups. For example, we might record the weight and shell colour (white or brown) of 50 hens' eggs. We could use the 'shell colour' variable to split the eggs into white eggs and brown eggs and separately list the weights of the two groups.
Conversely, if data were separately collected from different groups of individuals, the resulting data sets could be combined with a categorical variable distinguishing between the groups. For example, an experiment might be conducted to compare the lifetimes of three brands of electric light bulb. The three sets of lifetimes might be combined into a single data set with one numerical variable (the lifetimes) and a categorical variable to distinguish the three brands. This type of data set often arises from experiments.
A categorical variable and groups are often two ways of representing the same data.
Data presented in a separate list for each group are often called unstacked data. Data presented as a single list alongside a categorical variable are called stacked data.
Rice survey
As part of a survey of rice producers in Sri Lanka, 36 farmers were randomly selected from 4 villages. The yield of rice (tonnes per hectare) was determined from each farmer.
These data are naturally presented as a separate list of yields for each village. Click on values on the left to see how they are represented using a categorical 'Village' variable in a data matrix.
Days off work from illness
The manager of a factory was concerned by the number of days that workers took off work each year due to illness. Personnel records for 60 workers were examined, and their sick days were recorded along with their age group, old (defined as 40 or over) or young (under 40).
These data are naturally recorded in a data matrix with columns for days off work and for age group, but the ages of the workers can be used to split the other variable into groups. Click on the top row of the data table and drag down to see how the numerical values are split into groups.