Categorical variables and groups
A categorical variable can be used to split the individuals in a data set into groups. For example, a ski shop might record the value of any 'extras' that are sold at the same time as new skis — a numerical variable containing the value of the extras for each sale. If the salesperson is also noted in a categorical variable, this variable could be used to split the sales data into groups — a set of values for each salesperson. The shop could compare these groups to identify its most effective staff.
Conversely, if data were separately collected from different groups of individuals, the resulting data sets could be combined with a categorical variable distinguishing between the groups. For example, the quality control manager in a factory producing light bulbs might want to compare the lifetimes of its bulbs with those of bulbs produced by two competitors, so lifetimes would be recorded from a sample of bulbs of each type. The three sets of lifetimes might be combined into a single data set with one numerical variable (the lifetimes) and a categorical variable to distinguish the three types of bulb. This type of data set often arises from experiments.
A categorical variable and groups are often two ways of representing the same data.
Data presented in a separate list for each group are often called unstacked data. Data presented as a single list alongside a categorical variable are called stacked data.
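The two layouts hold the same information, so one can be converted into the other. The following is a minimal sketch in Python, not a prescribed method: the function names stack and unstack and the bulb lifetimes are hypothetical, chosen only to illustrate the idea.

```python
from collections import defaultdict


def stack(groups):
    """Turn {group label: [values]} into a list of (label, value) rows."""
    return [(label, value)
            for label, values in groups.items()
            for value in values]


def unstack(rows):
    """Turn (label, value) rows back into {group label: [values]}."""
    groups = defaultdict(list)
    for label, value in rows:
        groups[label].append(value)
    return dict(groups)


# Hypothetical bulb lifetimes in hours, for illustration only.
unstacked = {
    "Own brand": [1020, 980, 1100],
    "Competitor A": [950, 1010],
    "Competitor B": [1200, 1150, 990],
}

# Stacked form: one numerical column plus a categorical column.
stacked = stack(unstacked)
for label, lifetime in stacked:
    print(f"{label}\t{lifetime}")

# Converting back recovers the original groups.
assert unstack(stacked) == unstacked
```

Each row of the stacked list pairs a value of the categorical variable with one numerical value, and the groups need not be the same size.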
Rice survey
As part of a survey of rice producers in Sri Lanka, 36 farmers were randomly selected from 4 villages. The yield of rice (tonnes per hectare) was determined for each farmer.
These data are naturally presented as a separate list of yields for each village. In a data matrix, the same yields form a single numerical column, with a categorical 'Village' variable recording the village of each farmer.
Bonuses paid to managers
The bonuses (percentage of annual salary) paid to 60 lower-level managers were determined. The genders of the managers were also recorded.
These data are naturally recorded in a data matrix with columns for bonus and gender, but the gender of the managers can be used to split the bonuses into groups, giving a separate list of bonuses for each gender.