Categorical variables and groups
As explained earlier, the ideas of a categorical variable and of groups are equivalent, and we will use the two terms interchangeably.
'A categorical variable' and 'groups' are two equivalent ways to think about the same data.
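For example, consider a set of hospital records containing Patient age, Length of stay and Type of illness. The data could be presented either as a single table in which Type of illness is a categorical column, or as a separate table of Patient age and Length of stay for each type of illness.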
These data should be analysed in exactly the same way, irrespective of the format.
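One way to see that a categorical column and a set of groups carry the same information is to convert between them. The sketch below is only an illustration: the patient values, the field names (`age`, `stay`, `illness`) and the helper functions `to_groups` and `to_rows` are all made up for this example and do not come from any real hospital data set.

```python
from collections import defaultdict

# Hypothetical records in "categorical variable" format: one row per patient,
# with the type of illness stored as an ordinary column. Values are invented.
records = [
    {"age": 34, "stay": 3, "illness": "cardiac"},
    {"age": 61, "stay": 9, "illness": "respiratory"},
    {"age": 47, "stay": 5, "illness": "cardiac"},
    {"age": 72, "stay": 12, "illness": "respiratory"},
]


def to_groups(rows, key):
    """Split the rows into one list per category of `key` ("groups" format)."""
    groups = defaultdict(list)
    for row in rows:
        # Drop the categorical column; the group label now carries it.
        rest = {k: v for k, v in row.items() if k != key}
        groups[row[key]].append(rest)
    return dict(groups)


def to_rows(groups, key):
    """Rebuild a single table by re-attaching each group label as a column."""
    return [{**row, key: label} for label, rows in groups.items() for row in rows]


groups = to_groups(records, "illness")
rebuilt = to_rows(groups, "illness")
```

Converting to groups and back recovers the same rows (possibly in a different order), which is why the choice of format should not affect the analysis.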
Distinguishing groups in a scatterplot
This section examines how a categorical variable, Z, can help explain the relationship between two numerical variables, X and Y. Equivalently, it examines whether the relationship between X and Y is the same in each of several groups.
Data analysis should usually start by examining a graphical display of the data.
In a scatterplot of Y against X, the crosses can be drawn with different symbols and/or colours to represent the different groups.
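As a rough sketch of how such a display might be built, the code below splits the points by group and gives each group its own marker and colour. The data values, the function names `group_series` and `draw`, and the particular markers and colours are assumptions made for illustration. The `draw` function assumes matplotlib is installed and imports it only when called.

```python
from itertools import cycle

# Marker and colour choices are arbitrary; they are reused in order if there
# are more groups than entries.
MARKERS = ["x", "+", "o", "^", "s"]
COLOURS = ["tab:blue", "tab:red", "tab:green", "tab:orange", "tab:purple"]


def group_series(x, y, z):
    """Split (x, y) points by the categorical value z and assign each group a style."""
    styles = zip(cycle(MARKERS), cycle(COLOURS))
    series = {}
    for xi, yi, zi in zip(x, y, z):
        if zi not in series:
            marker, colour = next(styles)  # next unused marker/colour pair
            series[zi] = {"x": [], "y": [], "marker": marker, "color": colour}
        series[zi]["x"].append(xi)
        series[zi]["y"].append(yi)
    return series


def draw(series, xlabel="X", ylabel="Y"):
    """Draw one scatter layer per group (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily so the rest runs without it

    fig, ax = plt.subplots()
    for label, s in series.items():
        ax.scatter(s["x"], s["y"], marker=s["marker"], color=s["color"], label=label)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.legend(title="Group")
    return fig


# Made-up example values for X, Y and the categorical variable Z.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 2.9, 6.2, 7.1]
z = ["A", "A", "B", "B"]
series = group_series(x, y, z)
```

Calling `draw(series)` would then produce a scatterplot with a key identifying the groups.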
Body fat of AIS athletes
Two categorical variables were recorded for each athlete: Sex and Sport. The scatterplot below initially uses different colours and symbols to distinguish the male and female athletes.
For any value of Body mass index (BMI), the average Body fat of male athletes is considerably lower than that of female athletes with the same BMI.
Because the relationship between BMI and Body fat differs for males and females, taking an athlete's sex into account should make it easier to predict Body fat from BMI.
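To illustrate why distinguishing the groups can improve prediction, the sketch below fits one least squares line to all the points and then a separate line to each group, and compares the residual sums of squares. The numbers are invented for illustration and are not the AIS measurements; `fit_line` and `rss` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Least squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rss(xs, ys, line):
    """Residual sum of squares of the points about the given line."""
    a, b = line
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Made-up values: BMI, body fat (%) and sex for eight hypothetical athletes.
bmi = [20, 22, 24, 26, 20, 22, 24, 26]
fat = [7, 8, 10, 11, 15, 17, 18, 20]
sex = ["M", "M", "M", "M", "F", "F", "F", "F"]

# A single line fitted to everyone, ignoring sex.
pooled_rss = rss(bmi, fat, fit_line(bmi, fat))

# A separate line for each sex.
lines = {}
grouped_rss = 0.0
for g in sorted(set(sex)):
    xs = [x for x, s in zip(bmi, sex) if s == g]
    ys = [y for y, s in zip(fat, sex) if s == g]
    lines[g] = fit_line(xs, ys)
    grouped_rss += rss(xs, ys, lines[g])
```

With these made-up values the separate lines leave a much smaller residual sum of squares than the single pooled line, which is the sense in which knowing the group helps prediction.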
Select Sport from the pop-up menu to display the athletes' sports with different colours and symbols on the scatterplot. (You can click on sports in the key to highlight these crosses on the scatterplot to help distinguish the sports.)
Does the relationship between BMI and Body fat seem different for different sports?
There are too many sports, and their distributions overlap too much, for this diagram to reveal much about the differences between them.