Data with one categorical and one numerical variable
We have previously examined bivariate data sets with...
This section briefly examines the remaining combination...
Numerical response and categorical explanatory variable
An ecologist traps 50 rats in a nature reserve and records the weight and sex of each. Weight should be treated as the response variable, since a rat's sex could affect its weight but its weight could not affect its sex.
When the explanatory variable is categorical, it should be used to split the individuals into groups. The methods that were described earlier for comparison of numerical distributions can be used. For example, the distributions might be compared with box plots.
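A box plot for each group is drawn from that group's five-number summary. The Python sketch below shows the split-then-summarise step. It assumes a small set of made-up (weight, sex) pairs, not the ecologist's data, and uses the standard-library `statistics.quantiles` function with its "inclusive" method. Other quartile conventions give slightly different Q1 and Q3 values.

```python
from statistics import median, quantiles

# Hypothetical (weight in grams, sex) records: made-up values for
# illustration only, not data from the ecologist's survey.
rats = [
    (212, "M"), (248, "M"), (305, "M"), (276, "M"), (331, "M"),
    (189, "F"), (204, "F"), (231, "F"), (198, "F"), (256, "F"),
]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a box plot draws."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median(values), q3, max(values))

def summaries_by_group(records):
    """Split the numerical response (weight) by the categorical variable (sex)."""
    groups = {}
    for weight, sex in records:
        groups.setdefault(sex, []).append(weight)
    return {sex: five_number_summary(ws) for sex, ws in sorted(groups.items())}

for sex, summary in summaries_by_group(rats).items():
    print(sex, summary)
```

Placing the two summaries side by side (or drawing them as box plots on a common axis) is what allows the weight distributions of males and females to be compared.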
This diagram helps us to understand how weight depends on sex.
Categorical response and numerical explanatory variable
When the categorical variable is the response, a different analysis is required. If we were analysing the relationship between scarring and weight among the male rats in the above survey, the presence of scarring should be treated as the response variable.
Analysis is harder, but we might split the weights into classes (e.g. under 200g, 200g to 300g, ...) and use these classes to split the individuals into groups. Stacked bar charts showing the proportions with and without scars in each weight class can then display the relationship.
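The bar heights for such a stacked chart can be computed by binning the weights and finding the proportion scarred in each class. In this Python sketch the records are hypothetical, and the class boundaries (200g and 300g, with a final open-ended class) are an assumption chosen to match the example classes above.

```python
from collections import Counter

# Hypothetical (weight in grams, has_scars) records for male rats:
# illustrative values only, not survey data.
males = [
    (185, False), (194, False), (210, False), (238, True),
    (262, False), (281, True), (297, True), (315, True), (342, True),
]

def weight_class(grams):
    """Assumed weight classes: under 200g, 200g to 300g, 300g and over."""
    if grams < 200:
        return "under 200g"
    if grams < 300:
        return "200g to 300g"
    return "300g and over"

def proportion_scarred(records):
    """Proportion of rats with scars in each weight class."""
    totals = Counter()
    scarred = Counter()
    for grams, has_scars in records:
        cls = weight_class(grams)
        totals[cls] += 1
        scarred[cls] += has_scars  # True adds 1, False adds 0
    return {cls: scarred[cls] / totals[cls] for cls in totals}

# Each stacked bar is split into "scarred" (p) and "not scarred" (1 - p).
for cls, p in proportion_scarred(males).items():
    print(f"{cls}: {p:.2f} scarred, {1 - p:.2f} not scarred")
```

Using proportions rather than raw counts makes each bar the same total height, so classes with different numbers of rats can be compared directly.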
This diagram helps us to understand how the proportion with scars depends on weight.
When there is no unique response...
In other situations, the classification of variables into a response and an explanatory variable is less clear. If rats were classified by weight and by their willingness to take a poisoned bait, neither variable can be ruled out as affecting the other. (More 'inquisitive' rats may find more food, or larger rats may be 'bolder'.)
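There are therefore two complementary ways to examine the association between the variables: treat weight as the response and compare the weight distributions of rats that did and did not take the bait, or treat bait-taking as the response and compare the proportions taking the bait in different weight classes.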
The remainder of this section expands on how we might explain a categorical response in terms of a numerical explanatory variable.
Menstruation and age
A study was conducted in Warsaw to determine the proportions of girls who had started menstruating at different ages. A total of 3,898 girls of various ages between 8 and 19 were asked whether they had started menstruating.
| Age class (to nearest month) | Menstruating | Total girls |
|---|---|---|
The response is a categorical variable with two possible values (menstruating or not menstruating). How does the proportion menstruating depend on the explanatory variable, age?
The bar charts below help to explain the relationship. The bar chart for each age group is centred on the middle age in the class.
Both the stacked and unstacked bar charts clearly show the increase in the proportion menstruating with age.
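The quantities behind these bar charts can be sketched in Python as follows. The table's counts are not reproduced here, so the age classes and counts below are hypothetical placeholders, not the Warsaw figures. For each class, the bar is centred on the midpoint of the class, its height is the proportion menstruating p, and in the stacked version the remaining segment has height 1 - p.

```python
# Hypothetical age classes and counts: placeholder numbers for illustration,
# NOT the Warsaw study's figures.
# Each tuple: (lower age, upper age, number menstruating, total girls)
age_classes = [
    (11.0, 12.0, 12, 120),
    (12.0, 13.0, 45, 150),
    (13.0, 14.0, 90, 120),
]

def bar_chart_data(classes):
    """Return (centre, proportion menstruating, proportion not) per class."""
    rows = []
    for lower, upper, menstruating, total in classes:
        centre = (lower + upper) / 2      # bar is centred on the middle age
        p = menstruating / total          # height of the "menstruating" bar
        rows.append((centre, p, 1 - p))   # stacked segments sum to 1
    return rows

for centre, p, not_p in bar_chart_data(age_classes):
    print(f"age {centre}: {p:.3f} menstruating, {not_p:.3f} not menstruating")
```

Dividing by each class total puts every class on the same 0 to 1 scale, even though the number of girls differs between classes.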
Bad displays of the data
When the bar charts display frequencies (counts) instead of proportions, both the stacked and unstacked versions have two problems.