Relationship between a numerical and a categorical variable

The previous page showed that the marginal relationship between two numerical variables, X and Y, can be very different from their conditional relationship for specific values of Z. The same can happen when X is a categorical variable.

Marginal and conditional relationships

When X is categorical (splitting the data into groups), there are again two different types of relationship between Y and X, depending on whether or not we make use of a third variable, Z.

Marginal relationship
This is the observed relationship, ignoring any additional information that is available. The relationship can be summarised by the means of Y for the different groups (values of X).
Conditional relationships, given Z
These conditional relationships are described in the same way as the marginal relationship above, but are based on a subset of individuals with a particular value of Z.

The marginal relationship can again be very different from the conditional relationships, even changing 'direction'. This will be clearer in an example.

If the existence of a lurking variable Z is not recognised, the relationship between Y and X may be misrepresented.
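
The reason the two kinds of relationship can disagree can be sketched with a short formula. The symbols here are introduced only for illustration and are not used elsewhere on this page: n_{xz} is the number of individuals in group x with Z = z, n_x is the total number in group x, and \bar{y}_{xz} is the mean of Y for those n_{xz} individuals. The sketch assumes that Z takes a small number of distinct values.

```latex
\bar{y}_{x} \;=\; \sum_{z} \frac{n_{xz}}{n_{x}} \, \bar{y}_{xz},
\qquad \text{where } n_{x} = \sum_{z} n_{xz}
```

Each marginal group mean is therefore a weighted average of the conditional group means. If the weights n_{xz} / n_x differ between groups, the marginal means can be ordered differently from the conditional means for every value of Z.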


Weights of fur seals

In 2012, a marine reserve was created. The diagram below shows the weights (kg) of seals that were observed along a coastline bordering the reserve in March 2013 and March 2014. In the notation above, the seal weight is Y and the year is a categorical variable, X.

Since the mean weight is lower in 2014, it is tempting to conclude that seals were smaller in 2014 than in 2013. (This is the marginal relationship between seal weight and year.)

Click the checkbox Slice. The seals were observed in four different colonies and colony is a categorical lurking variable. The diagram now shows seal weights from one colony, a conditional distribution. Drag the slider and observe that the mean weight of seals actually increased within each colony.

Marginal relationship between weight and time
In 2014, the ecologists decided to collect more information about colonies containing immature seals. Since more seals were sampled from the colonies containing smaller seals, the overall mean was lower in 2014.
Conditional relationship for each colony
The increase in mean size within individual colonies is a better indication that seal sizes have increased: in each colony, the seals appear to be bigger.

Although seal weights increased in all colonies in 2014, more seals were sampled from colonies containing smaller seals, so the overall (marginal) average seal weight decreased.
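
The same reversal can be reproduced with a few lines of Python. This is only a minimal sketch: the weights below are invented for illustration and are not the data behind the diagram, and it uses two hypothetical colonies, "A" and "B", rather than the four in the example. The function names are also assumptions made for this sketch.

```python
from statistics import mean

# Hypothetical records (year, colony, weight in kg); all values are invented.
# Colony "A" holds larger seals, colony "B" smaller ones.
# In 2014 most of the sampled seals come from colony "B".
records = [
    (2013, "A", 40.0), (2013, "A", 42.0), (2013, "A", 44.0),
    (2013, "B", 20.0),
    (2014, "A", 46.0),
    (2014, "B", 22.0), (2014, "B", 24.0), (2014, "B", 26.0),
]


def marginal_means(rows):
    """Mean weight for each year, ignoring colony."""
    groups = {}
    for year, _colony, weight in rows:
        groups.setdefault(year, []).append(weight)
    return {year: mean(weights) for year, weights in groups.items()}


def conditional_means(rows):
    """Mean weight for each year within each colony."""
    groups = {}
    for year, colony, weight in rows:
        groups.setdefault((colony, year), []).append(weight)
    return {key: mean(weights) for key, weights in groups.items()}


marginal = marginal_means(records)
conditional = conditional_means(records)

print("Marginal means by year:", marginal)
for (colony, year), value in sorted(conditional.items()):
    print(f"Colony {colony}, {year}: mean weight {value} kg")
```

With these invented numbers the mean weight rises from 42 kg to 46 kg in colony A and from 20 kg to 24 kg in colony B, yet the overall mean falls from 36.5 kg in 2013 to 29.5 kg in 2014, because three of the four seals sampled in 2014 come from the colony with smaller seals.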