Relationship between a numerical and a categorical variable
The previous page showed that the marginal relationship between two numerical variables, X and Y, can be very different from their conditional relationship for specific values of Z. The same can happen when X is a categorical variable.
Marginal and conditional relationships
When X is categorical (splitting the data into groups), there are again two different types of relationship between Y and X, depending on whether or not we make use of a third variable, Z.
The marginal relationship can again be very different from the conditional relationships, even changing 'direction'. This will be clearer in an example.
If the existence of a lurking variable Z is not recognised, the relationship between Y and X may be misrepresented.
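One way to see why the direction can change is to write the marginal mean of Y in group x as a weighted average of the conditional means. The notation here is an assumption introduced for illustration, not taken from the page: n_{xz} is the number of observations in group x with Z = z, n_x is the total for group x, and \bar{y}_{xz} is the mean of Y among those n_{xz} observations.

```latex
\bar{y}_{x} \;=\; \sum_{z} \frac{n_{xz}}{n_{x}} \, \bar{y}_{xz},
\qquad \text{where } n_{x} = \sum_{z} n_{xz}.
```

Because the weights n_{xz}/n_x can differ from one group x to another, the marginal means can be ordered differently from the conditional means, even when every conditional comparison points the same way.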
Weights of fur seals
In 2012, a marine reserve was created. The diagram below shows the weights (kg) of seals that were observed along a coastline bordering the reserve in March 2013 and March 2014. In the notation above, the seal weight is Y and the year is a categorical variable, X.
Since the mean weight is lower in 2014, it is tempting to conclude that seals are smaller in 2014 than in 2013. (This is the marginal relationship between seal weight and year.)
Click the checkbox Slice. The seals were observed in four different colonies and colony is a categorical lurking variable. The diagram now shows seal weights from one colony, a conditional distribution. Drag the slider and observe that the mean weight of seals actually increased within each colony.
Although the mean seal weight increased within every colony in 2014, a larger share of the 2014 sample came from colonies containing smaller seals, so the overall (marginal) mean seal weight decreased.
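The same effect can be reproduced with a few lines of arithmetic. The sketch below uses made-up sample sizes and colony means (the colony labels A to D and all numbers are hypothetical, not the data behind the diagram) and assumes each colony's mean weight rises by 2 kg while the sampling shifts towards the colonies with smaller seals.

```python
# Hypothetical data, NOT the values from the diagram:
# year -> colony -> (number of seals sampled, mean weight in kg)
data = {
    2013: {"A": (10, 40.0), "B": (20, 50.0), "C": (30, 60.0), "D": (40, 70.0)},
    2014: {"A": (40, 42.0), "B": (30, 52.0), "C": (20, 62.0), "D": (10, 72.0)},
}


def marginal_mean(groups):
    """Overall mean weight: colony means weighted by colony sample size."""
    total_n = sum(n for n, _ in groups.values())
    return sum(n * mean for n, mean in groups.values()) / total_n


# Marginal relationship: overall mean weight in each year, ignoring colony.
marginal = {year: marginal_mean(groups) for year, groups in data.items()}

# Conditional relationship: change in mean weight within each colony.
within_change = {
    colony: data[2014][colony][1] - data[2013][colony][1]
    for colony in data[2013]
}

print("Marginal mean by year:", marginal)
print("Change within each colony:", within_change)
```

With these illustrative numbers, every colony's mean goes up, yet the overall mean falls from 60 kg to 52 kg, because the 2014 sample is dominated by the lighter colonies A and B.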