Relationship between a numerical and a categorical variable

The previous page showed that the marginal relationship between two numerical variables, X and Y, can be very different from their conditional relationship for specific values of Z. The same can happen when X is a categorical variable.

Marginal and conditional relationships

When X is categorical (splitting the data into groups), there are again two different types of relationship between Y and X, depending on whether or not we make use of a third variable, Z.

Marginal relationship
This is the observed relationship, ignoring any additional information that is available. The relationship can be summarised by the means of Y for the different groups (values of X).
Conditional relationships, given Z
These conditional relationships are described in the same way as the marginal relationship above, but each is based only on the subset of individuals with a particular value of Z.

The marginal relationship can again be very different from the conditional relationships, even changing 'direction'. This will be clearer in an example.

If the existence of a lurking variable Z is not recognised, the relationship between Y and X may be misrepresented.
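
One way to see how the direction can change is to write each marginal group mean in terms of the conditional group means. The identity below is a sketch that assumes Z is categorical; the symbols are notation introduced here for illustration and do not appear elsewhere on this page.

```latex
% Assumed notation (introduced for this sketch only):
%   n_{xz}       = number of individuals in group x that have Z = z
%   n_x          = \sum_z n_{xz}, the total size of group x
%   \bar{y}_{xz} = mean of Y for individuals in group x with Z = z (conditional mean)
%   \bar{y}_x    = mean of Y for all individuals in group x (marginal mean)
\[
  \bar{y}_x \;=\; \sum_z \frac{n_{xz}}{n_x}\,\bar{y}_{xz}
\]
```

Each marginal mean is a weighted average of the conditional means, and the weights depend on the group. If one group has most of its individuals at values of Z where Y tends to be low, its marginal mean can be the lower one even when every one of its conditional means is the higher one.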


House prices

The diagram below shows the sale prices of residential properties in a town in March 2013 and March 2014. In the notation above, the sale price is Y and the year is a categorical variable, X.

Since the mean sale price is lower in 2014, it is tempting to conclude that house prices have fallen during the year. (This is the marginal relationship between sale price and year.)

Click the checkbox Slice. The lurking variable, district, splits the town into four districts, and the diagram now shows the conditional relationship for one district. Drag the slider and observe that the mean sale price of housing actually increased within each district.

Marginal relationship between sale price and year
The overall decrease in mean sale price is useful information, but it is caused by an increase in the proportion of sales in the poorer districts of the town.
Conditional relationship for each district
The increase in mean price in individual districts is a better indication that the prices of individual houses have increased, so your own house is likely to be worth more!

Although house prices increased in all districts in 2014, a larger proportion of the sales were in the poorer districts, so the overall (marginal) mean price decreased.
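
The same reversal can be reproduced with a handful of made-up numbers. The sketch below uses Python with invented sale prices (in thousands) for two hypothetical districts, A and B, rather than the four districts in the diagram; the data and the helper `group_means` are illustrative assumptions, not the values behind the diagram.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sales as (year, district, price in thousands).
# District A is the more expensive one; most 2014 sales are in district B.
sales = [
    (2013, "A", 400), (2013, "A", 420), (2013, "A", 440),
    (2013, "B", 200),
    (2014, "A", 460),
    (2014, "B", 220), (2014, "B", 230), (2014, "B", 240),
]

def group_means(rows, key):
    """Return the mean price for each group defined by key(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {k: mean(prices) for k, prices in groups.items()}

# Marginal relationship: mean price by year, ignoring district.
marginal = group_means(sales, key=lambda r: r[0])

# Conditional relationships: mean price by year within each district.
conditional = group_means(sales, key=lambda r: (r[1], r[0]))

for year in sorted(marginal):
    print(f"{year}: overall mean {marginal[year]:.1f}")
for district, year in sorted(conditional):
    print(f"district {district}, {year}: mean {conditional[(district, year)]:.1f}")
```

With these invented numbers the overall mean falls from 365 to 287.5, while the mean rises from 420 to 460 in district A and from 200 to 230 in district B. The only thing that changes between the two views is whether the sales are grouped by year alone or by year within district.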