Relationship between a numerical and a categorical variable
The previous page showed that the marginal relationship between two numerical variables, X and Y, can be very different from their conditional relationship for specific values of Z. The same can happen when X is a categorical variable.
Marginal and conditional relationships
When X is categorical (splitting the data into groups), there are again two different types of relationship between Y and X, depending on whether or not we make use of a third variable, Z.
The marginal relationship can again be very different from the conditional relationships, even changing 'direction'. This will be clearer in an example.
If the existence of a lurking variable Z is not recognised, the relationship between Y and X may be misrepresented.
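One way to see how the direction can change is to write the marginal mean of Y in each group as a weighted average of conditional means. The notation here is an assumption introduced for illustration and is not used elsewhere on this page: $n_{xz}$ is the number of observations with X = x and Z = z, $n_x = \sum_z n_{xz}$, and $\bar{y}_{xz}$ is the mean of Y among those observations.

```latex
\bar{y}_{x} \;=\; \sum_{z} \frac{n_{xz}}{n_{x}} \, \bar{y}_{xz}
```

The weights $n_{xz}/n_x$ can differ between the groups of X. A group can therefore have a higher conditional mean $\bar{y}_{xz}$ for every value of z and still have a lower marginal mean $\bar{y}_{x}$, if more of its weight falls on values of z where Y is low.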
House prices
The diagram below shows the sale prices of residential properties in a town in March 2013 and March 2014. In the notation above, the sale price is Y and the year is a categorical variable, X.
Since the mean sale price is lower in 2014, it is tempting to conclude that house prices have fallen during the year. (This is the marginal relationship between sale price and year.)
Click the checkbox Slice. The variable district is a lurking variable that splits the town into four districts and the diagram now shows the conditional relationship for one district. Drag the slider and observe that the mean sale price of housing actually increased within each district.
Although house prices increased in every district between March 2013 and March 2014, more of the 2014 sales were in the poorer districts, so the overall (marginal) mean price decreased.
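The effect can be reproduced with a small calculation. The sketch below uses made-up numbers, not the data behind the diagram: the district labels A to D, the mean prices (in thousands) and the sales counts are all hypothetical, chosen so that every district's mean price rises by 5% while the mix of sales shifts towards the cheaper districts.

```python
# Hypothetical summary data, NOT the data shown in the diagram.
# For each year and district: (mean sale price in thousands, number of sales).
sales = {
    2013: {"A": (200, 10), "B": (300, 20), "C": (400, 30), "D": (500, 40)},
    2014: {"A": (210, 40), "B": (315, 30), "C": (420, 20), "D": (525, 10)},
}


def overall_mean(by_district):
    """Marginal mean price: district means weighted by number of sales."""
    total_value = sum(mean * n for mean, n in by_district.values())
    total_sales = sum(n for _, n in by_district.values())
    return total_value / total_sales


# Marginal relationship: overall mean price in each year, ignoring district.
marginal = {year: overall_mean(d) for year, d in sales.items()}

# Conditional relationship: compare the two years within each district.
rose_in_every_district = all(
    sales[2014][d][0] > sales[2013][d][0] for d in sales[2013]
)

print(marginal)                # {2013: 400.0, 2014: 315.0}
print(rose_in_every_district)  # True
```

With these assumed figures the overall mean falls from 400 to 315 even though the mean rises in each of the four districts, because most 2014 sales are in the cheapest district and most 2013 sales are in the most expensive one.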