Lurking variables and relationships between categorical variables

Simpson's paradox arises when the direction of the relationship between two categorical variables reverses once a lurking variable is taken into account. As with other 'paradoxes', there is no real contradiction; it just takes a bit more thought to understand why your initial intuition is wrong.

Smoking and survival

In a health survey, 1,314 women were classified as smokers or non-smokers, and their survival after 20 years was recorded.

                 Survival
Smoker?        Dead   Alive   Total   P(Dead)
Smoker          139     443     582     0.239
Non-smoker      230     502     732     0.314

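A naive examination of the data suggests that smoking decreases the probability of dying (0.239 for smokers against 0.314 for non-smokers). However, the opposite is true when the women are split into age groups: within every age group the smokers have the higher death rate, although the difference is very small for the women aged 65+.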

Age 18-44
                 Survival
Smoker?        Dead   Alive   Total   P(Dead)
Smoker           19     269     288     0.066
Non-smoker       13     327     340     0.038

Age 45-64
                 Survival
Smoker?        Dead   Alive   Total   P(Dead)
Smoker           78     167     245     0.318
Non-smoker       52     147     199     0.261

Age 65+
                 Survival
Smoker?        Dead   Alive   Total   P(Dead)
Smoker           42       7      49     0.857
Non-smoker      165      28     193     0.855
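
The reversal can be checked directly from the counts in the tables. The following is a minimal Python sketch, not part of the original analysis; the names `counts`, `death_rate`, `by_age` and `pooled` are illustrative choices. It computes the proportion dying within each age group and again after pooling the three age groups.

```python
# (dead, alive) counts copied from the three age-group tables.
counts = {
    "18-44": {"Smoker": (19, 269), "Non-smoker": (13, 327)},
    "45-64": {"Smoker": (78, 167), "Non-smoker": (52, 147)},
    "65+":   {"Smoker": (42, 7),   "Non-smoker": (165, 28)},
}


def death_rate(dead, alive):
    """Proportion of women who died, given counts of dead and alive."""
    return dead / (dead + alive)


# Death rate for smokers and non-smokers within each age group.
by_age = {
    age: {status: death_rate(*cell) for status, cell in group.items()}
    for age, group in counts.items()
}

# Death rate after adding the counts over all age groups.
pooled = {}
for status in ("Smoker", "Non-smoker"):
    dead = sum(group[status][0] for group in counts.values())
    alive = sum(group[status][1] for group in counts.values())
    pooled[status] = death_rate(dead, alive)

for age, rates in by_age.items():
    print(age, {status: round(rate, 3) for status, rate in rates.items()})
print("All ages", {status: round(rate, 3) for status, rate in pooled.items()})
```

Within every age group the smokers' rate is the larger of the two, yet the pooled rate is larger for the non-smokers, which reproduces the first table.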

Proportional Venn diagram

Simpson's paradox is explained in the proportional Venn diagram below. In it, the area of each rectangle is proportional to the number of women with that combination of values for the variables.

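Most of the women aged 65+ were non-smokers (193 out of 242). Since the death rate in this age group was high for smokers and non-smokers alike (about 0.86), the large number of older women among the non-smokers increased the overall death rate of the non-smokers.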
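
One way to see this numerically is to write each overall death rate as a weighted average of the age-specific rates, with weights equal to the share of each age group among the smokers or among the non-smokers. The sketch below is an illustration under that framing; the variable names (`data`, `shares`, `overall`) are hypothetical and the counts are taken from the tables above.

```python
# (dead, total) counts per age group, taken from the tables above.
data = {
    "Smoker":     {"18-44": (19, 288), "45-64": (78, 245), "65+": (42, 49)},
    "Non-smoker": {"18-44": (13, 340), "45-64": (52, 199), "65+": (165, 193)},
}

shares = {}   # proportion of each smoking group that falls in each age group
overall = {}  # overall death rate rebuilt as a weighted average

for status, groups in data.items():
    n = sum(total for _, total in groups.values())
    shares[status] = {}
    weighted_sum = 0.0
    for age, (dead, total) in groups.items():
        share = total / n      # weight given to this age group
        rate = dead / total    # death rate within this age group
        shares[status][age] = share
        weighted_sum += share * rate
        print(f"{status:10s} {age:6s} share={share:.3f} rate={rate:.3f}")
    overall[status] = weighted_sum
    print(f"{status:10s} overall rate = {weighted_sum:.3f}")
```

About 8% of the smokers but about 26% of the non-smokers were aged 65+, so the high death rate of that age group carries far more weight in the non-smokers' overall rate.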