Relationships

The relationship between two numerical variables can be summarised by a correlation coefficient and a least squares line. Two categorical variables may also be related.

We say that two categorical variables are associated if knowledge of the value of one tells you something about the likely value of the other.

More formally, for two categorical variables X and Y, if the conditional distribution of Y given X = x depends on the value of x, we say that X and Y are associated.

Example

We illustrate the idea of association with an artificial example relating athletic performance of high school children to their weight. The table below shows the joint probabilities for these children.

Joint Probabilities

                     Athletic performance
             Poor    Satisfactory  Above average  Marginal
Underweight  0.0450  0.0900        0.0150         0.1500
Normal       0.0825  0.3025        0.1650         0.5500
Overweight   0.0500  0.1200        0.0300         0.2000
Obese        0.0300  0.0650        0.0050         0.1000
Marginal     0.2075  0.5775        0.2150         1.0000

A proportional Venn diagram displays the conditional probabilities for performance, given weight category, graphically.

A child who is known to have normal weight has a higher probability of above average athletic performance (0.1650 / 0.5500 = 0.30) than a child who is known to be overweight (0.0300 / 0.2000 = 0.15). Since the conditional probabilities for performance, given weight, are different for different weight categories, the two variables are associated.
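The conditional probabilities used in this argument can be reproduced with a short calculation. The following is only a minimal sketch in Python, not part of the original example: the names `joint`, `conditional_given_weight` and `is_associated` are hypothetical, and the numbers are the joint probabilities copied from the table. Each row of joint probabilities is divided by its row total, P(weight), to give P(performance | weight).

```python
# Joint probabilities copied from the table above.
# Rows: weight category; columns: athletic performance.
performance = ["Poor", "Satisfactory", "Above average"]
joint = {
    "Underweight": [0.0450, 0.0900, 0.0150],
    "Normal":      [0.0825, 0.3025, 0.1650],
    "Overweight":  [0.0500, 0.1200, 0.0300],
    "Obese":       [0.0300, 0.0650, 0.0050],
}


def conditional_given_weight(joint):
    """Return P(performance | weight) by dividing each row by its total."""
    result = {}
    for weight, row in joint.items():
        marginal = sum(row)  # marginal probability P(weight)
        result[weight] = [p / marginal for p in row]
    return result


def is_associated(conditional, tol=1e-9):
    """True if any conditional distribution differs from the first one."""
    rows = list(conditional.values())
    first = rows[0]
    return any(
        abs(p - q) > tol
        for row in rows[1:]
        for p, q in zip(row, first)
    )


conditional = conditional_given_weight(joint)

# Print one conditional distribution per weight category.
for weight, probs in conditional.items():
    cells = ", ".join(
        f"{name}: {p:.2f}" for name, p in zip(performance, probs)
    )
    print(f"{weight:<12} {cells}")

print("Associated:", is_associated(conditional))
```

In this sketch the conditional probability of above average performance is 0.30 for normal weight children and 0.15 for overweight children, so the rows are not all equal and the check reports an association. The tolerance `tol` is an assumption that guards against floating-point rounding; it is not a statistical test.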