Relationships

The relationship between two numerical variables can be summarised by a correlation coefficient and a least squares line. Two categorical variables may also be related.

We say that two categorical variables are associated if knowledge of the value of one tells you something about the likely value of the other.

If the conditional distribution of Y given X = x depends on the value of x, we say that X and Y are associated.
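In symbols, and assuming the usual definition of conditional probability (the notation below is an illustrative choice, not taken from the text above), the conditional distribution is found by dividing each joint probability by the marginal probability of x:

```latex
P(Y = y \mid X = x) = \frac{P(X = x,\; Y = y)}{P(X = x)},
\qquad
P(X = x) = \sum_{y} P(X = x,\; Y = y)
```

X and Y are associated when, for at least one value y, this conditional probability differs between two values of x.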

Example

We illustrate the idea of association with an artificial example relating absenteeism of employees in a supermarket chain to their weight. The table below shows the joint probabilities for these employees.

Joint Probabilities

                        Attendance record
Weight         Poor     Satisfactory   Above average   Marginal
Underweight    0.0450   0.0900         0.0150          0.1500
Normal         0.0825   0.3025         0.1650          0.5500
Overweight     0.0500   0.1200         0.0300          0.2000
Obese          0.0300   0.0650         0.0050          0.1000
Marginal       0.2075   0.5775         0.2150          1.0000

A proportional Venn diagram displays the conditional probabilities for attendance, given weight category, graphically.

An employee of normal weight has a higher probability of above average attendance (0.1650 / 0.5500 = 0.30) than an overweight employee (0.0300 / 0.2000 = 0.15). Since the conditional probabilities for attendance, given weight, differ between weight categories, the two variables are associated.
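The conditional probabilities behind this comparison can be obtained from the joint table by dividing each row by its row total. The following is a minimal Python sketch of that calculation; the names `joint`, `conditional_given_weight` and `associated` are illustrative choices, and the tolerance used for the comparison is an assumption to allow for floating-point rounding.

```python
# Joint probabilities from the table: rows are weight categories,
# columns are attendance records in the order listed below.
attendance = ["Poor", "Satisfactory", "Above average"]
joint = {
    "Underweight": [0.0450, 0.0900, 0.0150],
    "Normal":      [0.0825, 0.3025, 0.1650],
    "Overweight":  [0.0500, 0.1200, 0.0300],
    "Obese":       [0.0300, 0.0650, 0.0050],
}


def conditional_given_weight(joint_table):
    """Divide each row by its row total to get P(attendance | weight)."""
    conditional = {}
    for weight, row in joint_table.items():
        marginal = sum(row)  # marginal probability of this weight category
        conditional[weight] = [p / marginal for p in row]
    return conditional


conditional = conditional_given_weight(joint)

# Print one conditional distribution per weight category.
for weight, probs in conditional.items():
    cells = "  ".join(f"{a}: {p:.2f}" for a, p in zip(attendance, probs))
    print(f"{weight:<12} {cells}")

# The variables are associated if any row's conditional distribution
# differs from the first row's (tolerance is an assumed rounding allowance).
rows = list(conditional.values())
associated = any(
    abs(p - q) > 1e-9 for row in rows[1:] for p, q in zip(row, rows[0])
)
print("Associated:", associated)
```

With the table's values, the row for normal weight gives 0.30 for above average attendance and the row for overweight gives 0.15, so the rows differ and the check reports an association.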