Data sets with two categorical variables

Bivariate categorical data sets are usually summarised with a contingency table.

For example, a study examined 62 patients who had been given a prescription medicine for some condition. Each patient was classified by whether they had complied with the treatment prescribed and by racial group:

Race         Compliers   Non-compliers   Total
White               13              10      23
Non-white           13              26      39
Total               26              36      62
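
As a minimal sketch of how the margins of such a table are obtained, the Python snippet below stores the four cell counts from the table and recomputes the row totals, column totals and overall total. The variable names (counts, row_totals, col_totals, grand_total) are illustrative assumptions, not part of the study.

```python
# Cell counts from the compliance table: rows are racial groups,
# columns are compliance categories.
counts = {
    "White": {"Compliers": 13, "Non-compliers": 10},
    "Non-white": {"Compliers": 13, "Non-compliers": 26},
}

# Row totals: add the counts across the columns of each row.
row_totals = {race: sum(cells.values()) for race, cells in counts.items()}

# Column totals: add the counts down the rows for each column.
columns = ["Compliers", "Non-compliers"]
col_totals = {col: sum(counts[race][col] for race in counts) for col in columns}

# Overall total: the number of patients in the study.
grand_total = sum(row_totals.values())

print(row_totals)   # {'White': 23, 'Non-white': 39}
print(col_totals)   # {'Compliers': 26, 'Non-compliers': 36}
print(grand_total)  # 62
```

The recomputed totals agree with the Total row and column of the table, which is a quick check that the cell counts were entered correctly.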

Joint probabilities

Bivariate categorical data can be modelled as a random sample from an underlying population of pairs of categorical values. The population proportion for each pair (x, y) is denoted by p_xy and is called the joint probability for (x, y).
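
The joint probabilities themselves are population values and are usually unknown. As a hedged illustration, the sketch below computes the corresponding sample proportions for the compliance data, which would serve as estimates of the joint probabilities if the 62 patients can be treated as a random sample. The name estimated_joint is an assumption made for this example.

```python
from fractions import Fraction

# Cell counts from the compliance table, keyed by the pair (x, y).
counts = {
    ("White", "Compliers"): 13,
    ("White", "Non-compliers"): 10,
    ("Non-white", "Compliers"): 13,
    ("Non-white", "Non-compliers"): 26,
}

# Sample size: the total number of (x, y) pairs observed.
n = sum(counts.values())

# Sample proportion for each pair: cell count divided by sample size.
# These are estimates of the joint probabilities, not the population values.
estimated_joint = {pair: Fraction(count, n) for pair, count in counts.items()}

for pair, proportion in estimated_joint.items():
    print(pair, proportion, round(float(proportion), 3))

# Like the joint probabilities, the sample proportions sum to one.
assert sum(estimated_joint.values()) == 1
```

Exact fractions are used here so that the check that the proportions sum to one is not affected by floating-point rounding.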

In games of chance, we can often work out the joint probabilities. For example, if a gambler draws a card from a shuffled deck and also tosses a coin, there are eight possible combinations of card suit and coin face,