Data sets with two categorical variables
Bivariate categorical data sets are usually summarised with a contingency table.
For example, a study examined 686 tourists and classified each by educational level and by whether they were 'information seekers' (who requested destination-specific literature from travel agents) or 'non-seekers':
Information seeker? | |||
---|---|---|---|
Education | Yes | No | Total |
Some high school | 013 | 027 | 40 |
High school degree | 064 | 118 | 182 |
Some college | 100 | 123 | 223 |
College degree | 059 | 069 | 128 |
Graduate degree | 067 | 046 | 113 |
Total | 303 | 383 | 686 |
Joint probabilities
Bivariate categorical data can be modelled as a random sample from an underlying population of pairs of categorical values. The population proportion for each pair (x, y) is denoted by pxy and is called the joint probability for (x, y).
In games of chance, we can often work out the joint probabilities. For example, if a gambler draws a card from a shuffled deck and also tosses a coin, there are eight possible combinations,