Probabilities for a single variable
A model for two categorical variables is characterised by the joint probabilities $p_{xy}$. However, we sometimes want to restrict attention to one of these variables on its own. The marginal probabilities for the variable X are defined and interpreted in a similar way to the marginal proportions that were defined earlier for bivariate categorical data.
The marginal probability, $p_x$, for the variable X is the proportion of (x, y) pairs in the population for which the value of X is x. For example, consider a situation where we are interested in the hair colour and eye colour of teenagers. The number of blue-eyed teenagers is the sum of those with either (blue eyes and blonde hair) or (blue eyes and brunette hair) or ... or (blue eyes and red hair),
$$n_{\text{blue eyes}} = n_{\text{blue, blonde}} + n_{\text{blue, brunette}} + \dots$$
The same holds for the proportion with blue eyes — its marginal probability,
$$p_{\text{blue eyes}} = p_{\text{blue, blonde}} + p_{\text{blue, brunette}} + \dots$$
This is generalised with the formula

$$p_x = \sum_y p_{xy}$$

where the right-hand side of the equation denotes summing the joint probabilities over all possible values of y. There is a similar formula for the marginal probabilities of the other variable,

$$p_y = \sum_x p_{xy}$$
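The summation described above can be sketched in a few lines of Python. This is only an illustration: the function name `marginals` is hypothetical, and the joint probabilities for eye colour and hair colour are invented numbers, not estimates from any real population.

```python
from collections import defaultdict

def marginals(joint):
    """Return (p_x, p_y) from a dict mapping (x, y) -> joint probability."""
    p_x = defaultdict(float)
    p_y = defaultdict(float)
    for (x, y), p in joint.items():
        p_x[x] += p  # sum over all y for this value of x
        p_y[y] += p  # sum over all x for this value of y
    return dict(p_x), dict(p_y)

# Hypothetical joint probabilities for (eye colour, hair colour),
# invented purely for illustration.
joint = {
    ("blue", "blonde"): 0.20,
    ("blue", "brunette"): 0.10,
    ("brown", "blonde"): 0.15,
    ("brown", "brunette"): 0.55,
}

p_eye, p_hair = marginals(joint)
print(p_eye)   # marginal probabilities for eye colour
print(p_hair)  # marginal probabilities for hair colour
```

Each set of marginal probabilities sums to one because every joint probability is counted exactly once in each of them.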
Eye strain for office workers
It is difficult to find illustrative examples since population probabilities are unknown in most 'interesting' applications. The following example is based on a real data set which classifies 295 office workers by their type of work and whether they have symptoms of eye strain.
Type of work | No eye strain | Eye strain |
---|---|---|
Computer data entry | 42 | 11 |
General computer use | 79 | 30 |
Full-time typing | 64 | 14 |
Standard clerical work | 52 | 3 |
We do not know the underlying population joint probabilities for workers in this type of office in general. However, to provide an illustrative example, we will pretend that the population probabilities are equal to the proportions in this data set. For example, we will pretend that the joint probability for a worker doing computer data entry and not having eye strain is 42 / 295 = 0.1424.
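The conversion from counts to pretend joint probabilities is a single division by the total number of workers. A minimal sketch, using the counts from the table above (the variable names are our own):

```python
# Counts from the data set: (no eye strain, eye strain) for each type of work.
counts = {
    "Computer data entry": (42, 11),
    "General computer use": (79, 30),
    "Full-time typing": (64, 14),
    "Standard clerical work": (52, 3),
}

# Total number of office workers in the data set.
total = sum(no + yes for no, yes in counts.values())

# Treat each sample proportion as if it were the population joint probability.
joint = {
    work: (no / total, yes / total)
    for work, (no, yes) in counts.items()
}

for work, (p_no, p_yes) in joint.items():
    print(f"{work:<24} {p_no:.4f}  {p_yes:.4f}")
```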
Type of work | No eye strain | Eye strain | Total |
---|---|---|---|
Computer data entry | 0.1424 | 0.0373 | 0.1797 |
General computer use | 0.2678 | 0.1017 | 0.3695 |
Full-time typing | 0.2169 | 0.0475 | 0.2644 |
Standard clerical work | 0.1763 | 0.0102 | 0.1864 |
Total | 0.8034 | 0.1966 | 1.0000 |
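The two marginal totals of the table (the Total column and the Total row) give the marginal probabilities for the two variables. For example, the marginal probability that a worker does computer data entry is $p_{\text{data entry}} = 0.1424 + 0.0373 = 0.1797$.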
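As a rough check on the table, the marginal totals can be recomputed from the counts by summing the joint probabilities across each row and down each column. The sketch below assumes the same pretend probabilities (counts divided by 295); the variable names are illustrative only.

```python
work_types = [
    "Computer data entry",
    "General computer use",
    "Full-time typing",
    "Standard clerical work",
]
# Columns: no eye strain, eye strain.
counts = [
    [42, 11],
    [79, 30],
    [64, 14],
    [52, 3],
]

total = sum(sum(row) for row in counts)
joint = [[c / total for c in row] for row in counts]

# Marginal probabilities for type of work: sum each row over eye-strain status.
p_work = [sum(row) for row in joint]

# Marginal probabilities for eye strain: sum each column over type of work.
p_strain = [sum(col) for col in zip(*joint)]

for name, p in zip(work_types, p_work):
    print(f"{name:<24} {p:.4f}")
print(f"No eye strain: {p_strain[0]:.4f}, eye strain: {p_strain[1]:.4f}")
```

Because the table entries are rounded to four decimal places, adding the rounded joint probabilities by hand can differ from these totals in the last digit.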
The summing of joint probabilities to give marginal ones can be pictured with a 3-dimensional bar chart of the joint probabilities.
Stacking the bars for each type of work (X) on top of each other shows the marginal probabilities for type of work.
Similarly, stacking the bars for each eye-strain category (Y) shows the overall probability that a worker has eye strain.