Probabilities for a single variable
A model for two categorical variables is characterised by the joint probabilities $p_{xy}$. However, we sometimes want to restrict attention to one of these variables on its own. The marginal probabilities for the variable X are defined and interpreted in a similar way to the marginal proportions that were defined earlier for bivariate categorical data.
The marginal probability, $p_x$, for the variable X is the proportion of (x, y) pairs in the population for which the value of X is x. For example, consider a situation where we are interested in the hair colour and eye colour of teenagers. The number of blue-eyed teenagers is the sum of those with either (blue eyes and blonde hair) or (blue eyes and brunette hair) or ... or (blue eyes and red hair),
$$n_{\text{blue eyes}} = n_{\text{blue, blonde}} + n_{\text{blue, brunette}} + \cdots$$
The same holds for the proportion with blue eyes, which is its marginal probability,
$$p_{\text{blue eyes}} = p_{\text{blue, blonde}} + p_{\text{blue, brunette}} + \cdots$$
This is generalised with the formula
$$p_x = \sum_y p_{xy}$$
where the right-hand side of the equation denotes summing the joint probabilities over all possible values of y. There is a similar formula for the marginal probabilities of the other variable,
$$p_y = \sum_x p_{xy}$$
Type and site of melanoma
It is difficult to find illustrative examples since population probabilities are unknown in most 'interesting' applications. The following example is based on a study of 400 patients who had been diagnosed with malignant melanoma. For each patient, the site of the tumour and its histological type were recorded.
Type of tumour | Site: Head & neck | Site: Trunk | Site: Extremities |
---|---|---|---|
Hutchinson's | 22 | 2 | 10 |
Superficial spreading | 16 | 54 | 115 |
Nodular | 19 | 33 | 73 |
Indeterminate | 11 | 17 | 28 |
We do not know the underlying population joint probabilities for patients with malignant melanoma in general. However, to provide an illustrative example, we will pretend that the population probabilities are equal to the proportions in this data set. For example, we will pretend that the joint probability for a patient with malignant melanoma having a Nodular melanoma in the Trunk is 33 / 400 = 0.0825.
Type of tumour | Site: Head & neck | Site: Trunk | Site: Extremities | Total |
---|---|---|---|---|
Hutchinson's | 0.0550 | 0.0050 | 0.0250 | 0.0850 |
Superficial spreading | 0.0400 | 0.1350 | 0.2875 | 0.4625 |
Nodular | 0.0475 | 0.0825 | 0.1825 | 0.3125 |
Indeterminate | 0.0275 | 0.0425 | 0.0700 | 0.1400 |
Total | 0.1700 | 0.2650 | 0.5650 | 1.0000 |
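
The joint probabilities in the table above come from dividing each count by the total number of patients. A minimal Python sketch of that arithmetic follows; the names `sites`, `counts`, `n_total` and `joint` are illustrative choices and not part of the original study.

```python
# Counts from the melanoma table: rows are tumour types, columns are sites.
sites = ["Head & neck", "Trunk", "Extremities"]
counts = {
    "Hutchinson's": [22, 2, 10],
    "Superficial spreading": [16, 54, 115],
    "Nodular": [19, 33, 73],
    "Indeterminate": [11, 17, 28],
}

# Total number of patients in the data set.
n_total = sum(sum(row) for row in counts.values())

# Pretend population joint probability = cell count / total count.
joint = {
    tumour_type: {site: n / n_total for site, n in zip(sites, row)}
    for tumour_type, row in counts.items()
}

print(n_total)                               # 400
print(round(joint["Nodular"]["Trunk"], 4))   # 0.0825
```

The joint probabilities computed this way add up to 1 over all twelve cells, as any set of joint probabilities must.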
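The two sets of marginal totals in the table (the Total row and the Total column) give the marginal probabilities for the two variables. For example, the marginal probability that the tumour is on the trunk is the sum of the joint probabilities in the Trunk column,
$$p_{\text{Trunk}} = 0.0050 + 0.1350 + 0.0825 + 0.0425 = 0.2650$$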
The summing of joint probabilities to give marginal ones can be pictured with a 3-dimensional bar chart of the joint probabilities. Stacking the bars for each value of X (the site) shows the marginal probabilities for the sites of the tumours.
Similarly, stacking the bars for each value of Y (the type) shows the overall probabilities that someone with a malignant melanoma has each of the four different types.
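
Stacking the bars is the same as summing the joint probabilities over the other variable. The sketch below does this in Python for the melanoma table, taking the joint probabilities as given; the names `joint`, `p_site` and `p_type` are assumptions made for illustration.

```python
# Joint probabilities from the table: rows are tumour types (Y),
# columns are sites (X), in the order listed in `sites`.
sites = ["Head & neck", "Trunk", "Extremities"]
joint = {
    "Hutchinson's": [0.0550, 0.0050, 0.0250],
    "Superficial spreading": [0.0400, 0.1350, 0.2875],
    "Nodular": [0.0475, 0.0825, 0.1825],
    "Indeterminate": [0.0275, 0.0425, 0.0700],
}

# Marginal probability of each site: sum over tumour types (down a column).
p_site = {
    site: sum(row[j] for row in joint.values())
    for j, site in enumerate(sites)
}

# Marginal probability of each type: sum over sites (along a row).
p_type = {tumour_type: sum(row) for tumour_type, row in joint.items()}

for site, p in p_site.items():
    print(f"{site}: {p:.4f}")
for tumour_type, p in p_type.items():
    print(f"{tumour_type}: {p:.4f}")
```

The printed values should match the Total row and Total column of the table, and each set of marginal probabilities sums to 1.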