Probabilities for a single variable

A model for two categorical variables is characterised by the joint probabilities $p_{xy}$. However, we sometimes want to restrict attention to one of these variables on its own. The marginal probabilities for the variable X are defined and interpreted in a similar way to the marginal proportions that were defined earlier for bivariate categorical data.

The marginal probability, $p_x$, for the variable X is the proportion of (x, y) pairs in the population for which the value of X is x. For example, consider a situation where we are interested in the hair colour and eye colour of teenagers. The number of blue-eyed teenagers is the sum of those with either (blue eyes and blonde hair) or (blue eyes and brunette hair) or ... or (blue eyes and red hair),

$$n_{\text{blue eyes}} = n_{\text{blue, blonde}} + n_{\text{blue, brunette}} + \cdots$$

The same holds for the proportion with blue eyes, which is its marginal probability,

$$p_{\text{blue eyes}} = p_{\text{blue, blonde}} + p_{\text{blue, brunette}} + \cdots$$

This is generalised with the formula

$$p_x = \sum_y p_{xy}$$

where the right-hand side of the equation denotes summing the joint probabilities over all possible values of y. There is a similar formula for the marginal probabilities of the other variable,

$$p_y = \sum_x p_{xy}$$
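The summation can be sketched in a few lines of Python. This is only an illustration: the function name `marginals` and the joint probabilities for eye and hair colour are invented for the example and do not come from any real population.

```python
from collections import defaultdict


def marginals(joint):
    """Return (p_x, p_y) from a dict mapping (x, y) pairs to joint probabilities."""
    p_x = defaultdict(float)
    p_y = defaultdict(float)
    for (x, y), p in joint.items():
        p_x[x] += p  # sum over all y for this value of x
        p_y[y] += p  # sum over all x for this value of y
    return dict(p_x), dict(p_y)


# Hypothetical joint probabilities (eye colour, hair colour), invented for illustration.
joint = {
    ("blue", "blonde"): 0.20,
    ("blue", "brunette"): 0.10,
    ("brown", "blonde"): 0.15,
    ("brown", "brunette"): 0.55,
}

p_eye, p_hair = marginals(joint)
print(p_eye)
print(p_hair)
```

Each set of marginal probabilities sums to one because the joint probabilities do.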

Eye strain for office workers

It is difficult to find illustrative examples since population probabilities are unknown in most 'interesting' applications. The following example is based on a real data set which classifies 295 office workers by their type of work and whether they have symptoms of eye strain.

Data from 295 office workers

Type of work              No eye strain    Eye strain
Computer data entry                  42            11
General computer use                 79            30
Full-time typing                     64            14
Standard clerical work               52             3

We do not know the underlying population joint probabilities for workers in this type of office in general. However, to provide an illustrative example, we will pretend that the population probabilities are equal to the proportions in this data set. For example, we will pretend that the joint probability for a worker doing computer data entry and not having eye strain is 42 / 295 = 0.1424.
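As a rough sketch of this step, the counts can be divided by the sample size to give the pretend joint probabilities. The variable names below are chosen for the example only.

```python
# Counts from the table of 295 office workers.
counts = {
    "Computer data entry": {"No eye strain": 42, "Eye strain": 11},
    "General computer use": {"No eye strain": 79, "Eye strain": 30},
    "Full-time typing": {"No eye strain": 64, "Eye strain": 14},
    "Standard clerical work": {"No eye strain": 52, "Eye strain": 3},
}

# Total number of workers in the data set.
n = sum(sum(row.values()) for row in counts.values())

# Treat each sample proportion as if it were the population joint probability.
joint = {
    work: {status: c / n for status, c in row.items()}
    for work, row in counts.items()
}

print(n)  # 295
print(round(joint["Computer data entry"]["No eye strain"], 4))  # 0.1424
```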

Probabilities for office workers in general

Type of work              No eye strain    Eye strain     Total
Computer data entry              0.1424        0.0373    0.1797
General computer use             0.2678        0.1017    0.3695
Full-time typing                 0.2169        0.0475    0.2644
Standard clerical work           0.1763        0.0102    0.1864
Total                            0.8034        0.1966    1.0000

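The two sets of marginal totals in the table give the marginal probabilities for the two variables: the Total column holds those for type of work and the Total row holds those for eye strain. For example, the marginal probability that a worker does computer data entry is

$$p_{\text{computer data entry}} = 0.1424 + 0.0373 = 0.1797$$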
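The marginal totals can be checked with a short Python sketch. It works from the original counts rather than the rounded joint probabilities, since adding rounded cells can differ from the printed total in the last digit (for standard clerical work, 0.1763 + 0.0102 = 0.1865, whereas 55 / 295 rounds to 0.1864). The variable names are chosen for the example only.

```python
work_types = [
    "Computer data entry",
    "General computer use",
    "Full-time typing",
    "Standard clerical work",
]
statuses = ["No eye strain", "Eye strain"]

# Rows follow work_types, columns follow statuses.
counts = [
    [42, 11],
    [79, 30],
    [64, 14],
    [52, 3],
]

n = sum(map(sum, counts))

# Marginal probabilities for type of work: sum each row over eye-strain status.
p_work = [sum(row) / n for row in counts]

# Marginal probabilities for eye strain: sum each column over type of work.
p_status = [sum(col) / n for col in zip(*counts)]

for name, p in zip(work_types, p_work):
    print(f"{name}: {p:.4f}")
for name, p in zip(statuses, p_status):
    print(f"{name}: {p:.4f}")
```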

Diagram: a 3-dimensional bar chart of the joint probabilities, illustrating how they are summed to give the marginal ones. Stacking the bars for each type of work gives the marginal probabilities for type of work (X). Similarly, stacking the bars for each eye-strain category gives the marginal probabilities for Y, the overall probability that a worker has eye strain.