Proportional Venn diagrams
Marginal and conditional probabilities are meaningful and useful summaries of the relationship between two categorical variables. Proportional Venn diagrams were used earlier to graphically display marginal and conditional proportions for bivariate categorical data sets. They can also be used in the same way to display marginal and conditional probabilities for a bivariate categorical model.
The proportional Venn diagram is drawn in a unit square (with both sides of length 1.0).
Area = joint probability
The definition of the conditional probability $p_{y|x}$ is

$$p_{y|x} = \frac{p_{xy}}{p_x}$$

and the relationship can be rewritten in the form

$$p_{xy} = p_x \times p_{y|x}$$

The rectangle representing categories x and y has width $p_x$ and height $p_{y|x}$, so this product is its area. The area of any rectangle in the diagram therefore equals the joint probability of the categories it represents.
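As a minimal sketch of this construction, the Python below builds the rectangles from a table of joint probabilities. It assumes the square is split first into columns whose widths are the marginal probabilities of X, then each column into pieces whose heights are the conditional probabilities of Y given X, and that every $p_x$ is positive. The category names and probabilities are made up for illustration, and `venn_rectangles` is a hypothetical helper, not part of any library. Exact fractions are used so that the area check holds without rounding error.

```python
from fractions import Fraction

# Made-up joint probabilities p_xy for a 2 x 2 model; keys are (x, y) pairs.
joint = {
    ("x1", "y1"): Fraction(3, 10),
    ("x1", "y2"): Fraction(1, 10),
    ("x2", "y1"): Fraction(2, 10),
    ("x2", "y2"): Fraction(4, 10),
}


def venn_rectangles(joint):
    """Hypothetical helper: return {(x, y): (left, bottom, width, height)}.

    Columns have width p_x; each column is cut into pieces of height p_{y|x}.
    Assumes every marginal p_x is positive.
    """
    xs = sorted({x for x, _ in joint})
    ys = sorted({y for _, y in joint})
    rects = {}
    left = Fraction(0)
    for x in xs:
        p_x = sum(joint[(x, y)] for y in ys)      # marginal probability of x
        bottom = Fraction(0)
        for y in ys:
            p_y_given_x = joint[(x, y)] / p_x     # conditional probability of y given x
            rects[(x, y)] = (left, bottom, p_x, p_y_given_x)
            bottom += p_y_given_x
        left += p_x
    return rects


rects = venn_rectangles(joint)
for cell, (_, _, width, height) in rects.items():
    # width * height = p_x * p_{y|x} = p_xy
    assert width * height == joint[cell]
```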
A similar diagram can be based on the marginal probabilities of Y and the conditional probabilities of X given Y, splitting the unit square first horizontally and then vertically. The areas of the resulting rectangles again equal the joint probabilities, so the two diagrams are rearrangements of the same areas (the joint probabilities, $p_{xy}$).
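A short numerical check of this claim, again using made-up 2 x 2 joint probabilities: the sketch below computes each rectangle's (width, height) under both layouts. The shapes differ, but the products $p_x \, p_{y|x}$ and $p_{x|y} \, p_y$ both return $p_{xy}$.

```python
from fractions import Fraction

# Made-up joint probabilities p_xy; keys are (x, y) pairs.
joint = {
    ("x1", "y1"): Fraction(3, 10), ("x1", "y2"): Fraction(1, 10),
    ("x2", "y1"): Fraction(2, 10), ("x2", "y2"): Fraction(4, 10),
}
xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})
p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}   # marginals of X
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}   # marginals of Y

# Layout 1 (columns first): width p_x, height p_{y|x}
shape_x_first = {(x, y): (p_x[x], joint[(x, y)] / p_x[x]) for x in xs for y in ys}
# Layout 2 (rows first): width p_{x|y}, height p_y
shape_y_first = {(x, y): (joint[(x, y)] / p_y[y], p_y[y]) for x in xs for y in ys}

area_x_first = {cell: w * h for cell, (w, h) in shape_x_first.items()}
area_y_first = {cell: w * h for cell, (w, h) in shape_y_first.items()}

# Different shapes, identical areas: both equal the joint probabilities.
assert shape_x_first != shape_y_first
assert area_x_first == area_y_first == joint
```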
The use of the diagrams is best explained in an example.
Apple bruising
Before showing the relationship between joint, conditional and marginal probabilities, we illustrate the formulae for joint, conditional and marginal proportions.
The contingency table below describes bruising of 96 apples in a packing plant. The apples were classified by the variety of apple (Granny Smith or Fuji) and whether or not they were bruised. (The data are not real.)
| Variety | Not bruised | Bruised |
|---|---|---|
| Granny Smith | 40 | 8 |
| Fuji | 24 | 24 |
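The proportions behind this table can be computed directly from the counts. The Python sketch below (variable names are illustrative) uses exact fractions: joint proportions divide each count by the 96 apples, marginal proportions divide a row or column total by 96, and conditional proportions of bruising divide each count by its variety's row total.

```python
from fractions import Fraction

# Counts from the apple bruising table: (variety, bruising) -> number of apples.
counts = {
    ("Granny Smith", "Not bruised"): 40,
    ("Granny Smith", "Bruised"): 8,
    ("Fuji", "Not bruised"): 24,
    ("Fuji", "Bruised"): 24,
}
varieties = ["Granny Smith", "Fuji"]
statuses = ["Not bruised", "Bruised"]
n = sum(counts.values())  # 96 apples

# Joint proportions: count / n
joint = {cell: Fraction(c, n) for cell, c in counts.items()}

# Marginal proportions: row or column total / n
marginal_variety = {v: Fraction(sum(counts[(v, s)] for s in statuses), n) for v in varieties}
marginal_status = {s: Fraction(sum(counts[(v, s)] for v in varieties), n) for s in statuses}

# Conditional proportions of bruising status given variety: count / row total
cond_status_given_variety = {
    (v, s): Fraction(counts[(v, s)], sum(counts[(v, t)] for t in statuses))
    for v in varieties
    for s in statuses
}

print(cond_status_given_variety[("Granny Smith", "Bruised")])  # 1/6
print(cond_status_given_variety[("Fuji", "Bruised")])          # 1/2
```

Half of the apples are of each variety, yet the proportion bruised is 1/6 for Granny Smith and 1/2 for Fuji.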
The diagram below shows a proportional Venn diagram for the data. Note that the four areas are proportional to the numbers of apples in each combination of variety and bruising.
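The proportionality of the areas can be confirmed by computing the rectangles. The sketch below assumes the square is split first into columns by variety (widths are the marginal proportions of each variety) and then each column by bruising status (heights are conditional proportions); the actual diagram may arrange the rectangles differently, but the areas are the same. Each area multiplied by 96 recovers the corresponding count.

```python
from fractions import Fraction

# Counts from the apple bruising table: (variety, bruising) -> number of apples.
counts = {
    ("Granny Smith", "Not bruised"): 40,
    ("Granny Smith", "Bruised"): 8,
    ("Fuji", "Not bruised"): 24,
    ("Fuji", "Bruised"): 24,
}
varieties = ["Granny Smith", "Fuji"]
statuses = ["Not bruised", "Bruised"]
total = sum(counts.values())

# (left, bottom, width, height) of each rectangle in the unit square
rects = {}
left = Fraction(0)
for v in varieties:
    row_total = sum(counts[(v, s)] for s in statuses)
    width = Fraction(row_total, total)                # proportion of apples of variety v
    bottom = Fraction(0)
    for s in statuses:
        height = Fraction(counts[(v, s)], row_total)  # proportion of variety v with status s
        rects[(v, s)] = (left, bottom, width, height)
        bottom += height
    left += width

# Area is proportional to count: area * total == count
for cell, (_, _, w, h) in rects.items():
    assert w * h * total == counts[cell]
```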
World population by age and region
The table below shows the world population in 2013 (in millions), categorised by region and by age group.
| Region | Age 0-14 | Age 15-64 | Age 65+ |
|---|---|---|---|
| Africa | 435.6 | 619.5 | 39.6 |
| Asia | 1,071.3 | 2,917.7 | 303.8 |
| America, Europe and Oceania | 357.2 | 1,162.3 | 223.0 |
Consider randomly selecting one person in the world. The joint probabilities for this person being in each age/region are obtained by dividing the above values by the total world population.
| Region | Age 0-14 | Age 15-64 | Age 65+ |
|---|---|---|---|
| Africa | 0.061 | 0.087 | 0.006 |
| Asia | 0.150 | 0.409 | 0.043 |
| America, Europe and Oceania | 0.050 | 0.163 | 0.031 |
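The division can be reproduced in a few lines. The Python sketch below takes the population figures from the table by region and age, divides each by their total, and prints the results to three decimals, which should match the table of joint probabilities.

```python
# 2013 world population by region and age group (millions), from the table above.
ages = ["0-14", "15-64", "65+"]
population = {
    "Africa": [435.6, 619.5, 39.6],
    "Asia": [1071.3, 2917.7, 303.8],
    "America, Europe and Oceania": [357.2, 1162.3, 223.0],
}

total = sum(sum(row) for row in population.values())  # about 7,130 million

# Joint probability of (region, age) for a randomly selected person
joint = {
    (region, age): count / total
    for region, row in population.items()
    for age, count in zip(ages, row)
}

for (region, age), p in joint.items():
    print(f"{region:<28} {age:>6} {p:.3f}")
```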
Marginal and conditional probabilities can be obtained using formulae from the previous pages. The proportional Venn diagram below displays them graphically.
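As a sketch of those formulae applied here, the Python below computes the marginal probabilities of region and of age group (row and column totals divided by the grand total) and the conditional probabilities of age group given region (each cell divided by its row total). Names such as `p_age_given_region` are illustrative.

```python
# 2013 world population by region and age group (millions), from the table above.
ages = ["0-14", "15-64", "65+"]
population = {
    "Africa": [435.6, 619.5, 39.6],
    "Asia": [1071.3, 2917.7, 303.8],
    "America, Europe and Oceania": [357.2, 1162.3, 223.0],
}
total = sum(sum(row) for row in population.values())

# Marginal probability of each region: row total / grand total
p_region = {region: sum(row) / total for region, row in population.items()}

# Marginal probability of each age group: column total / grand total
p_age = {
    age: sum(row[i] for row in population.values()) / total
    for i, age in enumerate(ages)
}

# Conditional probability of age group given region: cell / row total
p_age_given_region = {
    (region, age): count / sum(row)
    for region, row in population.items()
    for age, count in zip(ages, row)
}

print(round(p_age_given_region[("Africa", "65+")], 3))                       # 0.036
print(round(p_age_given_region[("America, Europe and Oceania", "65+")], 3))  # 0.128
```

For example, fewer than 4% of people in Africa are aged 65+, compared with about 13% in America, Europe and Oceania.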
The diagram first splits the unit square horizontally using the marginal probabilities of Y, the probabilities of a random person being from each of the three regions. Each row is then split according to the conditional probabilities of age group within that region. From the diagram, we can easily see that:
The area of any rectangle in the diagram equals the product of a marginal and a conditional probability, and is therefore the joint probability for the corresponding categories.
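This can be checked numerically. The sketch below lays out the rows as described, with each row's height equal to the marginal probability of the region and each rectangle's width equal to the conditional probability of the age group given that region. It then confirms that every height times width equals the joint probability, up to floating-point rounding (hence `math.isclose`).

```python
import math

# 2013 world population by region and age group (millions), from the table above.
ages = ["0-14", "15-64", "65+"]
population = {
    "Africa": [435.6, 619.5, 39.6],
    "Asia": [1071.3, 2917.7, 303.8],
    "America, Europe and Oceania": [357.2, 1162.3, 223.0],
}
total = sum(sum(row) for row in population.values())

# Rows first: (left, bottom, width, height) for each (region, age) rectangle
rects = {}
bottom = 0.0
for region, row in population.items():
    height = sum(row) / total                 # marginal probability of the region
    left = 0.0
    for age, count in zip(ages, row):
        width = count / sum(row)              # conditional probability of age given region
        rects[(region, age)] = (left, bottom, width, height)
        left += width
    bottom += height

# Each area = marginal * conditional = joint probability
for (region, age), (_, _, width, height) in rects.items():
    joint = population[region][ages.index(age)] / total
    assert math.isclose(width * height, joint)
```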
In an alternative version of the diagram, the rectangles change shape but keep the same areas, rearranging into vertical columns corresponding to the marginal probabilities for age group. Each column is split in proportion to the conditional probabilities of region given age group. From this version of the diagram, observe that