Marginal and conditional probabilities can be found from joint probabilities (and vice versa)
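We have used three types of probability to describe a model for two categorical variables: the joint probabilities, the marginal probabilities for the two variables, and the conditional probabilities for each variable given the value of the other variable. These sets of probabilities are closely related. Indeed, the model can be equivalently described by any of the following: the joint probabilities; the marginal probabilities for X together with the conditional probabilities for Y given X; or the marginal probabilities for Y together with the conditional probabilities for X given Y.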
The diagram below shows how to find each set of probabilities from the others, using the formulae described in the earlier pages of this section.
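The relationships between the three sets of probabilities can also be sketched in code. The example below is only an illustration: the joint table, the category labels (`x1`, `x2`, `y1`, `y2`) and the function names are hypothetical and do not come from this page.

```python
# Hypothetical joint probabilities p_xy for two categorical variables X and Y.
# The values are made up for illustration; they only need to sum to 1.
joint = {
    ("x1", "y1"): 0.2,
    ("x1", "y2"): 0.3,
    ("x2", "y1"): 0.1,
    ("x2", "y2"): 0.4,
}


def marginals(joint):
    """Marginal probabilities p_x and p_y, found by summing joint probabilities."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py


def conditional_y_given_x(joint):
    """Conditional probabilities p_{y|x} = p_xy / p_x."""
    px, _ = marginals(joint)
    return {(x, y): p / px[x] for (x, y), p in joint.items()}


def joint_from(px, py_given_x):
    """Joint probabilities recovered as p_xy = p_x * p_{y|x}."""
    return {(x, y): px[x] * p for (x, y), p in py_given_x.items()}


px, py = marginals(joint)
py_given_x = conditional_y_given_x(joint)
rebuilt = joint_from(px, py_given_x)  # matches `joint` up to rounding error
```

Going from the joint table to the marginal and conditional probabilities and back again returns the original table, which is why either description specifies the same model.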
Bayes' theorem
In particular, note that it is possible to obtain the conditional probabilities for X given Y, $p_{x \mid y}$, from the marginal probabilities of X, $p_x$, and the conditional probabilities for Y given X, $p_{y \mid x}$. This can be expressed in a single formula that is called Bayes' theorem, but it is easier in practice to do the calculations in two steps, obtaining the joint probabilities, $p_{xy}$, in the first step. There are several important applications of Bayes' theorem.
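For reference, the two-step calculation and the single combined formula can be written as follows. This uses the subscript notation of this page; the primed index $x'$ is introduced here only as a summation index over the categories of X.

```latex
\begin{align*}
p_{xy} &= p_x \, p_{y \mid x}
  && \text{(step 1: joint probabilities)} \\
p_y &= \sum_{x} p_{xy}
  && \text{(marginal probabilities for } Y\text{)} \\
p_{x \mid y} &= \frac{p_{xy}}{p_y}
  = \frac{p_x \, p_{y \mid x}}{\sum_{x'} p_{x'} \, p_{y \mid x'}}
  && \text{(step 2, giving Bayes' theorem)}
\end{align*}
```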
Fraudulent tax claims
Tax inspectors investigate some of the tax returns that are submitted by individuals if they think that some claims for expenses are too high or are unjustified.
An investigation of a tax return does not always conclude that the claims were fraudulent, since the inspectors' suspicions are rarely 100% accurate. There are two types of error: a correct claim may be investigated, and a bad claim may not be investigated.
There are commonly non-zero probabilities for each of these types of error. Consider tax inspectors who have probability 0.1 of investigating a correct claim and 0.2 of not investigating a bad claim. These are conditional probabilities and can be written formally as:
P(investigated | good claim) = 0.1
P(not investigated | bad claim) = 0.2
Since the probability (proportion) of investigating a bad claim is one minus the conditional probability of not investigating it (and similarly for correct claims), the remaining conditional probabilities are
P(not investigated | good claim) = 0.9
P(investigated | bad claim) = 0.8
We will also assume that 10% of tax returns are bad claims. This corresponds to a marginal probability, P(bad claim) = 0.10.
The diagram below shows how these marginal probabilities for Y (claim type) and conditional probabilities for X (investigation) given Y can be used to obtain the conditional probabilities for Y (claim type) given X (investigation).
The initial information is shown in blue at the top of the diagram. The joint probabilities (green) are first found from them. Click on any value in the table of joint probabilities to see how it is related to the initial information.
Marginal probabilities for whether a tax return is investigated are next obtained by adding the columns of joint probabilities. Click on any of the black marginal probabilities to see how they are obtained from the joint probabilities.
Finally, the conditional probabilities for claim type (given whether the tax return has been investigated) are obtained by dividing the joint probabilities by the marginal probabilities for investigation. Click on the conditional probabilities on the bottom right of the diagram to see the formula.
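The three steps described above can be reproduced numerically. The sketch below uses only the values given in the text (10% bad claims, and probabilities 0.1 and 0.8 of investigating good and bad claims); the variable names and dictionary layout are illustrative choices, not part of the original page.

```python
# Initial information from the text.
p_bad = 0.10                                     # marginal P(bad claim)
p_inv_given_claim = {"good": 0.1, "bad": 0.8}    # P(investigated | claim type)

# Marginal probabilities for claim type.
p_claim = {"good": 1 - p_bad, "bad": p_bad}

# Step 1: joint probabilities = marginal * conditional.
joint = {}
for claim, p_c in p_claim.items():
    joint[(claim, "investigated")] = p_c * p_inv_given_claim[claim]
    joint[(claim, "not investigated")] = p_c * (1 - p_inv_given_claim[claim])

# Step 2: marginal probabilities for investigation, summing over claim type.
p_action = {}
for (claim, action), p in joint.items():
    p_action[action] = p_action.get(action, 0.0) + p

# Step 3: conditional probabilities for claim type given investigation status,
# P(claim | action) = P(claim and action) / P(action).
p_claim_given_action = {
    (claim, action): p / p_action[action]
    for (claim, action), p in joint.items()
}

print(round(p_claim_given_action[("bad", "investigated")], 3))  # 0.471
```

Here the joint probability of a bad claim being investigated is 0.08, the marginal probability of investigation is 0.17, and their ratio gives the 0.471 quoted for P(bad claim | investigated).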
Initially there might seem to be a contradiction between the two conditional probabilities,
P(bad claim | investigated) = 0.471
P(investigated | bad claim) = 0.8
However, the two probabilities are consistent since they have very different interpretations. The proportional Venn diagrams below help to explain the difference. The diagram on the left shows the marginal and conditional probabilities given in the question. The corresponding diagram on the right shows the marginal probabilities for whether claims are investigated and the conditional probabilities for good/bad claims.
Remember that the areas of the rectangles equal the joint probabilities and are therefore the same in both diagrams.
Drag the slider to alter the proportion of people who make bad tax claims in the population. (We assume that the conditional probabilities of investigating the claims remain the same.) Observe that:
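The effect of the slider can also be explored numerically. The sketch below recomputes P(bad claim | investigated) for a few assumed proportions of bad claims, keeping the conditional probabilities of investigation fixed at the values from the text (0.1 for good claims, 0.8 for bad claims). The function name and the chosen proportions are hypothetical and serve only as an illustration.

```python
def p_bad_given_investigated(p_bad, p_inv_given_good=0.1, p_inv_given_bad=0.8):
    """P(bad claim | investigated), found via the joint probabilities."""
    joint_bad = p_bad * p_inv_given_bad            # P(bad and investigated)
    joint_good = (1 - p_bad) * p_inv_given_good    # P(good and investigated)
    return joint_bad / (joint_bad + joint_good)    # divide by P(investigated)


# Illustrative proportions of bad claims; 0.10 is the value used in the text.
for p_bad in (0.01, 0.10, 0.30, 0.50):
    result = p_bad_given_investigated(p_bad)
    print(f"P(bad claim) = {p_bad:.2f} -> P(bad claim | investigated) = {result:.3f}")
```

With these fixed conditional probabilities, a larger proportion of bad claims in the population gives a larger probability that an investigated claim is bad, while P(investigated | bad claim) stays at 0.8 throughout.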