Marginal and conditional probabilities can be found from joint probabilities (and vice versa)
Consider two partitions of the sample space, \(\{A_i, i=1,\dots, n_A\}\) and \(\{B, B^c\}\). The first partition might simply be an event \(A\) and its complement, \(\{A, A^c\}\), but it could contain more events.
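The probabilities for all possible events can be described in three different ways: by the joint probabilities \(P(A_i \textbf{ and } B)\) and \(P(A_i \textbf{ and } B^c)\); by the marginal probabilities \(P(A_i)\) together with the conditional probabilities \(P(B \mid A_i)\); or by the marginal probabilities \(P(B)\) and \(P(B^c)\) together with the conditional probabilities \(P(A_i \mid B)\) and \(P(A_i \mid B^c)\).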
We have already shown how to find marginal and conditional probabilities from the joint probabilities, and how to find joint probabilities from marginal and conditional ones. The following theorem gives a formula that helps find one set of conditional probabilities from the other.
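The conversions between joint probabilities and marginal-plus-conditional probabilities can be sketched in a few lines of Python. The three-event partition and all of the numbers below are hypothetical, chosen only so that the joint probabilities sum to one; the function names are likewise illustrative.

```python
# Hypothetical joint probabilities P(A_i and B) and P(A_i and B^c) for a
# three-event partition. The values are illustrative only and sum to 1.
joint = {
    "A1": {"B": 0.10, "Bc": 0.20},
    "A2": {"B": 0.25, "Bc": 0.15},
    "A3": {"B": 0.05, "Bc": 0.25},
}


def marginals_and_conditionals(joint):
    """Return P(A_i) and P(. | A_i) computed from a joint table."""
    # Marginal: add the joint probabilities across B and B^c.
    marginal = {a: sum(row.values()) for a, row in joint.items()}
    # Conditional: divide each joint probability by the marginal of A_i.
    conditional = {
        a: {b: p / marginal[a] for b, p in row.items()}
        for a, row in joint.items()
    }
    return marginal, conditional


def joint_from(marginal, conditional):
    """Rebuild the joint table using P(A_i and B) = P(A_i) * P(B | A_i)."""
    return {
        a: {b: marginal[a] * p for b, p in conditional[a].items()}
        for a in marginal
    }


marginal, conditional = marginals_and_conditionals(joint)
rebuilt = joint_from(marginal, conditional)
```

Up to floating-point rounding, `rebuilt` matches `joint`, which is the "vice versa" direction of the section title.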
Bayes' Theorem
If \(\{A_1, \dots, A_k\}\) is a partition of the sample space and \(B\) is an event with \(P(B) > 0\), then
\[ P(A_j \mid B) = \frac {P(A_j) \times P(B \mid A_j) } {\sum_{i=1}^{k} {P(A_i) \times P(B \mid A_i) } } \]
From the definition of conditional probability,
\[ P(A_j \mid B) = \frac {P(B \textbf{ and } A_j) } {P(B) } \]
Since the definition of conditional probability also implies that
\[ P(B \textbf{ and } A_j) = P(A_j) \times P(B \mid A_j) \]
we can rewrite the equation as
\[ \begin{align} P(A_j \mid B) &= \frac {P(A_j) \times P(B \mid A_j) } {P(B) } \\ &= \frac {P(A_j) \times P(B \mid A_j) } {\sum_{i=1}^{k} {P(A_i) \times P(B \mid A_i) } } \end{align} \]
using the law of total probability.
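As a rough sketch, the theorem translates directly into a short Python function. The name `bayes_posterior`, the input checks, and the three-event priors and likelihoods in the usage lines are assumptions made for illustration and do not come from the text.

```python
def bayes_posterior(priors, likelihoods):
    """Return the list of P(A_j | B), given P(A_i) and P(B | A_i) for a partition."""
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1 because the A_i form a partition")
    # Numerators: the joint probabilities P(A_i and B) = P(A_i) * P(B | A_i).
    joints = [p * l for p, l in zip(priors, likelihoods)]
    # Denominator: P(B), from the law of total probability.
    p_b = sum(joints)
    if p_b == 0:
        raise ValueError("P(B) is zero, so P(A_j | B) is undefined")
    return [j / p_b for j in joints]


# Hypothetical three-event partition (illustrative numbers only).
posterior = bayes_posterior([0.5, 0.3, 0.2], [0.1, 0.4, 0.7])
```

The function returns the whole set of conditional probabilities at once because every \(P(A_j \mid B)\) shares the same denominator, \(P(B)\).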
The formula looks complicated, and in practice it is usually easier to work from first principles by first calculating the joint probabilities, as illustrated by the example below.
Example
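Medical diagnostic tests for a disease are rarely 100% accurate. There are two types of error: a false negative, in which the test is negative for someone who has the disease, and a false positive, in which the test is positive for someone who does not have the disease.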
Consider a diagnostic test with probability 0.05 of a negative test result for someone who has the target disease, and probability 0.10 of a positive test result for someone who does not have the disease. These are conditional probabilities and can be written formally as:
\[ P(negative \mid disease) = 0.05 \quad\quad\quad P(positive \mid no \text{ } disease) = 0.10 \]
Since a person with the disease must test either positive or negative, the probability of a positive result given the disease is one minus the probability of a negative result (and similarly for those who do not have the disease), so the remaining conditional probabilities are
\[ P(positive \mid disease) = 0.95 \quad\quad\quad P(negative \mid no \text{ } disease) = 0.90 \]
We will also assume that 10% of people who are given the test have the disease. This corresponds to a marginal probability,
\[ P(disease) = 0.10 \]
What is the probability that someone with a positive test result actually has the disease?
We are asked for \(P(disease \mid positive)\). From the definition of conditional probability, this is
\[ P(disease \mid positive) = \frac {P(disease \textbf{ and } positive)} {P(positive)} \]
The numerator is
\[ P(disease \textbf{ and } positive) = P(disease) \times P(positive \mid disease) = 0.1 \times 0.95 = 0.095 \]
The denominator is found from the law of total probability,
\[ \begin{align} P(positive) &= P(disease \textbf{ and } positive) + P(no \text{ } disease \textbf{ and } positive) \\ &= 0.1 \times 0.95 + 0.9 \times 0.1 = 0.185 \end{align} \]
The probability of having the disease, conditional on testing positive, is therefore
\[ P(disease \mid positive) = \frac {0.095} {0.185} \approx 0.514 \]
The diagram below illustrates the calculations.
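The arithmetic in this example can also be checked numerically. The short Python sketch below follows the same first-principles route (joint probabilities first, then the ratio); the variable names are illustrative, while the three input probabilities are the ones given in the example.

```python
# Inputs taken from the example.
p_disease = 0.10                # P(disease)
p_pos_given_disease = 0.95      # P(positive | disease)
p_pos_given_no_disease = 0.10   # P(positive | no disease)

# Joint probabilities for the two ways of getting a positive result.
joint_disease_pos = p_disease * p_pos_given_disease
joint_no_disease_pos = (1 - p_disease) * p_pos_given_no_disease

# Law of total probability gives the denominator, P(positive).
p_positive = joint_disease_pos + joint_no_disease_pos

# Definition of conditional probability gives P(disease | positive).
p_disease_given_pos = joint_disease_pos / p_positive

print(f"{joint_disease_pos:.3f}")    # 0.095
print(f"{p_positive:.3f}")           # 0.185
print(f"{p_disease_given_pos:.3f}")  # 0.514
```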
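Initially there might seem to be a contradiction between the two conditional probabilities in the above example,
\[P(positive \mid disease) = 0.95 \quad\quad\quad P(disease \mid positive) \approx 0.514 \]
However, the two probabilities are consistent because they have very different interpretations: the first is the proportion of people with the disease who test positive, whereas the second is the proportion of people testing positive who have the disease. The diagram below helps to explain.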
Diagnostic test results and disease status
The proportional Venn diagram on the left shows the marginal and conditional probabilities given in the original question. The proportional Venn diagram on the right shows the marginal probabilities for the test results and the conditional probabilities for disease status.
Remember that the areas of the rectangles equal the joint probabilities and are therefore the same in both diagrams.
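Because both descriptions share the same joint probabilities, one can be converted into the other by passing through the joint table. The Python sketch below does this for the diagnostic test; the function name `flip_conditioning` and the dictionary layout are assumptions made for illustration.

```python
def flip_conditioning(p_disease, p_pos_given_disease, p_pos_given_no_disease):
    """Return (P(test result), P(disease | test result)) as two dicts.

    Assumes both test results have nonzero probability.
    """
    # Joint probabilities: the rectangle areas shared by both descriptions.
    joint = {
        ("disease", "positive"): p_disease * p_pos_given_disease,
        ("disease", "negative"): p_disease * (1 - p_pos_given_disease),
        ("no disease", "positive"): (1 - p_disease) * p_pos_given_no_disease,
        ("no disease", "negative"): (1 - p_disease) * (1 - p_pos_given_no_disease),
    }
    # Marginal probabilities for the test results.
    p_test = {
        result: joint[("disease", result)] + joint[("no disease", result)]
        for result in ("positive", "negative")
    }
    # Conditional probabilities of disease given each test result.
    p_disease_given_test = {
        result: joint[("disease", result)] / p_test[result]
        for result in ("positive", "negative")
    }
    return p_test, p_disease_given_test


p_test, p_disease_given_test = flip_conditioning(0.10, 0.95, 0.10)
```

With the probabilities from the example, `p_test["positive"]` is 0.185 and `p_disease_given_test["positive"]` is about 0.514, matching the calculation above. Multiplying `p_test[r]` by `p_disease_given_test[r]` recovers the joint probability \(P(disease \textbf{ and } r)\) for each result `r`.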
Drag the slider to alter the proportion of people who have the disease in the population. (We assume that the probabilities of false negatives and false positives from the test remain the same.) Observe that: