Binomial distribution
The binomial distribution also requires certain assumptions about a sequence of success/failure trials, and these do not always hold. In particular, we assume that the trials are independent and that the probability of success is the same in every trial.
If the probability of success varies from one sequence of trials to another, or if the results of successive trials are positively correlated, very low and very high counts will be more likely than a binomial distribution predicts. This is called overdispersion.
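A small exact calculation makes this concrete. The Python sketch below assumes, purely for illustration, that each sequence of 12 trials has success probability either 0.4 or 0.6 with equal chance (hypothetical values, not taken from any data). The resulting count has the same mean as a \(\BinomDistn(n=12, \; \pi=0.5)\) count but a larger variance.

```python
from math import comb

N = 12  # trials per sequence (e.g. the first 12 children in a family)


def binom_pmf(x, n, p):
    """Binomial probability of x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def mixture_pmf(x, n, probs, weights):
    """P(X = x) when each sequence has its own success probability,
    drawn from `probs` with the given `weights`."""
    return sum(w * binom_pmf(x, n, p) for p, w in zip(probs, weights))


def mean_var(pmf):
    """Mean and variance of a distribution given as [P(X=0), P(X=1), ...]."""
    mean = sum(x * p for x, p in enumerate(pmf))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))
    return mean, var


# Hypothetical values: half the sequences have success probability 0.4,
# the other half 0.6, so the average success probability is 0.5.
mix = [mixture_pmf(x, N, [0.4, 0.6], [0.5, 0.5]) for x in range(N + 1)]
binom = [binom_pmf(x, N, 0.5) for x in range(N + 1)]

for label, pmf in [("binomial", binom), ("mixture", mix)]:
    m, v = mean_var(pmf)
    print(f"{label}: mean={m:.2f}, variance={v:.2f}, P(X=0)={pmf[0]:.5f}")
```

Both distributions have mean 6, but the variance rises from 3.00 to 4.32, and the probability of 0 successes (and, by symmetry, of 12) rises from about 0.00024 to about 0.00110. The extra 1.44 equals \(12^2\) times the variance of the success probability (0.01), a term that a single binomial distribution cannot produce.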
Sex of babies
The following table gives the number of male children among the first 12 children in 6,115 families of size 13, taken from hospital records in 19th-century Saxony. (The 13th child has been ignored to avoid the possible distortion of families stopping when a desired sex is reached.)
Number of males | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Frequency | 3 | 24 | 104 | 286 | 670 | 1033 | 1343 | 1112 | 829 | 478 | 181 | 45 | 7 |
Assuming independence and that each child has the same probability of being male, \(\pi\), this would be a random sample from a \(\BinomDistn(n=12, \; \pi)\) distribution.
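The estimate comes from maximising the log-likelihood. Writing \(X_1, \dots, X_N\) for the numbers of males in the \(N = 6{,}115\) families (notation introduced here only for this derivation), the log-likelihood and its derivative are
\[
\begin{aligned}
\ell(\pi) &= \sum_{i=1}^{N} \log\binom{12}{X_i} \;+\; \Big(\sum_{i=1}^{N} X_i\Big)\log\pi \;+\; \Big(12N - \sum_{i=1}^{N} X_i\Big)\log(1-\pi) \\
\frac{d\ell}{d\pi} &= \frac{\sum_i X_i}{\pi} \;-\; \frac{12N - \sum_i X_i}{1-\pi}
\end{aligned}
\]
Setting the derivative to zero gives \((1-\pi)\sum_i X_i = \pi\,(12N - \sum_i X_i)\), so the maximum is at \(\pi = \sum_i X_i / (12N)\), the overall proportion of males among the \(12N\) children.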
Maximum likelihood gives
\[ \hat{\pi} \;=\; \frac {\overline{X}}{12} \;=\; \frac{6.2306}{12} \;=\; 0.5192\]
The table below compares the sample proportions with the probabilities from this best-fitting binomial distribution.
Number of males, \(x\) | Sample proportion | Binomial probability, \(p(x)\) |
---|---|---|
0 | 0.0005 | 0.0002 |
1 | 0.0039 | 0.0020 |
2 | 0.0170 | 0.0117 |
3 | 0.0468 | 0.0423 |
4 | 0.1096 | 0.1027 |
5 | 0.1689 | 0.1775 |
6 | 0.2196 | 0.2236 |
7 | 0.1818 | 0.2070 |
8 | 0.1356 | 0.1397 |
9 | 0.0782 | 0.0671 |
10 | 0.0296 | 0.0217 |
11 | 0.0074 | 0.0043 |
12 | 0.0011 | 0.0004 |
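The table can be reproduced from the frequencies with a short script. The following Python sketch uses only the standard library (variable names are our own) and computes \(\hat{\pi}\) and the fitted probabilities \(p(x) = \binom{12}{x}\hat{\pi}^x(1-\hat{\pi})^{12-x}\).

```python
from math import comb

# Number of families with 0, 1, ..., 12 males among the first 12 children
freq = [3, 24, 104, 286, 670, 1033, 1343, 1112, 829, 478, 181, 45, 7]
n_trials = 12
n_families = sum(freq)  # 6115

# Maximum likelihood estimate: mean number of males divided by 12
mean_count = sum(x * f for x, f in enumerate(freq)) / n_families
pi_hat = mean_count / n_trials

# Observed proportions and fitted binomial probabilities for x = 0..12
sample_props = [f / n_families for f in freq]
fitted = [
    comb(n_trials, x) * pi_hat**x * (1 - pi_hat) ** (n_trials - x)
    for x in range(n_trials + 1)
]

print(f"pi_hat = {pi_hat:.4f}")
for x in range(n_trials + 1):
    print(f"{x:2d} | {sample_props[x]:.4f} | {fitted[x]:.4f}")
```

Because the script uses the unrounded \(\hat{\pi}\), an occasional fitted value may differ from the table in the last decimal place.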
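There were more families with 3 or fewer males, and more with 9 or more males, than the binomial model predicts: about 6.8% of families against a predicted 5.6% in the lower tail, and about 11.6% against 9.3% in the upper tail. This excess in both tails indicates overdispersion.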
The variance of the best-fitting binomial model is
\[ \Var(X) \;=\; 12 \times \hat{\pi} (1 - \hat{\pi}) \;=\; 2.996\]
whereas the sample variance of the counts was 3.490, again indicating overdispersion.
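Both variances can be checked directly from the frequency table. In this minimal Python sketch, the choice of divisor is an assumption: the text does not say which was used, but dividing by the number of families minus one reproduces 3.490, while dividing by 6,115 would give 3.489.

```python
freq = [3, 24, 104, 286, 670, 1033, 1343, 1112, 829, 478, 181, 45, 7]
n_trials = 12
n_families = sum(freq)

# Sample mean count and maximum likelihood estimate of pi
mean_count = sum(x * f for x, f in enumerate(freq)) / n_families
pi_hat = mean_count / n_trials

# Variance implied by the best-fitting binomial model
binom_var = n_trials * pi_hat * (1 - pi_hat)

# Sample variance of the counts (assumed divisor: number of families - 1)
sample_var = sum(f * (x - mean_count) ** 2 for x, f in enumerate(freq)) / (n_families - 1)

print(f"binomial variance: {binom_var:.3f}")  # 2.996
print(f"sample variance:   {sample_var:.3f}")  # 3.490
```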
This gives strong evidence that the assumptions underlying the binomial model do not hold. The most likely reason is that the probability of a child being male is not constant, but varies from family to family.