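To use a binomial distribution to model the number of successes in a sequence of success/failure trials, we must assume that the probability of success, \(\pi\), is the same in every trial and that the results of different trials are independent.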
If \(\pi\) varies or the results of successive trials are positively related, very low and very high counts are more likely than a binomial distribution would predict. This is called overdispersion.
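As a rough illustration of why a varying \(\pi\) spreads the counts out, the sketch below compares a binomial distribution with \(n = 12\) and \(\pi = 0.5\) against a mixture in which half the sequences have \(\pi = 0.4\) and half have \(\pi = 0.6\). The mixture and its values are assumptions made up for this example, not taken from any data.

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability of x successes in n trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def mean_var(pmf):
    """Mean and variance of a distribution on 0, 1, ..., len(pmf) - 1."""
    mean = sum(x * q for x, q in enumerate(pmf))
    var = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))
    return mean, var

n = 12

# Constant probability of success in every sequence of trials.
fixed_pmf = [binom_pmf(x, n, 0.5) for x in range(n + 1)]

# Hypothetical mixture (an assumption for illustration):
# pi is 0.4 for half of the sequences and 0.6 for the other half.
mixture = [(0.5, 0.4), (0.5, 0.6)]
mixed_pmf = [sum(w * binom_pmf(x, n, p) for w, p in mixture)
             for x in range(n + 1)]

fixed_mean, fixed_var = mean_var(fixed_pmf)
mixed_mean, mixed_var = mean_var(mixed_pmf)

print(f"constant pi: mean = {fixed_mean:.2f}, variance = {fixed_var:.2f}")
print(f"varying pi:  mean = {mixed_mean:.2f}, variance = {mixed_var:.2f}")
print(f"P(X <= 2):   {sum(fixed_pmf[:3]):.4f} vs {sum(mixed_pmf[:3]):.4f}")
```

Both distributions have the same mean, but the mixture has the larger variance and puts more probability in both tails.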
Sex of babies
The number of male children among the first 12 children was recorded for each of 6,115 families of size 13, using hospital records from 19th century Saxony. If the sexes of different children were independent and each child had the same probability of being male, \(\pi\), these counts would be a random sample from a \(\BinomDistn(n=12, \; \pi)\) distribution.
The maximum likelihood estimate is \( \hat{\pi} \;=\; 0.5192\), and the table below shows the resulting binomial probabilities alongside the sample proportions.
Number of males, \(x\) | Sample proportion | Binomial probability, \(p(x)\) |
---|---|---|
0 | 0.0005 | 0.0002 |
1 | 0.0039 | 0.0020 |
2 | 0.0170 | 0.0117 |
3 | 0.0468 | 0.0423 |
4 | 0.1096 | 0.1027 |
5 | 0.1689 | 0.1775 |
6 | 0.2196 | 0.2236 |
7 | 0.1818 | 0.2070 |
8 | 0.1356 | 0.1397 |
9 | 0.0782 | 0.0671 |
10 | 0.0296 | 0.0217 |
11 | 0.0074 | 0.0043 |
12 | 0.0011 | 0.0004 |
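The binomial column can be reproduced, at least approximately, with a few lines of Python. This is a minimal sketch that takes the sample proportions from the table (rounded to four decimals) and the quoted estimate \(\hat{\pi} = 0.5192\); the variable names are chosen for illustration only.

```python
from math import comb

n = 12
pi_hat = 0.5192  # maximum likelihood estimate quoted in the text

# Sample proportions for x = 0, 1, ..., 12, copied from the table.
sample_prop = [0.0005, 0.0039, 0.0170, 0.0468, 0.1096, 0.1689, 0.2196,
               0.1818, 0.1356, 0.0782, 0.0296, 0.0074, 0.0011]

# Binomial probabilities p(x) under the fitted model.
binom_prob = [comb(n, x) * pi_hat**x * (1 - pi_hat) ** (n - x)
              for x in range(n + 1)]

# For a binomial model the maximum likelihood estimate of pi is the
# sample mean divided by n; this recovers it from the rounded proportions.
pi_check = sum(x * p for x, p in enumerate(sample_prop)) / n

for x in range(n + 1):
    print(f"{x:2d}  {sample_prop[x]:.4f}  {binom_prob[x]:.4f}")

# Total probability in each tail: observed versus fitted.
lower_sample, lower_binom = sum(sample_prop[:4]), sum(binom_prob[:4])
upper_sample, upper_binom = sum(sample_prop[9:]), sum(binom_prob[9:])
print(f"pi_check = {pi_check:.4f}")
print(f"3 or fewer males: {lower_sample:.4f} vs {lower_binom:.4f}")
print(f"9 or more males:  {upper_sample:.4f} vs {upper_binom:.4f}")
```

Summing the tails makes the comparison easier to read than scanning individual rows.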
There were more families with 3 or fewer males and with 9 or more males than the binomial model would predict, indicating overdispersion.
The variance of the best binomial model is
\[ \Var(X) \;=\; 12 \times \hat{\pi} (1 - \hat{\pi}) \;=\; 2.996\]
whereas the actual sample variance was 3.490, again indicating overdispersion.
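The two variances can be checked with a short calculation. The sketch below works from the rounded sample proportions in the table rather than the raw family counts, so it should only be expected to match the quoted sample variance approximately; the dispersion ratio at the end is an informal summary added here, not a quantity from the original analysis.

```python
n = 12
pi_hat = 0.5192  # maximum likelihood estimate quoted in the text

# Variance implied by the fitted binomial model.
binom_var = n * pi_hat * (1 - pi_hat)

# Sample proportions for x = 0, 1, ..., 12, copied from the table.
sample_prop = [0.0005, 0.0039, 0.0170, 0.0468, 0.1096, 0.1689, 0.2196,
               0.1818, 0.1356, 0.0782, 0.0296, 0.0074, 0.0011]

# Mean and variance of the observed distribution of counts.
sample_mean = sum(x * p for x, p in enumerate(sample_prop))
sample_var = sum((x - sample_mean) ** 2 * p
                 for x, p in enumerate(sample_prop))

# Ratio above 1 suggests more spread than the binomial model allows.
dispersion_ratio = sample_var / binom_var

print(f"binomial variance = {binom_var:.3f}")
print(f"sample variance   = {sample_var:.3f}")
print(f"ratio             = {dispersion_ratio:.3f}")
```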
This gives strong evidence that the assumptions underlying the binomial model do not hold. The most likely reason is that the probability of a child being male is not constant, but varies from family to family.