Binomial distribution

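The binomial distribution also requires certain assumptions about a sequence of success/failure trials, and these do not always hold. In particular, we assume that the probability of success is the same in every trial and that the results of successive trials are independent.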

If the probability of success varies, or the results of successive trials are positively related, the probability of a very low or very high count will be greater than a binomial distribution would give. This is called overdispersion.
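To make the effect of a varying success probability concrete, the sketch below compares a binomial distribution with 12 trials and a constant probability of 0.5 against an equal mixture of two binomials with probabilities 0.4 and 0.6. These values are hypothetical and chosen only for illustration; the sketch does not cover the case of positively related trials.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def variance(pmf):
    """Variance of a distribution on 0, 1, ..., len(pmf) - 1."""
    mean = sum(x * p for x, p in enumerate(pmf))
    return sum((x - mean) ** 2 * p for x, p in enumerate(pmf))


n = 12

# Constant success probability (hypothetical value 0.5).
constant = [binomial_pmf(x, n, 0.5) for x in range(n + 1)]

# Success probability varies: half the sequences use 0.4, half use 0.6
# (hypothetical values with the same average of 0.5).
mixed = [
    0.5 * binomial_pmf(x, n, 0.4) + 0.5 * binomial_pmf(x, n, 0.6)
    for x in range(n + 1)
]

# Probability of an extreme count (2 or fewer, or 10 or more successes).
tail_constant = sum(constant[:3]) + sum(constant[10:])
tail_mixed = sum(mixed[:3]) + sum(mixed[10:])

print(f"variance: constant {variance(constant):.2f}, mixed {variance(mixed):.2f}")
print(f"extreme counts: constant {tail_constant:.4f}, mixed {tail_mixed:.4f}")
```

Both distributions have mean 6, but the mixture has the larger variance and puts more probability on extreme counts, which is the pattern described above.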

Sex of babies

The following table gives the number of male children among the first 12 children in 6,115 families of size 13, taken from hospital records in 19th-century Saxony. (The 13th child has been ignored to avoid the possible distortion caused by families stopping once a desired sex is reached.)

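| Males | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Frequency | 3 | 24 | 104 | 286 | 670 | 1033 | 1343 | 1112 | 829 | 478 | 181 | 45 | 7 |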

Assuming independence and that each child has the same probability of being male, \(\pi\), this would be a random sample from a \(\BinomDistn(n=12, \; \pi)\) distribution.

The maximum likelihood estimate of \(\pi\) is the mean number of males per family, \(\overline{X}\), divided by 12:

\[ \hat{\pi} \;=\; \frac {\overline{X}}{12} \;=\; 0.5192\]
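As a check on the arithmetic, the estimate can be reproduced from the frequency table. The sketch below uses only the counts given in the text; the variable names are illustrative.

```python
# Frequency table from the text: frequencies[x] = families with x males.
frequencies = [3, 24, 104, 286, 670, 1033, 1343, 1112, 829, 478, 181, 45, 7]
n_trials = 12  # children counted per family

n_families = sum(frequencies)
total_males = sum(x * f for x, f in enumerate(frequencies))

# Mean number of males per family, then the estimated probability of a male.
mean_males = total_males / n_families
pi_hat = mean_males / n_trials

print(f"families = {n_families}")
print(f"mean males per family = {mean_males:.4f}")
print(f"pi_hat = {pi_hat:.4f}")
```

The printed estimate agrees with the value 0.5192 above.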

The table below compares the sample proportions to the best-fitting binomial ones.

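| Number of males, \(x\) | Sample proportion | Binomial probability, \(p(x)\) |
|---|---|---|
| 0 | 0.0005 | 0.0002 |
| 1 | 0.0039 | 0.0020 |
| 2 | 0.0170 | 0.0117 |
| 3 | 0.0468 | 0.0423 |
| 4 | 0.1096 | 0.1027 |
| 5 | 0.1689 | 0.1775 |
| 6 | 0.2196 | 0.2236 |
| 7 | 0.1818 | 0.2070 |
| 8 | 0.1356 | 0.1397 |
| 9 | 0.0782 | 0.0671 |
| 10 | 0.0296 | 0.0217 |
| 11 | 0.0074 | 0.0043 |
| 12 | 0.0011 | 0.0004 |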

There were more families with 3 or fewer males and with 9 or more males than the binomial model would predict, indicating overdispersion.
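The comparison between sample proportions and fitted binomial probabilities can be recomputed as follows. This is a minimal sketch using only the standard library; `binomial_pmf` is a helper written for this illustration, not a function from any particular package.

```python
from math import comb

# Frequency table from the text: frequencies[x] = families with x males.
frequencies = [3, 24, 104, 286, 670, 1033, 1343, 1112, 829, 478, 181, 45, 7]
n_trials = 12
n_families = sum(frequencies)

# Maximum likelihood estimate of the probability of a male child.
pi_hat = sum(x * f for x, f in enumerate(frequencies)) / (n_trials * n_families)


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


sample = [f / n_families for f in frequencies]
model = [binomial_pmf(x, n_trials, pi_hat) for x in range(n_trials + 1)]

# Print one row per count, flagging counts that occur more often than fitted.
for x in range(n_trials + 1):
    flag = "more than binomial" if sample[x] > model[x] else ""
    print(f"{x:2d}  {sample[x]:.4f}  {model[x]:.4f}  {flag}")

# Combined tail proportions: 3 or fewer males, and 9 or more males.
low_sample, low_model = sum(sample[:4]), sum(model[:4])
high_sample, high_model = sum(sample[9:]), sum(model[9:])
print(f"3 or fewer: sample {low_sample:.4f}, binomial {low_model:.4f}")
print(f"9 or more:  sample {high_sample:.4f}, binomial {high_model:.4f}")
```

In both tails the sample proportion exceeds the fitted binomial probability, consistent with the table.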

The variance of the best binomial model is

\[ \Var(X) \;=\; 12 \times \hat{\pi} (1 - \hat{\pi}) \;=\; 2.996\]

whereas the actual sample variance was 3.490, again indicating overdispersion.
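Both variances can be recomputed from the frequency table. The sketch below assumes the sample variance uses the divisor \(n - 1\), where \(n\) is the number of families; the text does not say which divisor was used, and with 6,115 families the choice changes the result only in the third decimal place.

```python
# Frequency table from the text: frequencies[x] = families with x males.
frequencies = [3, 24, 104, 286, 670, 1033, 1343, 1112, 829, 478, 181, 45, 7]
n_trials = 12
n_families = sum(frequencies)

mean_males = sum(x * f for x, f in enumerate(frequencies)) / n_families
pi_hat = mean_males / n_trials

# Variance implied by the fitted binomial model.
binomial_variance = n_trials * pi_hat * (1 - pi_hat)

# Sample variance of the observed counts (assumed divisor: n_families - 1).
sum_squares = sum(f * (x - mean_males) ** 2 for x, f in enumerate(frequencies))
sample_variance = sum_squares / (n_families - 1)

print(f"binomial variance = {binomial_variance:.3f}")
print(f"sample variance   = {sample_variance:.3f}")
print(f"ratio             = {sample_variance / binomial_variance:.3f}")
```

A ratio above 1 means the observed counts are more variable than the fitted binomial model allows.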


This gives strong evidence that the assumptions underlying the binomial model do not hold. The most likely reason is that the probability of a child being male is not constant, but varies from family to family.