This approach can be used to test whether a discrete data set of \(n\) values is a random sample from any specified family of distributions.

  1. Estimate the model's \(p\) unknown parameters.
  2. The frequencies in a frequency table are our observed counts, \(\{O_x\}\).
  3. Use the model's probability function (with estimated parameters) to get probabilities for the table cells.
  4. The expected counts, \(\{E_x\}\), are these probabilities times \(n\).
  5. Combine cells in the frequency table to avoid small expected counts.
  6. The test statistic is
\[ X^2 \;=\; \sum_{x} {\frac{\left(O_x - E_x\right)^2}{E_x}} \]
  7. The number of 'constraints' is \(c = (p+1)\), the last one arising because \(\sum_x{E_x} \;=\; \sum_x{O_x}\). The degrees of freedom are the number of cells remaining after combining, minus \(c\).
  8. The p-value is the upper tail probability of the chi-squared distribution with this number of degrees of freedom.
  9. Interpret the p-value: small values give evidence that the data do not fit the distribution.
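
The calculation steps of this procedure can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the function names `chi2_upper_tail` and `goodness_of_fit` are hypothetical, the cell probabilities are assumed to have been computed (and any small cells combined) beforehand, and the die-rolling counts in the usage lines are made up for illustration.

```python
import math

def chi2_upper_tail(x, df):
    """Upper tail probability P(X >= x) for a chi-squared distribution
    with a positive integer number of degrees of freedom (closed form)."""
    y = x / 2.0
    if df % 2 == 0:
        terms = (y ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-y) * sum(terms)
    terms = (y ** (k - 0.5) / math.gamma(k + 0.5)
             for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * sum(terms)

def goodness_of_fit(observed, probs, n_params):
    """Return (X^2, degrees of freedom, p-value) for observed cell counts
    and model cell probabilities, after any combining of cells."""
    n = sum(observed)
    expected = [n * p for p in probs]            # E_x = n * probability
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - (n_params + 1)          # cells minus c = p + 1
    return x2, df, chi2_upper_tail(x2, df)

# Hypothetical data: 60 rolls of a die, tested against a fair-die model
# that has no estimated parameters (p = 0).
observed = [8, 12, 9, 11, 10, 10]
x2, df, p_value = goodness_of_fit(observed, [1 / 6] * 6, n_params=0)
print(x2, df, p_value)
```

Here `n_params` plays the role of \(p\), so a model with one estimated parameter would lose two degrees of freedom.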

Example

The following table gives the number of male children among the first 12 children in 6,115 families of size 13, taken from hospital records in 19th century Saxony. (The 13th child has been ignored to avoid the possible distortion of families stopping when a desired sex is reached.)

Males        0    1    2    3    4     5     6     7    8    9   10   11   12
Frequency    3   24  104  286  670  1033  1343  1112  829  478  181   45    7

Assuming independence and that each child has the same probability of being male, \(\pi\), this would be a random sample from a \(\BinomDistn(n=12, \; \pi)\) distribution.
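
As a rough sketch of how the first few steps might be carried out for this table, the Python below estimates \(\pi\), finds the binomial expected counts and computes \(X^2\). It rests on several assumptions that are not part of the text above: \(\pi\) is estimated by the overall proportion of males (the maximum likelihood estimate), cells are combined only at the two ends of the table, and "small" is taken to mean an expected count below 5, a common rule of thumb. The helper `pool_tails` is a hypothetical name.

```python
import math

males = list(range(13))
observed = [3, 24, 104, 286, 670, 1033, 1343, 1112, 829, 478, 181, 45, 7]
size = 12                                   # children counted per family

n = sum(observed)                           # number of families
# Step 1: estimate pi by the overall proportion of males (assumed MLE).
pi_hat = sum(x * o for x, o in zip(males, observed)) / (size * n)

# Steps 3-4: binomial probabilities and expected counts.
probs = [math.comb(size, x) * pi_hat ** x * (1 - pi_hat) ** (size - x)
         for x in males]
expected = [n * p for p in probs]

def pool_tails(obs, exp, min_expected=5.0):
    """Combine end cells until both end cells have expected count
    >= min_expected (the threshold is an assumption, not from the text)."""
    obs, exp = list(obs), list(exp)
    while len(exp) > 2 and exp[0] < min_expected:
        obs[:2] = [obs[0] + obs[1]]
        exp[:2] = [exp[0] + exp[1]]
    while len(exp) > 2 and exp[-1] < min_expected:
        obs[-2:] = [obs[-2] + obs[-1]]
        exp[-2:] = [exp[-2] + exp[-1]]
    return obs, exp

# Step 5: combine cells with small expected counts.
pooled_obs, pooled_exp = pool_tails(observed, expected)

# Step 6 and the degrees of freedom: one estimated parameter, so c = 2.
x2 = sum((o - e) ** 2 / e for o, e in zip(pooled_obs, pooled_exp))
df = len(pooled_obs) - (1 + 1)
print(pi_hat, x2, df)
```

A different threshold for small expected counts would change how many cells are combined, and therefore the degrees of freedom.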

Is there evidence that the probability of a birth being male differs from family to family?

(Solved in full version)