Model for overdispersion

The Poisson distribution only has a single parameter, so its variance cannot be increased without simultaneously increasing the distribution's mean.

Definition

A random variable, \(X\), is said to have a generalised negative binomial distribution

\[ X \;\;\sim\;\; \NegBinDistn(\kappa, \pi) \]

if its probability function is

\[ p(x) \;\;=\;\; \begin{cases} \displaystyle \frac{\Gamma(\kappa + x)}{x! \; \Gamma(\kappa)} \pi^{\kappa} (1-\pi)^x &\text{for }x = 0, 1, 2, \dots \\[0.2em] 0 & \text{otherwise} \end{cases} \]

where \(\kappa \gt 1\) and \(0 \le \pi \lt 1\). The two parameters \(\kappa\) and \(\pi\) are usually two unknown.

This is a generalisation of the negative binomial distribution that was described earlier for the number failures before the \(\kappa\)'th success in a sequence of independent success/failure trials with probability \(\pi\) of success, but we now allow non-integer values of \(\kappa\). It is often simply called a negative binomial distribution.

(To see that the distribution is the same as the earlier negative binomial distribution, notice that \(\kappa\) was an integer, allowing the gamma functions to be replaced by factorials.)

Properties

We now describe some properties of the negative binomial distribution that explain why it works as a model for overdispersed count data.

Mean and variance

The mean and variance of the generalised negative binomial distribution are

\[ E[X] = \frac {\kappa(1-\pi)} \pi \spaced{and} \Var(X) = \frac {\kappa(1-\pi)} {\pi^2} \]

These are the same formulae that were derived earlier for integer \(\kappa\). We will not give the more general proof here.

From these formulae,

\[ \Var(X) \;=\; E[X] \times \frac 1 {\pi} \]

Since \(\pi\) is less than 1, the variance of the distribution is always greater than its mean, a characteristic of overdispersion.

The next result is also stated without proof.

Asymptotic distribution

If \(\kappa \to \infty\) and \(\pi \to 1\) simultaneously with \(\displaystyle \frac{\kappa(1-\pi)}{\pi} = \lambda\), the negative binomial distribution approaches a \(\PoissonDistn(\lambda)\) distribution.

This shows that the negative binomial distribution can be made arbitrarily close to a Poisson distribution with appropriate choice of \(\kappa\) and \(\pi\).

Justification for the negative binomial distribution

We will mainly treat the negative binomial distribution simply as a more flexible model for counts that can be used to model overdispersion. There are however two ways that a negative binomial distribution can be theoretically derived. (The details are complex and we will simply give a flavour here.)

Varying \(\lambda\)
If the rate of events in a Poisson process, \(\lambda\), varies according to a particular distribution called a Gamma distribution, the number of events observed can be proved to have a negative binomial distribution.
Clusters of events
Events sometimes arise in clusters. If the clusters arise as a Poisson process and the number of events per cluster has a log-series distribution, then the number of events has a negative binomial distribution. (For example, if houses were randomly distributed over an area as a Poisson process, and the number of people in any house had a log-series distribution, the total number of people in an area would have a negative binomial distribution.)

We can however simply use the distribution as an empirical model with the flexibility to model overdispersed count data.

Shape of generalised negative binomial distribution

The diagram below initially shows the probability function for a Poisson distribution; the top slider can be used to adjust its mean, \(\lambda\).

For any Poisson distribution, the mean \(E[X]\) is equal to the variance \(\Var(X)\). The generalised negative binomial has a second parameter that allows more flexibility. Drag the bottom slider to make the variance a greater multiple of the mean. Observe that whatever the distribution's mean,

Clicking Show zero-one axis to deselect it makes the bar charts fill up more of the vertical height in the diagram, making these features clearer.