Model for overdispersion

The variance of a Poisson distribution always equals its mean. To model counts whose variance is greater than their mean (overdispersion), we need a distribution with two parameters.

Definition

A random variable, \(X\), is said to have a generalised negative binomial distribution

\[ X \;\;\sim\;\; \NegBinDistn(\kappa, \pi) \]

if its probability function is

\[ p(x) \;\;=\;\; \begin{cases} \displaystyle \frac{\Gamma(\kappa + x)}{x! \; \Gamma(\kappa)} \pi^{\kappa} (1-\pi)^x &\text{for }x = 0, 1, 2, \dots \\[0.2em] 0 & \text{otherwise} \end{cases} \]

where \(\kappa \gt 0\) and \(0 \lt \pi \lt 1\). The two parameters \(\kappa\) and \(\pi\) are usually both unknown.

This is a generalisation of the negative binomial distribution for the number of failures before the \(\kappa\)'th success in a sequence of independent success/failure trials with probability \(\pi\) of success, but we now allow non-integer values of \(\kappa\).
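As a minimal Python sketch (the function names `nb_pmf` and `nb_pmf_integer` are hypothetical), the probability function can be evaluated on the log scale with `math.lgamma`, which avoids overflow in \(\Gamma(\kappa + x)\). For integer \(\kappa\) it should agree with the familiar binomial-coefficient form \(\binom{\kappa + x - 1}{x} \pi^{\kappa}(1-\pi)^x\), and for non-integer \(\kappa\) the probabilities should still sum to 1.

```python
import math

def nb_pmf(x: int, kappa: float, pi: float) -> float:
    """Generalised negative binomial p(x), assuming kappa > 0 and 0 < pi < 1."""
    if x < 0:
        return 0.0
    # Log scale: Gamma(kappa + x) overflows for moderately large x.
    log_p = (math.lgamma(kappa + x) - math.lgamma(x + 1) - math.lgamma(kappa)
             + kappa * math.log(pi) + x * math.log1p(-pi))
    return math.exp(log_p)

def nb_pmf_integer(x: int, k: int, pi: float) -> float:
    """Classical form for integer k: C(k + x - 1, x) pi^k (1 - pi)^x."""
    return math.comb(k + x - 1, x) * pi**k * (1 - pi)**x

# Non-integer kappa: probabilities over x = 0..499 should sum to (almost exactly) 1.
total = sum(nb_pmf(x, 2.5, 0.4) for x in range(500))
print(total)

# Integer kappa: the Gamma-function form matches the binomial-coefficient form.
print(all(math.isclose(nb_pmf(x, 4, 0.3), nb_pmf_integer(x, 4, 0.3)) for x in range(50)))
```

Truncating the sum at 499 is an assumption that is harmless here, because the terms decay like \((1-\pi)^x\).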

Mean and variance

The mean and variance of the generalised negative binomial distribution are

\[ E[X] = \frac {\kappa(1-\pi)} \pi \spaced{and} \Var(X) = \frac {\kappa(1-\pi)} {\pi^2} \]

(Proved in full version)

From these formulae,

\[ \Var(X) \;=\; E[X] \times \frac 1 {\pi} \]

Since \(\pi \lt 1\), the factor \(1/\pi\) is greater than 1, so \(\Var(X) \gt E[X]\) and the distribution can model overdispersed data.
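The mean and variance formulas can be checked numerically. The sketch below (helper names are hypothetical, and the choice \(\kappa = 2.5\), \(\pi = 0.4\) is arbitrary) sums \(x\,p(x)\) and \((x - \mu)^2 p(x)\) directly, assuming the probabilities beyond \(x = 2000\) are negligible, and compares the results with \(\kappa(1-\pi)/\pi\), \(\kappa(1-\pi)/\pi^2\) and the ratio \(1/\pi\).

```python
import math

def nb_pmf(x, kappa, pi):
    """Generalised negative binomial p(x) for x >= 0, computed on the log scale."""
    log_p = (math.lgamma(kappa + x) - math.lgamma(x + 1) - math.lgamma(kappa)
             + kappa * math.log(pi) + x * math.log1p(-pi))
    return math.exp(log_p)

def moments_from_pmf(kappa, pi, x_max=2000):
    """Mean and variance by direct summation over x = 0..x_max (tail assumed negligible)."""
    probs = [nb_pmf(x, kappa, pi) for x in range(x_max + 1)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return mean, var

kappa, pi = 2.5, 0.4
mean, var = moments_from_pmf(kappa, pi)
print(mean, kappa * (1 - pi) / pi)       # summed mean vs formula
print(var, kappa * (1 - pi) / pi**2)     # summed variance vs formula
print(var / mean, 1 / pi)                # Var(X) / E[X] vs 1 / pi
```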

Asymptotic distribution

If \(\kappa \to \infty\) and \(\pi \to 1\) simultaneously with \(\displaystyle \frac{\kappa(1-\pi)}{\pi} = \lambda\), the negative binomial distribution approaches a \(\PoissonDistn(\lambda)\) distribution.

(Not proved)

This shows that the negative binomial distribution can be made arbitrarily close to a Poisson distribution with appropriate choice of \(\kappa\) and \(\pi\).
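The limit is not proved here, but it can be illustrated numerically. Solving \(\kappa(1-\pi)/\pi = \lambda\) for \(\pi\) gives \(\pi = \kappa/(\kappa + \lambda)\), so for a fixed \(\lambda\) we can let \(\kappa\) grow and measure how far the negative binomial is from \(\PoissonDistn(\lambda)\). This sketch uses total variation distance, \(\tfrac12 \sum_x |p_{NB}(x) - p_{Pois}(x)|\), as the measure of closeness; that choice, the helper names and the truncation at \(x = 200\) are assumptions for illustration.

```python
import math

def nb_pmf(x, kappa, pi):
    """Generalised negative binomial p(x) for x >= 0, computed on the log scale."""
    return math.exp(math.lgamma(kappa + x) - math.lgamma(x + 1) - math.lgamma(kappa)
                    + kappa * math.log(pi) + x * math.log1p(-pi))

def poisson_pmf(x, lam):
    """Poisson(lam) probability of x events, computed on the log scale."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def tv_distance(kappa, lam, x_max=200):
    """Total variation distance to Poisson(lam), with pi chosen so that the means match."""
    pi = kappa / (kappa + lam)   # solves kappa * (1 - pi) / pi = lam
    return 0.5 * sum(abs(nb_pmf(x, kappa, pi) - poisson_pmf(x, lam)) for x in range(x_max + 1))

lam = 3.0
kappas = (1, 10, 100, 1000, 10000)
distances = [tv_distance(k, lam) for k in kappas]
for k, d in zip(kappas, distances):
    print(f"kappa={k:>6}  TV distance={d:.2e}")
```

The distance shrinks steadily as \(\kappa\) increases (and \(\pi \to 1\)), which is the behaviour the limit describes.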

Justification for the negative binomial distribution

The negative binomial distribution is often used simply as an empirical model with the flexibility to fit overdispersed count data. However, it can also be derived theoretically in two different ways:

Varying \(\lambda\)
If the rate of events in a Poisson process, \(\lambda\), varies according to a particular distribution called a Gamma distribution, the number of events has a negative binomial distribution.
Clusters of events
Events sometimes arise in clusters. If the clusters arise as a Poisson process and the number of events per cluster has a log-series distribution, then the number of events has a negative binomial distribution.
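Both derivations can be checked by simulation. The parameter mappings below are standard results that the text does not state, so treat them as assumptions: for the gamma mixture, \(\lambda\) has a gamma distribution with shape \(\kappa\) and scale \((1-\pi)/\pi\); for the cluster model, clusters occur at rate \(-\kappa \ln \pi\) and cluster sizes follow a log-series distribution with parameter \(q = 1 - \pi\), i.e. \(P(k) = -q^k / (k \ln(1-q))\) for \(k = 1, 2, \dots\). All function names are hypothetical. With \(\kappa = 2\) and \(\pi = 0.4\), both simulated counts should have mean close to \(\kappa(1-\pi)/\pi = 3\) and variance close to \(\kappa(1-\pi)/\pi^2 = 7.5\).

```python
import math
import random

def poisson_sample(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def gamma_mixture_sample(kappa, pi, rng):
    """Poisson count whose rate is itself Gamma(shape=kappa, scale=(1-pi)/pi)."""
    lam = rng.gammavariate(kappa, (1 - pi) / pi)   # gammavariate(alpha=shape, beta=scale)
    return poisson_sample(lam, rng)

def log_series_sample(q, rng):
    """Cluster size k >= 1 with P(k) = -q^k / (k log(1 - q)), sampled by inversion."""
    u = rng.random()
    k, p = 1, -q / math.log(1 - q)
    cum = p
    while u > cum and p > 0:          # p > 0 guards against round-off at the far tail
        k += 1
        p *= q * (k - 1) / k          # ratio P(k) / P(k - 1)
        cum += p
    return k

def cluster_sample(kappa, pi, rng):
    """Poisson number of clusters, each contributing a log-series number of events."""
    n_clusters = poisson_sample(-kappa * math.log(pi), rng)
    return sum(log_series_sample(1 - pi, rng) for _ in range(n_clusters))

def sample_moments(sampler, kappa, pi, n=100_000, seed=1):
    """Sample mean and variance of n simulated counts."""
    rng = random.Random(seed)
    xs = [sampler(kappa, pi, rng) for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, var

kappa, pi = 2.0, 0.4   # theory: E[X] = 3.0, Var(X) = 7.5
results = {s.__name__: sample_moments(s, kappa, pi)
           for s in (gamma_mixture_sample, cluster_sample)}
for name, (m, v) in results.items():
    print(f"{name:22s} mean={m:.3f}  variance={v:.3f}")
```

Matching two moments does not prove the distributions are identical; it is only a quick consistency check on the two constructions.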

Shape of distribution

We now illustrate the additional flexibility provided by the negative binomial.

The top diagram shows the distribution of the number of events in a homogeneous Poisson process in which \(E[X] = \Var(X) = 3\).

The two negative binomial distributions also have \(E[X] = 3\), but \(\Var(X) \gt 3\), so they might be used to fit overdispersed counts with more zeros and more large values than would be expected from a homogeneous Poisson process.
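The parameters of the diagrams are not given here, so the sketch below picks two hypothetical variances, 6 and 12, each with mean 3. Inverting the moment formulas, \(\Var(X) = E[X]/\pi\) gives \(\pi = E[X]/\Var(X)\), and then \(E[X] = \kappa(1-\pi)/\pi\) gives \(\kappa = E[X]\,\pi/(1-\pi)\). The code (helper names hypothetical) compares \(P(X = 0)\) and the upper tail \(P(X \ge 8)\) with those of a \(\PoissonDistn(3)\) distribution.

```python
import math

def nb_pmf(x, kappa, pi):
    """Generalised negative binomial p(x) for x >= 0, computed on the log scale."""
    return math.exp(math.lgamma(kappa + x) - math.lgamma(x + 1) - math.lgamma(kappa)
                    + kappa * math.log(pi) + x * math.log1p(-pi))

def poisson_pmf(x, lam):
    """Poisson(lam) probability of x events, computed on the log scale."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def nb_params_from_moments(mean, var):
    """Invert E[X] = kappa(1-pi)/pi and Var(X) = E[X]/pi; requires var > mean."""
    pi = mean / var
    kappa = mean * pi / (1 - pi)
    return kappa, pi

def upper_tail(pmf, threshold=8):
    """P(X >= threshold) = 1 - P(X < threshold)."""
    return 1 - sum(pmf(x) for x in range(threshold))

mean = 3.0
summary = {"Poisson": (poisson_pmf(0, mean), upper_tail(lambda x: poisson_pmf(x, mean)))}
for var in (6.0, 12.0):   # hypothetical variances, chosen only for illustration
    kappa, pi = nb_params_from_moments(mean, var)
    summary[f"NegBin var={var}"] = (nb_pmf(0, kappa, pi),
                                    upper_tail(lambda x: nb_pmf(x, kappa, pi)))

for name, (p0, tail) in summary.items():
    print(f"{name:16s} P(X=0)={p0:.3f}  P(X>=8)={tail:.3f}")
```

As the variance increases with the mean held at 3, both \(P(X = 0)\) and \(P(X \ge 8)\) grow, which is the extra flexibility the diagrams illustrate.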