Binomial distribution

A binomial distribution is the most commonly encountered distribution that arises from a sequence of independent Bernoulli trials.

Definition

If the following conditions hold:

  1. There is a sequence of \(n\) Bernoulli trials, each with two outcomes "success" and "failure", where \(n\) is a fixed constant,
  2. The results of all Bernoulli trials are independent of each other,
  3. The probability of "success" is the same for all trials, \(P(\text{success}) = \pi\),

then the total number of successes, \(X\), has a binomial distribution with parameters \(n\) and \(\pi\).

\[ X \;\; \sim \;\; \BinomDistn(n, \pi) \]

In most practical applications, the parameter \(\pi\) is an unknown constant, but occasionally we know its value.
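
The three conditions in the definition can be illustrated by simulation. The sketch below is only an illustration: the function name `binomial_draw` and the values \(n = 10\) and \(\pi = 0.3\) are assumptions chosen for the example, not values from the text. Each call generates \(n\) independent Bernoulli trials with the same success probability and counts the successes.

```python
import random

def binomial_draw(n, pi, rng):
    """Count the successes in n independent Bernoulli trials,
    each with the same success probability pi."""
    return sum(1 for _ in range(n) if rng.random() < pi)

rng = random.Random(0)  # fixed seed so that the run is repeatable

# Illustrative (assumed) parameter values: n = 10 trials, pi = 0.3
draws = [binomial_draw(10, 0.3, rng) for _ in range(10_000)]

# Every draw is a whole number between 0 and n
print(min(draws), max(draws))

# The average count should be close to n * pi = 3
mean = sum(draws) / len(draws)
print(mean)
```

Each simulated value plays the role of one observation of \(X\); in practice \(\pi\) would usually be unknown rather than set in advance as it is here.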

Number of sixes when four dice are rolled

  1. Defining a success to be a six when a die is rolled, we have a sequence of \(n = 4\) Bernoulli trials.
  2. Assuming that the dice are fairly rolled, the four values are independent of each other.
  3. The probability of getting a six is \(\pi = \frac 1 6\) for each roll.

The number of sixes, \(X\), therefore has a binomial distribution,

\[ X \;\; \sim \; \; \BinomDistn(n=4, \pi=\frac 1 6) \]
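
Because both parameters are known in this example, the probability of each possible number of sixes can be computed. The sketch below assumes the standard binomial probability formula, \(P(X = x) = \binom{n}{x} \pi^x (1-\pi)^{n-x}\), which has not been stated in this section; the function name `binom_pmf` is hypothetical. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from math import comb

def binom_pmf(x, n, pi):
    """Standard binomial probability of exactly x successes in n trials."""
    return comb(n, x) * pi**x * (1 - pi)**(n - x)

n, pi = 4, Fraction(1, 6)  # four dice, success = rolling a six

# Probabilities for 0, 1, 2, 3 and 4 sixes
probs = [binom_pmf(x, n, pi) for x in range(n + 1)]

for x, p in enumerate(probs):
    print(x, p, round(float(p), 4))

# The probabilities over all possible values of X add up to one
print(sum(probs))
```

The most likely outcome is no sixes at all, with probability \(\left(\frac 5 6\right)^4 = \frac{625}{1296}\).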

Sex of reptiles

In many reptiles, sex is partly determined by the incubation temperature of the eggs. In an experiment with 10 lizard eggs incubated at 25°C, the number of males hatching, \(X\), has a binomial distribution since

  1. There is a fixed number, \(n = 10\), of Bernoulli trials, with a male being treated as a "success".
  2. The sexes of the different eggs are likely to be independently determined.
  3. Each hatchling has the same probability of being male since all eggs are incubated at the same temperature, but this probability is an unknown value, \(\pi\).

\[ X \;\; \sim \; \; \BinomDistn(n=10, \pi) \]

Although we can often argue from the context that the assumptions behind the binomial distribution hold, there may be some doubt over this.

Rainy days in week

The table below shows the number of rainy days each week during a year at Balcombe, Sussex in the UK.

  Number of days in week       Number of weeks in
  with measurable rain, x      year (frequency)

            0                          4
            1                          9
            2                         15
            3                         11
            4                          6
            5                          4
            6                          2
            7                          1

          Total                       52

We first consider whether the random variable \(X\), the number of rainy days in a single week, might have a binomial distribution.

Fixed number of trials
  If we define a "success" to be a single day with rain, \(X\) is the number of successes in the \(n = 7\) days of the week. The number of Bernoulli trials is therefore a fixed number, seven.

Constant probability of success
  For the binomial distribution to hold, the probability of success, \(\pi\), must be the same for each Bernoulli trial. Within any single week, this is likely to be true.

Independence
  The last assumption required for the binomial distribution to hold is that the \(n = 7\) Bernoulli trials are independent. There is considerable doubt about independence in this example: if one day is rainy, then the following day is more likely to be rainy also.

Although the assumptions underlying the binomial distribution are unlikely to hold exactly, with short weather cycles the effect of dependence of adjacent days' weather may be slight. As an approximation, we can therefore tentatively make the assumption of independence, and model the number of rainy days in a week with a binomial distribution.

\[ X \;\; \sim \; \; \BinomDistn(n=7, \pi) \]

A further complication with the data set shown above is that the 52 different weeks are at different times in the year. Are the 52 binomial distributions underlying the 52 counts all the same? As rainfall is not strongly seasonal in the south of England, the value of \(\pi\) is likely to be approximately the same in each week, so the 52 values in the table can be treated, at least approximately, as a random sample from the same distribution,

\[ X_i \;\; \sim \; \; \BinomDistn(n=7, \pi) \quad \quad \text{for }i=1, \dots, 52\]

We should however remember our doubt about the assumption of independence and examine the data later for evidence that this assumption does not hold.
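
One informal way to examine the data is sketched below. It is not a method given in the text and rests on two assumptions: that \(\pi\) is estimated by the overall proportion of rainy days (called `pi_hat` here), and that the standard binomial probability formula and variance \(n\pi(1-\pi)\) apply. The sketch compares the observed frequencies with those expected under the fitted binomial model, and the sample variance with the binomial variance.

```python
from math import comb

# Observed data from the table: rainy days in week -> number of weeks
freq = {0: 4, 1: 9, 2: 15, 3: 11, 4: 6, 5: 4, 6: 2, 7: 1}
n = 7  # days (Bernoulli trials) per week

weeks = sum(freq.values())                         # number of weeks
total_rainy = sum(x * f for x, f in freq.items())  # rainy days in the year

# Assumed estimate of pi: overall proportion of days with rain
pi_hat = total_rainy / (n * weeks)

# Expected number of weeks with x rainy days under Binomial(n, pi_hat)
expected = {
    x: weeks * comb(n, x) * pi_hat**x * (1 - pi_hat)**(n - x)
    for x in freq
}

# Sample variance of the weekly counts and the variance implied by the model
mean = total_rainy / weeks
var = sum(f * (x - mean) ** 2 for x, f in freq.items()) / (weeks - 1)
binom_var = n * pi_hat * (1 - pi_hat)

for x in freq:
    print(x, freq[x], round(expected[x], 1))
print("sample variance:", round(var, 2))
print("binomial variance:", round(binom_var, 2))
```

If wet days tend to follow wet days, weekly counts would be expected to vary more than the binomial model predicts, so a sample variance well above the binomial variance would be consistent with the doubt about independence. This comparison is only a rough indication, not a formal test.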