Finding an appropriate distribution

Thinking about how a measurement is made sometimes suggests that the variable's distribution should belong to a particular family of standard distributions. Examples of such families are the uniform, binomial, geometric and negative binomial distributions.

This reasoning may require some assumptions about the process underlying the variable.

Rolling a die

A fair six-sided die is rolled and \(X\) denotes the number on top. We can argue that the values \(\{1, 2, 3, 4, 5, 6\}\) are equally likely, so

\[ X \;\; \sim \; \; \UniformDistn(1, 6) \]
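
As a minimal sketch of what this model says, the probabilities for the die can be written out exactly. The function name `discrete_uniform_pmf` and the dictionary `die_pmf` are illustrative choices, not part of the text, and the distribution is taken to be uniform on the integers 1 to 6.

```python
from fractions import Fraction


def discrete_uniform_pmf(x, a=1, b=6):
    """P(X = x) for a discrete uniform distribution on the integers a..b.

    The name and signature are hypothetical, chosen for this illustration.
    """
    if isinstance(x, int) and a <= x <= b:
        return Fraction(1, b - a + 1)
    return Fraction(0)


# Probability of each face of a fair six-sided die
die_pmf = {x: discrete_uniform_pmf(x) for x in range(1, 7)}

print(die_pmf[3])             # 1/6
print(sum(die_pmf.values()))  # 1
```

Exact fractions are used here only so that the six probabilities visibly sum to one.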

Number of males in a family of four children

If we assume that the sexes of the children in a family of four are determined independently and that each child has the same probability \(\pi\) of being male, the number of males in the family, \(X\), has distribution

\[ X \;\; \sim \; \; \BinomDistn(n=4,\; \pi) \]

If we were to make the further assumption that male and female children were equally likely, we might refine this to:

\[ X \;\; \sim \; \; \BinomDistn(n=4,\; \pi = 0.5) \]
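
Once \(\pi\) is fixed, the model gives a probability for each possible number of males. The sketch below evaluates the standard binomial probability function for \(n = 4\) and \(\pi = 0.5\); the function name `binomial_pmf` is an illustrative choice, not something defined in the text.

```python
from math import comb


def binomial_pmf(x, n, pi):
    """P(X = x) when X ~ Binomial(n, pi). Hypothetical helper name."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)


# Probabilities of 0, 1, 2, 3 and 4 males among four children when pi = 0.5
males_pmf = [binomial_pmf(x, n=4, pi=0.5) for x in range(5)]

print(males_pmf)  # [0.0625, 0.25, 0.375, 0.25, 0.0625]
```

With any other value of \(\pi\) the same function applies, but the probabilities are no longer symmetric about two males.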

Usage until failure of a light bulb

If we can assume that a light bulb only fails when it is switched on, and that it has the same probability of failure, \(\pi\), each time it is switched on, independently of earlier switches, then the number of times it is switched on until it fails, \(X\), has distribution

\[ X \;\; \sim \; \; \GeomDistn(\pi) \]

If there were two spare bulbs, with a failing bulb being replaced immediately while a spare remained, and the random variable of interest was the number of switches until the last bulb failed, \(Y\), then three failures are needed in total and

\[ Y \;\; \sim \; \; \NegBinDistn(k=3, \pi) \]
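
The two light bulb models can be sketched in code as follows. This assumes the convention that \(X\) and \(Y\) count every switch up to and including the one on which the relevant failure occurs, which the text does not state explicitly. The value `PI = 0.01` is purely hypothetical, and the function names are illustrative.

```python
from math import comb


def geometric_pmf(x, pi):
    """P(X = x): the first failure happens on switch number x (x = 1, 2, ...)."""
    if x < 1:
        return 0.0
    return (1 - pi) ** (x - 1) * pi


def neg_binomial_pmf(y, k, pi):
    """P(Y = y): the k-th failure happens on switch number y (y = k, k+1, ...)."""
    if y < k:
        return 0.0
    return comb(y - 1, k - 1) * pi ** k * (1 - pi) ** (y - k)


PI = 0.01  # hypothetical failure probability per switch, for illustration only

# Probability that a single bulb fails the first time it is switched on
p_first_switch = geometric_pmf(1, PI)

# Probability that all three bulbs (the original and two spares) fail on
# switches 1, 2 and 3, the fastest possible way to use up every bulb
p_three_switches = neg_binomial_pmf(3, k=3, pi=PI)
```

Under this counting convention, setting `k=1` in `neg_binomial_pmf` reproduces `geometric_pmf`, which reflects the fact that the geometric distribution is the special case of the negative binomial with a single failure.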

Weight of a newborn female baby

The weight of a newborn baby girl, \(X\), is a continuous random variable, but nothing about how it is measured points to a particular family of distributions. However, we might believe that baby weights have a reasonably symmetric distribution and be willing to assume that the distribution is approximately normal, leading to a model of the form

\[ X \;\; \sim \; \; \NormalDistn(\mu, \sigma^2) \]

for the variable. Although we are just guessing the shape of the distribution and should check the distribution's shape later, this may be a reasonable tentative model for the variable.
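
To see what the normality assumption commits us to, the sketch below uses Python's `statistics.NormalDist` with made-up parameter values. The numbers `MU = 3.3` and `SIGMA = 0.5` (in kilograms) are hypothetical placeholders, not estimates from any data. Note that `NormalDist` takes the standard deviation \(\sigma\), not the variance \(\sigma^2\) that appears in the notation above.

```python
from statistics import NormalDist

MU = 3.3     # hypothetical mean weight in kg, for illustration only
SIGMA = 0.5  # hypothetical standard deviation in kg, for illustration only

weight = NormalDist(mu=MU, sigma=SIGMA)

# Symmetry: half of all weights lie below the mean under a normal model
p_below_mean = weight.cdf(MU)

# Roughly 95% of weights lie within two standard deviations of the mean
p_within_2sd = weight.cdf(MU + 2 * SIGMA) - weight.cdf(MU - 2 * SIGMA)
```

These two properties hold for every normal distribution, whatever the values of \(\mu\) and \(\sigma\), so they are among the things to look at when the shape of the distribution is checked later.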

Unfortunately, reasoning and assumptions can usually only lead to a family of standard distributions with one or more parameters whose values are unknown, such as the probability \(\pi\) in some of the examples above.

How can we find the value of any such unknown parameter?