Distributions
When an abstract population is imagined to underlie a data set, it often contains an infinite number of values. For example, consider the lifetimes of a sample of light bulbs. The population of possible failure times consists of all values greater than zero, which is an infinite set. Moreover, some of these possible values will be more likely than others.
This kind of underlying population is called a distribution.
The notion of sampling from an infinite population is difficult, so we will now illustrate it in a different context as an extension of sampling from a finite population.
Location of cows in a field
Consider a cow that can move freely within a field. We observe its location in the field at six times, so our data are six 'locations' for the cow.
Initially, consider the field split into a 5x5 grid, giving a population of 25 possible locations for the cow. The six positions at which the cow was observed are a random sample of 6 from this population. Click Take sample a few times to see possible locations using this model.
Use the pop-up menu to change the grid to a 10x10 grid and then a 30x30 grid to allow a finer specification of the cow locations. In both cases, we are still selecting samples (with replacement) from a finite population.
Finally select Infinite from the pop-up menu to continue this refinement of the grid to its extreme, allowing the cow locations to be anywhere within the field — an infinite population. Clicking Select sample selects a random sample of locations from this infinite population.
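The refinement from a coarse grid to an infinite population can also be sketched in code. The following is a minimal illustration, not the method used by the diagram: it assumes the field is the unit square, that a grid location is reported as the centre of its cell, and the function names sample_grid and sample_continuous are hypothetical.

```python
import random

def sample_grid(n, size=6, rng=random):
    """Sample `size` cells, with replacement, from an n x n grid.

    The field is assumed to be the unit square; each sampled cell
    is reported as the (x, y) coordinates of its centre.
    """
    cells = [(rng.randrange(n), rng.randrange(n)) for _ in range(size)]
    return [((i + 0.5) / n, (j + 0.5) / n) for i, j in cells]

def sample_continuous(size=6, rng=random):
    """Sample `size` locations anywhere in the unit square.

    This is the limiting 'infinite population' case: every point
    in the field is equally likely.
    """
    return [(rng.random(), rng.random()) for _ in range(size)]

if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so that runs are repeatable
    for n in (5, 10, 30):
        print(f"{n}x{n} grid:", sample_grid(n, rng=rng))
    print("infinite:", sample_continuous(rng=rng))
```

As n increases, the cell centres fill the field more densely, and the grid samples become hard to tell apart from samples drawn from the continuous version.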
In the illustration above, we assumed that all possible locations in the field were equally likely. The idea of a distribution also allows some possible values to be more likely than others. For example, the cow may be more likely to be in some parts of the field than in others.
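A distribution with unequal likelihoods can be sketched in the same way. The example below is purely illustrative: it assumes the field is the unit square and that the cow favours the left-hand side (say, near a water trough), which is modelled with a Beta(1, 3) distribution for the x-coordinate. Both the scenario and the function name sample_near_left_edge are assumptions made for this sketch.

```python
import random

def sample_near_left_edge(size=6, rng=random):
    """Sample `size` locations in the unit square, favouring small x.

    x follows a Beta(1, 3) distribution (mean 0.25), so locations
    near the left edge are more likely; y is uniform on [0, 1).
    """
    return [(rng.betavariate(1, 3), rng.random()) for _ in range(size)]

if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so that runs are repeatable
    print(sample_near_left_edge(rng=rng))
```

Every location in the field is still possible, but a large sample from this distribution would show far more observations in the left half of the field than in the right half.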