Inference and random samples
The examples in the previous section involved a range of different types of model for the observed data. In the remainder of this chapter, we concentrate on one particular type of model — random sampling from a population.
We assume now that the observed data are a random sample from some population.
When the observed data are a random sample, inference asks questions about characteristics of the underlying population distribution, that is, about unknown population parameters.
For random samples, the null and alternative hypotheses specify values for the unknown population parameters.
Inference about categorical populations
When the population distribution is categorical, the unknowns are the population probabilities for the different categories. To simplify, we consider populations for which one category is of particular interest ('success') and we denote the unknown probability of success by π.
The null and alternative hypotheses are therefore specified in terms of π.
Australia Post example
A Melbourne newspaper, trying to assess Australia Post's assertion that 96 percent of letters arrive 'on time', posted 59 letters and observed that only 52 arrived on time.
We model delivery of these letters as a random sample of 59 categorical values from a population with probability π of success (arrival on time). The null hypothesis of interest is therefore...
H0: π = 0.96
The alternative hypothesis is that the probability of arriving on time is lower than Australia Post claims:
HA: π < 0.96
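To make these hypotheses concrete, the sketch below compares the observed data with what H0 predicts. It computes the sample proportion of on-time letters, the number expected to arrive on time if π = 0.96, and the probability of seeing 52 or fewer on-time letters out of 59 under a binomial model with π = 0.96. The binomial model and the helper names (binomial_pmf, lower_tail) are assumptions made for illustration; this is one possible way to summarise the evidence, not necessarily the procedure developed later in the chapter.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def lower_tail(x, n, p):
    """Probability of x or fewer successes in n independent trials."""
    return sum(binomial_pmf(k, n, p) for k in range(0, x + 1))

n_letters = 59    # letters posted by the newspaper
n_on_time = 52    # letters that arrived on time
pi_null = 0.96    # value of pi claimed by Australia Post (H0)

sample_proportion = n_on_time / n_letters
expected_on_time = n_letters * pi_null
tail_prob = lower_tail(n_on_time, n_letters, pi_null)

print(f"Sample proportion on time: {sample_proportion:.3f}")
print(f"Expected on time under H0: {expected_on_time:.2f}")
print(f"P(52 or fewer on time | pi = 0.96): {tail_prob:.4f}")
```

A small tail probability would indicate that a result as low as 52 is unlikely if Australia Post's claim were true, which is the direction specified by HA.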
Design of mustard jar
A food manufacturer intends to change the design of its packaging for a range of mustards. The design team is particularly keen on a design that is more expensive to manufacture than two competing designs. The manager wants to be sure that customers will prefer the more expensive jar before starting production. The price is determined by competitors' products, so the more expensive jar will have a reduced profit margin and can only be justified if sales are considerably higher.
To assess whether customers prefer the more expensive mustard jar, a limited number of each of the three designs is manufactured and placed together for sale at the same price in a supermarket. Out of the first 90 jars of mustard sold, the jar that cost more to manufacture was bought 36 times.
This situation can be modelled as random sampling of 90 values, each being one of the three jar designs, from a categorical population in which the probability of picking the jar with the highest production cost is π. The null hypothesis of interest is therefore...
H0: π = 1/3 (no preference)
The alternative hypothesis is
HA: π > 1/3 (preference for the expensive design)
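The same kind of comparison can be sketched for the mustard jars. Under H0, each sale is equally likely to be any of the three designs, so about 30 of the 90 sales would be expected for the expensive jar. The code below computes the sample proportion and the probability of 36 or more sales of that jar under a binomial model with π = 1/3. As before, the binomial model and the helper names (binomial_pmf, upper_tail) are illustrative assumptions rather than the chapter's own procedure.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail(x, n, p):
    """Probability of x or more successes in n independent trials."""
    return sum(binomial_pmf(k, n, p) for k in range(x, n + 1))

n_sold = 90          # jars sold in the trial
n_expensive = 36     # sales of the jar that costs more to manufacture
pi_null = 1 / 3      # no preference among the three designs (H0)

sample_proportion = n_expensive / n_sold
expected_expensive = n_sold * pi_null
tail_prob = upper_tail(n_expensive, n_sold, pi_null)

print(f"Sample proportion choosing expensive jar: {sample_proportion:.3f}")
print(f"Expected sales under H0: {expected_expensive:.1f}")
print(f"P(36 or more | pi = 1/3): {tail_prob:.4f}")
```

Here the upper tail is used because HA specifies values of π above 1/3, so only unusually high counts for the expensive jar would count as evidence against H0.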
Tests about parameters of other populations
Other data sets arise as random samples from different kinds of population. For example, numerical data sets are often modelled as random samples from a normal distribution. Again, the hypotheses of interest are usually expressed in terms of the parameters of this distribution.
For example, to test whether the mean of a normal distribution is zero, the hypotheses would be...
H0: µ = 0
HA: µ ≠ 0
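As a rough illustration of how these hypotheses relate to data, the sketch below computes a sample mean and its distance from the null value µ = 0, measured in standard errors. The data values are invented purely for illustration and do not come from any example in this chapter, and the function name t_statistic is likewise an assumption. How such a quantity is turned into a test is left to the later section on population means.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu_null=0.0):
    """Distance of the sample mean from mu_null, in standard errors."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 divisor
    return (mean(sample) - mu_null) / standard_error

# Hypothetical measurements, made up for this sketch only.
hypothetical_sample = [0.8, -0.3, 1.1, 0.4, -0.6, 0.9, 0.2, 0.5]

sample_mean = mean(hypothetical_sample)
t_value = t_statistic(hypothetical_sample, mu_null=0.0)

print(f"Sample mean: {sample_mean:.3f}")
print(f"Standardised distance from mu = 0: {t_value:.3f}")
```

Because HA is two-sided (µ ≠ 0), values of this statistic far from zero in either direction would count against H0.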
In the remainder of this section, we show how to test a population probability, and in the next section we will describe tests about a population mean.