Test statistic
When testing the value of a probability, π, the obvious statistic to use from our random sample is the corresponding sample proportion, p.
It is, however, more convenient to use the number of successes, x, rather than p, since X has a binomial distribution with parameters n (the sample size) and π.
X ~ binomial (n, π)
When we know the distribution of the test statistic (at least after the null hypothesis has fixed the value of the parameters of interest), it becomes much easier to obtain the p-value for the test.
P-value
As in all other tests, the p-value is the probability of getting data at least as 'extreme' as those observed if the null hypothesis is true. Depending on the null and alternative hypotheses, the p-value is therefore the probability that X is at least as big (or sometimes at least as small) as the recorded value.
Since we know the binomial distribution of X when the null hypothesis holds, the p-value can therefore be obtained by adding binomial probabilities.
The p-value is a sum of binomial probabilities
Note that the p-value can be obtained exactly without need for simulations or randomisation.
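Written out, the sums look as follows. This is a sketch under two assumptions: π₀ is a symbol introduced here for the value of π fixed by the null hypothesis, and x is the observed number of successes.

```latex
% Upper-tail alternative, H_A: \pi > \pi_0
\text{p-value} = P(X \ge x) = \sum_{k=x}^{n} \binom{n}{k} \pi_0^{k} (1-\pi_0)^{n-k}

% Lower-tail alternative, H_A: \pi < \pi_0
\text{p-value} = P(X \le x) = \sum_{k=0}^{x} \binom{n}{k} \pi_0^{k} (1-\pi_0)^{n-k}
```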
Scaring whales from fishing boats
Whales have probability 0.4 of immediately leaving when encountering boats that are fishing in the North Atlantic. When a device emitting the sound of a killer whale is used, 15 out of 30 whales leave immediately. Is the device effective?
H0: π = 0.4
HA: π > 0.4
In the diagram below, click Accumulate then hold down Simulate until about 100 samples of 30 whales have been generated. The proportion of these simulated samples in which 15 or more leave is an approximation to the p-value for the test.
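For readers without access to the diagram, the same simulation can be sketched in Python. The function name, the seed and the larger run of 20,000 samples are illustrative choices, not part of the original exercise.

```python
import random

def simulate_p_value(n_samples, n=30, pi0=0.4, observed=15, seed=1):
    """Approximate P(X >= observed) when X ~ binomial(n, pi0) by simulation."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    extreme = 0
    for _ in range(n_samples):
        # One simulated sample: each of the n whales leaves with probability pi0.
        leaving = sum(rng.random() < pi0 for _ in range(n))
        if leaving >= observed:
            extreme += 1
    # Proportion of simulated samples at least as extreme as the observed data.
    return extreme / n_samples

approx_100 = simulate_p_value(100)      # about 100 samples, as in the diagram
approx_20000 = simulate_p_value(20000)  # a larger run gives a steadier estimate
print(approx_100, approx_20000)
```

With only 100 samples the estimate can vary noticeably from run to run; increasing the number of samples reduces that variation.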
Since we know that the number leaving has a binomial (30, 0.4) distribution when the null hypothesis holds, the simulation is unnecessary. Select Binomial distribution from the pop-up menu. This binomial distribution is displayed, and the probability of 15 or more whales leaving is shown to be 0.1754 — the p-value for the test.
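The same probability can be checked by adding the binomial probabilities directly. The following is a minimal sketch using only the Python standard library; the variable and function names are illustrative.

```python
from math import comb

n = 30         # sample size
pi0 = 0.4      # probability of leaving under the null hypothesis
observed = 15  # observed number of whales leaving

def binomial_pmf(k, n, p):
    """P(X = k) when X ~ binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Upper-tail p-value: P(X >= 15), summing the probabilities for 15, 16, ..., 30.
p_value = sum(binomial_pmf(k, n, pi0) for k in range(observed, n + 1))
print(round(p_value, 4))
```

Rounded to four decimal places, the sum should agree with the 0.1754 shown by the diagram.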
Since the p-value is not small, the observed data could easily have arisen even if the device were ineffective. We therefore conclude that the data provide no evidence that the device is effective. Note that this conclusion was reached without any simulations.