Approach
The null and alternative hypotheses are treated differently in statistical hypothesis testing. We compare them by asking ...
Are the data consistent with the null hypothesis?
It is extremely important to understand that this is the question a hypothesis test addresses, so make sure that you remember it.
P-value
We approach a hypothesis test by evaluating a p-value. This is the probability of getting a value of the test statistic at least as "extreme" as the one calculated from the actual data set, assuming that the null hypothesis holds.
Definition
Consider a hypothesis test using a test statistic \(T\) whose observed value is \(t\). If \(A\) is the set of values of \(T\) that favour the alternative hypothesis at least as strongly as \(t\) does, the p-value for the test is the probability of such a value when the null hypothesis holds,
\[ \text{p-value} \;\;=\;\; P(T \in A \mid H_0) \]
For example, if large values of the test statistic, \(T\), would favour the alternative hypothesis and it is evaluated to be \(t\) from the recorded data, the p-value is
\[ \text{p-value} \;\;=\;\; P(T \ge t \mid H_0) \]
Since we know the distribution of the test statistic when the null hypothesis holds, the p-value can always be evaluated.
Interpretation
The p-value gives the probability of getting at least as "extreme" a value of the test statistic as the one observed when the null hypothesis holds. If this is small, our data would have been unlikely if the null hypothesis were true, which is evidence against the null hypothesis and in favour of the alternative hypothesis.
The following table may be regarded as an oversimplification, but can be used as a guide to interpreting p-values.
p-value | Interpretation |
---|---|
over 0.1 | no evidence that the null hypothesis does not hold |
between 0.05 and 0.1 | very weak evidence that the null hypothesis does not hold |
between 0.01 and 0.05 | moderately strong evidence that the null hypothesis does not hold |
under 0.01 | strong evidence that the null hypothesis does not hold |
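The guide in the table can be written as a small lookup function. This is only a sketch: the function name `interpret_p_value` is hypothetical, and because the table does not say which band a value exactly on a boundary (0.01, 0.05 or 0.1) belongs to, the code assumes that a boundary value falls into the band below it.

```python
def interpret_p_value(p_value: float) -> str:
    """Map a p-value to the rough interpretation given in the table.

    Assumption: a value exactly on a boundary (0.01, 0.05, 0.1) is placed
    in the lower band, since the table leaves this unspecified.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value > 0.1:
        strength = "no"
    elif p_value > 0.05:
        strength = "very weak"
    elif p_value > 0.01:
        strength = "moderately strong"
    else:
        strength = "strong"
    return f"{strength} evidence that the null hypothesis does not hold"


# Illustrative p-values only; these are not taken from any data set.
for p in (0.30, 0.07, 0.03, 0.004):
    print(p, "->", interpret_p_value(p))
```

As the text notes, these cut-offs are an oversimplification, so the returned wording should be read as a guide rather than a verdict.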
Two examples show how this works.
Example: Telepathy experiment
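In the telepathy experiment, the number of correctly picked card shapes, \(X\), had a \(\BinomDistn(n=90, \pi)\) distribution. Our hypotheses were
\[ H_0: \pi = \diagfrac {\small 1} {\small 3} \qquad\qquad H_A: \pi \gt \diagfrac {\small 1} {\small 3} \]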
When the experiment was conducted, 36 out of the 90 card shapes were correctly chosen. If the null hypothesis holds, \(X\) has the \(\BinomDistn(n=90, \diagfrac {\small 1} {\small 3})\) distribution.
Values of the test statistic that would support telepathy at least as strongly as the observed number would be \(A = \{\text{values } x \ge 36\}\). The p-value is the probability of these values when the null hypothesis holds,
\[ \text{p-value} \;\;=\;\; P(X \ge 36 \mid \pi=\diagfrac {\small 1} {\small 3}) \;\;=\;\; 0.1103 \]
There would be an 11% chance of getting 36 or more correct card shapes when guessing, so the data would not be particularly unusual for subjects who were guessing. We therefore conclude that the data are consistent with guessing (the null hypothesis) and that there is no evidence for telepathy (the alternative hypothesis).
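The upper-tail probability for the telepathy experiment can also be computed directly by summing binomial probabilities. The sketch below uses only the Python standard library; the function name `binomial_upper_tail` is an illustrative choice and not something from the original text.

```python
from math import comb


def binomial_upper_tail(n: int, p: float, observed: int) -> float:
    """Return P(X >= observed) for X ~ Binomial(n, p)."""
    return sum(
        comb(n, x) * p**x * (1 - p) ** (n - x)
        for x in range(observed, n + 1)
    )


# Telepathy experiment: 36 correct out of 90, guessing probability 1/3.
p_value = binomial_upper_tail(n=90, p=1 / 3, observed=36)
print(f"p-value = {p_value:.4f}")
```

The printed value can be compared with the p-value quoted for this example.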
Example: Aircraft air-conditioner failures
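For the aircraft air-conditioner failure data, our hypotheses were
\[ H_0: \lambda = \diagfrac {\small 1} {\small 110} \qquad\qquad H_A: \lambda \gt \diagfrac {\small 1} {\small 110} \]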
We identified the sample total of the times to failure as a test statistic. If the null hypothesis was true,
\[ \sum_{i=1}^{199}{X_i} \;\;\sim\;\; \ErlangDistn(199, \lambda=\diagfrac {\small 1} {\small 110}) \]
For the actual data, the sample total time was \(\sum{X_i} = 18{,}093\) hours, so the p-value for the test is the probability of as low a total time as this (when assuming that the manufacturer's claim of \(\lambda = \diagfrac {\small 1} {\small 110}\) is correct),
\[ \text{p-value} \;\;=\;\; P\left(\sum{X_i} \le 18{,}093 \mid \lambda=\diagfrac {\small 1} {\small 110}\right) \]
This can be evaluated using the Excel function
=GAMMA.DIST(18093, 199, 110, TRUE)
to be 0.0049.
This p-value means that, if the manufacturer's claim was correct, there would be a probability of under 0.5% that the total of the times between the 199 failures would be as low as was observed. The observed data would have been very unlikely if the manufacturer's claim was correct (the null hypothesis), so we conclude that there is strong evidence that the claim is wrong (the alternative hypothesis).
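The same lower-tail probability can be checked without Excel. The sketch below relies on a standard identity for an Erlang distribution with integer shape \(k\) and rate \(\lambda\): \(P(S \le x) = P(N \ge k)\), where \(N\) has a Poisson distribution with mean \(\lambda x\). The function name `erlang_cdf` is a hypothetical choice for illustration, and the code assumes \(x > 0\).

```python
from math import exp, fsum, lgamma, log


def erlang_cdf(x: float, shape: int, rate: float) -> float:
    """Return P(S <= x) for S ~ Erlang(shape, rate), integer shape, x > 0.

    Uses P(S <= x) = 1 - P(N <= shape - 1) with N ~ Poisson(rate * x).
    """
    mean = rate * x
    # Poisson probabilities for 0, ..., shape - 1, computed on the log
    # scale so that large factorials and powers do not overflow.
    below = fsum(
        exp(k * log(mean) - mean - lgamma(k + 1)) for k in range(shape)
    )
    return 1.0 - below


# Air-conditioner data: total of 18,093 hours over 199 failures,
# with the claimed failure rate of 1/110 per hour.
p_value = erlang_cdf(18093, shape=199, rate=1 / 110)
print(f"p-value = {p_value:.4f}")
```

Note that Excel's `GAMMA.DIST` takes the scale parameter (110) as its third argument, whereas this sketch is parameterised by the rate (1/110), matching the notation used for \(\lambda\).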