Unknown standard deviation

In the examples on the previous page, the population standard deviation, σ, was a known value. Unfortunately this is rarely the case in practice, so the previous test cannot be used.

Returns from Mutual Funds

Investing in the share market can be risky for small investors since the value of individual companies can fluctuate greatly, especially over short periods of time. These risks can be reduced by buying shares in a mutual fund that spreads the investment among a wide portfolio of companies.

Different mutual funds invest in companies of different types and with different inherent risks of losing and (hopefully) gaining value. Some funds have been categorised as 'high-risk' funds and a sample of 25 of these is shown in the table below. The annualised percentage return paid by these funds over a 3-year period (April 1997 to March 2000) is also shown. (The stock market did particularly well over this period!)

The corresponding annualised return from Federal Constant Maturity Rate Bonds over this period was 5.64%. Did the high-risk funds do any better on average than this 'safe' investment?

High-risk mutual fund            Annualised 3-year return (1997-2000)
Alliance Quasar                    8.76%
Alliance Tech                     58.71%
Amer Cent Gl Gold                -22.82%
Berger Sm Co Gr                   49.02%
Blackrock Sm Cp Gr                43.97%
CGM Cap Devel                     13.91%
Dreyfus Aggressive Growth         -2.89%
Evergreen Aggressive growth A     39.64%
Federated Small cap Strat A       17.91%
Fidelity emerging markets        -10.55%
Fidelity Selects Comp             68.58%
Franklin Value A                  -0.33%
Goldman Sachs small cap val A      4.00%
Hotchkiss and Wiley Small Cap      0.14%
JP Morgan Sm Co                   23.87%
J Hancock Small cap Growth B      38.23%
Kemper Small cap equity A         26.60%
MFS Emerg Gr                      36.02%
Montgomery Small cap R            29.51%
Oakmark Sm Cap                     1.62%
O'Shaughnessy Crn Gr              28.91%
PBHG Emerging Growth              29.32%
Putnam OTC Emerg Gr               54.43%
State St. Res Emer Gr A           30.76%
USAA Aggressive Gr                49.67%

The hypotheses of interest concern μ, the mean annualised return of high-risk funds, and are similar to those in the initial pages of this section:

H0 :   μ = 5.64
HA :   μ > 5.64

However, we no longer know the population standard deviation, σ. The only information we have about σ comes from our sample.

Test statistic and its distribution

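When the population standard deviation, σ, was a known value, we used the test statistic

z = (x̄ - μ0) / (σ / √n)

where x̄ is the sample mean, n is the sample size and μ0 is the value of μ specified by H0 (here 5.64). This statistic has a standard normal distribution when H0 is true.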

When σ is unknown, we use a closely related test statistic,

t = (x̄ - μ0) / (s / √n)

where s is the sample standard deviation, x̄ is the sample mean and μ0 is the value of μ specified by H0. The distribution of this test statistic has greater spread than the standard normal distribution because of the extra variability that results from estimating σ by s, especially when the sample size n is small.
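
As an illustration, the t statistic for the mutual fund data can be evaluated with a few lines of Python. This is only a sketch using the standard library; the variable names (returns, mu0, x_bar, s, t) are chosen here for illustration and the 25 values are typed in from the table of returns above.

```python
from math import sqrt

# Annualised 3-year returns (%) of the 25 high-risk funds, in table order
returns = [
    8.76, 58.71, -22.82, 49.02, 43.97, 13.91, -2.89, 39.64, 17.91,
    -10.55, 68.58, -0.33, 4.00, 0.14, 23.87, 38.23, 26.60, 36.02,
    29.51, 1.62, 28.91, 29.32, 54.43, 30.76, 49.67,
]
mu0 = 5.64  # value of the mean specified by H0 (return from the 'safe' bonds)

n = len(returns)
x_bar = sum(returns) / n                                     # sample mean
s = sqrt(sum((x - x_bar) ** 2 for x in returns) / (n - 1))   # sample standard deviation
t = (x_bar - mu0) / (s / sqrt(n))                            # t statistic

print(f"n = {n}, mean = {x_bar:.2f}, s = {s:.2f}, t = {t:.2f}")
```

For these data the sample mean is about 24.68, s is about 23.33 and the t statistic is about 4.08. Deciding whether a value this large is unusual when H0 is true requires the distribution of the t statistic.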

The diagram below generates random samples from a normal distribution. Click Take sample a few times to see the variability in the samples.

Click Accumulate, then take about 50 random samples. Observe that the stacked dot plot of the t statistic conforms reasonably well to a standard normal distribution.

Now use the pop-up menu to reduce the sample size to 5 and take a further 50-100 samples. You will probably notice that there are more 'extreme' t-values (less than -3 or more than +3) than would be expected from a standard normal distribution.

Reduce the sample size to 3 and repeat. It should now be clearer that the distribution of the t-statistic has greater spread than a standard normal distribution. Click on the crosses for the most extreme t-values and observe that they correspond to samples in which the 3 data values happen to be close together, resulting in a small sample standard deviation, s.
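
The same experiment can be imitated without the diagram. The sketch below is an assumed stand-in for it, not the diagram's own code: it draws many random samples from a normal population for which H0 is true, computes the t statistic for each, and reports the proportion of 'extreme' values (beyond ±3). The function names, the number of repetitions, the seed and the larger sample size of 30 are all choices made here for illustration.

```python
import random
from math import sqrt

def t_statistic(sample, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(sample)
    x_bar = sum(sample) / n
    s = sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
    return (x_bar - mu0) / (s / sqrt(n))

def fraction_extreme(n, reps=10000, cutoff=3.0, seed=2024):
    """Proportion of simulated samples of size n with |t| > cutoff when H0 is true."""
    rng = random.Random(seed)
    # Assumed population: standard normal. Under H0 the distribution of t
    # does not depend on which normal population is used.
    mu, sigma = 0.0, 1.0
    count = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if abs(t_statistic(sample, mu)) > cutoff:
            count += 1
    return count / reps

fractions = {n: fraction_extreme(n) for n in (30, 5, 3)}
for n, frac in fractions.items():
    print(f"n = {n:2d}: proportion with |t| > 3 is {frac:.4f}")
```

For a standard normal distribution only about 0.3% of values lie beyond ±3, so the simulated proportions for sample sizes 5 and 3 should be noticeably larger than that.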

The t distribution

We have seen that the t statistic does not have a standard normal distribution, but it does have another standard distribution called a t distribution with (n - 1) degrees of freedom. In the next page, we will use this distribution to obtain the p-value for hypothesis tests.

The diagram below shows the shape of the t distribution for various values of the degrees of freedom.

Drag the slider to see how the shape of the t distribution depends on the degrees of freedom.

A standard normal distribution can be used as an approximation to a t distribution if the degrees of freedom are large (say 30 or more) but the t distribution must be used for smaller degrees of freedom.
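
The quality of the normal approximation can be checked numerically. The following sketch compares the upper tail probability beyond 2 for t distributions with several degrees of freedom against the standard normal value. It uses only the Python standard library; the helper names (t_pdf, t_upper_tail, normal_upper_tail), the cutoff of 2 and the use of Simpson's rule for the integration are assumptions made here for illustration.

```python
from math import erfc, exp, lgamma, log, pi, sqrt

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0, integrating the density over [0, x] by Simpson's rule."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3  # by symmetry, P(T > 0) = 0.5

def normal_upper_tail(x):
    """P(Z > x) for a standard normal Z."""
    return 0.5 * erfc(x / sqrt(2))

for df in (2, 5, 10, 30, 100):
    print(f"df = {df:3d}: P(T > 2) = {t_upper_tail(2.0, df):.4f}")
print(f"normal  : P(Z > 2) = {normal_upper_tail(2.0):.4f}")
```

The tail probability shrinks towards the normal value as the degrees of freedom increase, which is why the approximation is only reasonable when the degrees of freedom are large.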