Sum and mean of normal random sample
On the previous page, we gave formulae for the mean and variance of the sum of a random sample, \(\{X_1, X_2, ..., X_n\}\), from a distribution with mean \(\mu\) and variance \(\sigma^2\).
\[ E\left[\sum_{i=1}^n {X_i}\right] = n\mu \spaced{and} \Var\left(\sum_{i=1}^n {X_i}\right) = n\sigma^2 \]
On that page, we also gave formulae for the mean and variance of the distribution of the sample mean,
\[ E\left[\overline {X} \right] = \mu \spaced{and} \Var\left(\overline {X}\right) = \frac {\sigma^2} n \]
Although these formulae show the centre and spread of the distributions, they do not describe their shapes. If we know that the random sample comes from a normal distribution, we can do better.
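These formulae can be checked informally by simulation. The short Python sketch below (illustrative only, not part of the original page) draws many samples of size \(n = 10\) from a \(\mathsf{Uniform}(0, 1)\) distribution, which has \(\mu = 0.5\) and \(\sigma^2 = \frac 1 {12}\), and compares the observed mean and variance of the sample sums and sample means with the formulae above.

```python
import random
import statistics

random.seed(1)

n = 10              # sample size
mu = 0.5            # mean of Uniform(0, 1)
sigma2 = 1 / 12     # variance of Uniform(0, 1)

sums, means = [], []
for _ in range(100_000):
    sample = [random.random() for _ in range(n)]
    sums.append(sum(sample))
    means.append(sum(sample) / n)

print(statistics.mean(sums))        # close to n * mu = 5.0
print(statistics.variance(sums))    # close to n * sigma2 = 0.833...
print(statistics.mean(means))       # close to mu = 0.5
print(statistics.variance(means))   # close to sigma2 / n = 0.00833...
```

The simulated values agree with the formulae to within sampling error, whichever distribution is used in place of the uniform.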
Sum and mean of a normal random sample
If \(\{X_1, X_2, ..., X_n\}\) is a random sample of \(n\) values from a \(\NormalDistn(\mu, \;\sigma^2)\) distribution,
\[\begin{aligned} \sum_{i=1}^n {X_i} & \;\; \sim \;\; \NormalDistn(n\mu, \;\sigma_{\Sigma X}^2=n\sigma^2) \\ \overline{X} & \;\; \sim \; \; \NormalDistn(\mu, \;\sigma_{\overline X}^2=\frac {\sigma^2} n) \end{aligned} \]
Indeed, it can be proved that any linear combination of independent normal random variables also has a normal distribution. (The proofs of these results are too difficult to show here.)
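A simulation can illustrate this result too. The sketch below (illustrative only; the sample size, mean and standard deviation are arbitrary choices) draws many samples of size \(n = 5\) from a \(\NormalDistn(10, \;2^2)\) distribution and checks that the sample means behave like a \(\NormalDistn\big(10, \;\frac {2^2} 5\big)\) distribution: their mean and standard deviation match the theory, and about 68.3% of them lie within one standard deviation of \(\mu\), as expected for a normal distribution.

```python
import random
import statistics

random.seed(1)
n, mu, sigma = 5, 10.0, 2.0

means = []
for _ in range(100_000):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(sum(sample) / n)

sd_xbar = sigma / n ** 0.5   # theoretical sd of the mean: 2 / sqrt(5) ~ 0.894

print(statistics.mean(means))   # close to mu = 10.0
print(statistics.stdev(means))  # close to sd_xbar

# For a normal distribution, about 68.3% of values lie within one sd of the mean
inside = sum(abs(m - mu) < sd_xbar for m in means) / len(means)
print(inside)                   # close to 0.683
```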
Samples from other distributions
Although the sum and mean of a random sample can be proved to have exactly normal distributions only when the sample itself comes from a normal distribution, the following theorem shows that their distributions are approximately normal when the sample size is large, whatever distribution is being sampled from.
Central Limit Theorem (informal)
If \(\{X_1, X_2, ..., X_n\}\) is a random sample of \(n\) values from any distribution with mean \(\mu\) and variance \(\sigma^2\),
\[\begin{aligned} \sum_{i=1}^n {X_i} & \;\; \xrightarrow[n \rightarrow \infty]{} \;\; \NormalDistn(n\mu, \;\;\sigma_{\Sigma X}^2=n\sigma^2) \\ \overline{X} & \;\; \xrightarrow[n \rightarrow \infty]{} \; \; \NormalDistn(\mu, \;\;\sigma_{\overline X}^2=\frac {\sigma^2} n) \end{aligned} \]
This formulation of the Central Limit Theorem shows how it is interpreted in practice. However, it is not strictly correct, since the normal distributions on the right involve \(n\), which tends to infinity in the limit. A more precise statement first converts the sum (and mean) into variables called "z-scores" by subtracting their mean and then dividing by their standard deviation.
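The theorem can be seen in action by simulation. The sketch below (illustrative only; the exponential distribution and sample size are arbitrary choices) draws many samples of size \(n = 50\) from a strongly skewed \(\mathsf{Exponential}(1)\) distribution, which has \(\mu = 1\) and \(\sigma^2 = 1\), and converts each sample sum into a z-score. Despite the skewness of the underlying distribution, the z-scores behave approximately like a \(\NormalDistn(0, \;1)\) distribution.

```python
import random
import statistics

random.seed(1)
mu, sigma = 1.0, 1.0   # mean and sd of the Exponential(1) distribution
n = 50                 # sample size

zs = []
for _ in range(50_000):
    total = sum(random.expovariate(1.0) for _ in range(n))
    zs.append((total - n * mu) / (n ** 0.5 * sigma))   # z-score of the sum

print(statistics.mean(zs))    # close to 0
print(statistics.stdev(zs))   # close to 1

# For a standard normal distribution, about 68.3% of values lie in (-1, 1)
inside = sum(abs(z) < 1 for z in zs) / len(zs)
print(inside)                 # close to 0.683
```

Repeating the simulation with larger \(n\) brings the z-scores still closer to a standard normal distribution, which is what the limit in the theorem asserts.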
Central Limit Theorem (more precise)
If \(\{X_1, X_2, ..., X_n\}\) is a random sample of \(n\) values from any distribution with mean \(\mu\) and variance \(\sigma^2\),
\[ Z_n = \frac {\sum_{i=1}^n {X_i} - n\mu} {\sqrt{n}\; \sigma} \quad \xrightarrow[n \rightarrow \infty]{} \quad \NormalDistn(0,\; 1) \]
Importance of the normal distribution
The Central Limit Theorem is the main reason why the normal distribution is so important in statistics. Sample means are approximately normal, whatever the distribution from which the values are sampled.
Other related results show that many other summary statistics from random samples also have approximately normal distributions.