Mean for discrete data

The following frequency table describes the household sizes in a sample of 600 households.

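\[
\begin{array}{c|c}
\text{Household size, } x & \text{Frequency, } f_x \\
\hline
1 & 140 \\
2 & 180 \\
3 & 60 \\
4 & 100 \\
5 & 60 \\
6 & 40 \\
7 & 20 \\
\hline
\text{total} & 600
\end{array}
\]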

The mean household size is found by adding the sizes of all 600 households and then dividing by 600:

\[\begin{aligned} \overline{x} = \frac {\sum x} n & = \frac {\overbrace{1 + 1 + \cdots + 1}^{140} \; + \; \overbrace{2 + 2 + \cdots + 2}^{180} \; + \; \overbrace{3 + 3 + \cdots + 3}^{60} \; + \; \cdots} {600} \\ & = \frac {140 \times 1 \; + \; 180 \times 2 \; + \; 60 \times 3 \; + \; \cdots} {600} \\ & = \frac {140} {600} \times 1 \; + \; \frac {180} {600} \times 2 \; + \; \frac {60} {600} \times 3 \; + \; \cdots \\ & = \sum_{x=1}^7 {x \times \text{Propn}(x)} \\ & \approx 2.933 \end{aligned} \]
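As a quick check of the arithmetic, the two routes to the mean (total of all sizes divided by 600, and sizes weighted by their proportions) can be sketched in Python. This is only an illustration: the variable names are chosen here and do not come from the text, and exact fractions are used so that the two results can be compared without rounding error.

```python
from fractions import Fraction

# Frequency table from the text: household size -> number of households
frequencies = {1: 140, 2: 180, 3: 60, 4: 100, 5: 60, 6: 40, 7: 20}

n = sum(frequencies.values())  # total number of households

# Route 1: add up the sizes of all households, then divide by n
mean_from_total = Fraction(sum(x * f for x, f in frequencies.items()), n)

# Route 2: weight each size by its proportion in the data set
proportions = {x: Fraction(f, n) for x, f in frequencies.items()}
mean_from_proportions = sum(x * p for x, p in proportions.items())

print(n)                                         # number of households
print(round(float(mean_from_total), 3))          # mean, rounded to 3 decimals
print(mean_from_total == mean_from_proportions)  # the two routes agree
```

Both routes give the exact value 1760/600 = 44/15, which rounds to 2.933.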

Mean for discrete random variables

If one of these 600 households were chosen at random, the probability of getting a household of each size would equal the corresponding proportion in the data set. These proportions could also be treated as estimates of the probabilities for the size of a new household.

We therefore define the mean of this discrete random variable to be the same value, but with probabilities replacing proportions.

\[ \mu \;=\; \sum_{x=1}^7 {x \times p(x)} \]

By convention, the Greek letter \(\mu\) is often used to denote a random variable's mean.
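The link between choosing a household at random and the value of \(\mu\) can be illustrated with a small simulation. The sketch below assumes that the probabilities \(p(x)\) are exactly the proportions from the table. The seed and the number of draws are arbitrary choices made for this illustration, not part of the text.

```python
import random

# Frequency table from the text: household size -> number of households
frequencies = {1: 140, 2: 180, 3: 60, 4: 100, 5: 60, 6: 40, 7: 20}
n = sum(frequencies.values())

# p(x): probability that a randomly chosen household has size x
p = {x: f / n for x, f in frequencies.items()}

# Mean of the random variable: sum of x * p(x)
mu = sum(x * px for x, px in p.items())

# Simulate choosing a household at random many times
# (seed and number of draws are arbitrary, for illustration only)
rng = random.Random(0)
draws = rng.choices(list(p.keys()), weights=list(p.values()), k=100_000)
sample_mean = sum(draws) / len(draws)

print(round(mu, 3))
print(round(sample_mean, 3))
```

With this many draws, the average of the simulated household sizes should land close to \(\mu\), although it will not match it exactly.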

Definition

The mean of a discrete random variable, \(X\), is defined to be

\[ E[X] = \mu = \sum_{\text{all } x} {x \times p(x)} \]

Each possible value is multiplied by its probability, so values with greater probability of being observed have higher 'weights' in the formula, pulling the distribution's mean towards them.
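The definition translates directly into a few lines of Python. The following is a minimal sketch, assuming the distribution is supplied as a dictionary mapping each possible value to its probability. The function name `expected_value` and the two dice distributions are hypothetical examples introduced here, not part of the text.

```python
import math

def expected_value(pmf):
    """Return the sum of x * p(x) for a dict mapping value -> probability."""
    probabilities = list(pmf.values())
    if any(prob < 0 for prob in probabilities):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(x * prob for x, prob in pmf.items())

# Hypothetical example 1: a fair six-sided die
fair_die = {face: 1 / 6 for face in range(1, 7)}

# Hypothetical example 2: a die loaded so that 6 comes up half the time
loaded_die = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}

fair_mean = expected_value(fair_die)      # about 3.5
loaded_mean = expected_value(loaded_die)  # about 4.5

print(fair_mean, loaded_mean)
```

Moving probability onto the value 6 raises the mean from about 3.5 to about 4.5, which is the 'weighting' effect described above.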