Mean for discrete data
The following frequency table describes the household sizes in a sample of 600 households.
| Household size, x | Frequency, ƒx |
|---|---|
| 1 | 140 |
| 2 | 180 |
| 3 | 60 |
| ... | ... |
| total | 600 |
The mean household size is found by adding the sizes of all 600 households and then dividing by 600:
\[\begin{aligned}
\overline{x} = \frac {\sum x} n & = \frac {\overbrace{1 + 1 + \dots + 1}^{140} \; + \; \overbrace{2 + 2 + \dots + 2}^{180} \; + \; \overbrace{3 + 3 + \dots + 3}^{60} \; + \; \dots} {600} \\
& = \frac {140 \times 1 \; + \; 180 \times 2 \; + \; 60 \times 3 \; + \; \dots} {600} \\
& = \frac {140} {600} \times 1 \; + \; \frac {180} {600} \times 2 \; + \; \frac {60} {600} \times 3 \; + \; \dots \\
& = \sum_{x=1}^7 {x \times \text{Propn}(x)} \\
& = 2.933
\end{aligned} \]
Mean for discrete random variables
If one of these 600 households were chosen at random, the probability of getting a household of each size would equal the corresponding proportion in the data set. These proportions could also be treated as estimates of the probabilities for the size of a new household.
We therefore define the mean of this discrete random variable by the same formula, but with probabilities, \(p(x)\), replacing proportions.
\[ \mu \;=\; \sum_{x=1}^7 {x \times p(x)} \]
By convention, the Greek letter \(\mu\) is often used to denote a random variable's mean.
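The step from averaging raw values to weighting each value by its proportion can be checked numerically. The following is a minimal Python sketch under stated assumptions: the frequency table in it is hypothetical (it is not the household data), and the variable names are illustrative only. Exact fractions are used so that the three forms can be compared without rounding error.

```python
from fractions import Fraction

# Hypothetical frequency table (value -> count); NOT the household data.
freq = {1: 3, 2: 5, 3: 2}
n = sum(freq.values())

# (a) Add up every individual value, then divide by n.
raw_values = [x for x, f in freq.items() for _ in range(f)]
mean_raw = Fraction(sum(raw_values), n)

# (b) Multiply each distinct value by its frequency, then divide by n.
mean_freq = Fraction(sum(f * x for x, f in freq.items()), n)

# (c) Multiply each distinct value by its proportion and add.
propn = {x: Fraction(f, n) for x, f in freq.items()}
mean_propn = sum(x * p for x, p in propn.items())

print(mean_raw, mean_freq, mean_propn)  # 19/10 19/10 19/10
```

All three calculations give the same mean, which is why proportions can be swapped for probabilities without changing the form of the sum.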
Definition
The mean of a discrete random variable, \(X\), is defined to be
\[ E[X] = \mu = \sum_{\text{all } x} {x \times p(x)} \]
Each possible value is multiplied by its probability, so values with greater probability of being observed have higher 'weights' in the formula, pulling the distribution's mean towards them.
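As an illustration of the definition, the sketch below computes \(E[X]\) for two dice. The function name `expected_value` and both distributions are assumptions made for this example, not part of the text above; the 'loaded' die is hypothetical and is chosen only to show how a high-probability value pulls the mean towards it.

```python
from fractions import Fraction

def expected_value(pmf):
    """Return the sum of x * p(x) for a dict mapping each value x to p(x)."""
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in pmf.items())

# Fair six-sided die: every face has probability 1/6.
fair = {x: Fraction(1, 6) for x in range(1, 7)}

# Hypothetical loaded die: face 6 has probability 1/2, the others 1/10 each.
loaded = {x: Fraction(1, 10) for x in range(1, 6)}
loaded[6] = Fraction(1, 2)

print(expected_value(fair))    # 7/2
print(expected_value(loaded))  # 9/2
```

Moving probability onto the value 6 raises the mean from 3.5 to 4.5, even though the set of possible values is unchanged.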