Outliers and incorrect measurements
Values that are considerably larger or smaller than the bulk of the data are called outliers.
Detecting outliers is particularly important because an outlier may have been recorded incorrectly, or there may have been other anomalous circumstances associated with it. Outliers should be checked carefully whenever possible. If the check shows that a value is wrong or was obtained under atypical circumstances, it should be deleted from the data set and the deletion noted in any report about the data.
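As a rough illustration of how values might be flagged for checking, the sketch below uses the common "1.5 times the interquartile range" rule of thumb. This rule is not taken from the text above, which identifies outliers by eye from a plot. The function name flag_outliers, the multiplier k and the data values are all hypothetical.

```python
from statistics import quantiles

def flag_outliers(values, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles.

    The 1.5 * IQR cut-off is a rule of thumb assumed for this sketch;
    flagged values are candidates for checking, not automatic deletions.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Hypothetical percentage changes, made up for illustration only
changes = [-9.0, -3.1, -1.4, -0.8, -0.5, -0.2, 0.0, 0.3, 0.6, 0.9, 1.5, 2.2]
suspects = flag_outliers(changes)
print(suspects)
```

A value returned by such a rule is only a prompt to go back to the records. Whether it is deleted should still depend on what the check reveals.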
FTSE 100 Share Price Changes
Share price changes are monitored carefully by investment managers. The diagram below shows the percentage change in the price of every share that was part of the FTSE 100 stock exchange index in London on 11th April 2002.
The stacked dot plot shows one outlier. The share price of Mmo2, a major British mobile phone operator, dropped by over 11 percent during that day. A financial analyst would look carefully at this company to find what caused this drop.
Drag over the other crosses to identify the corresponding company names. Do you see anything in common between the companies whose share prices dropped sharply? Three of the four companies with the next-largest drops are also major players in telecommunications (Vodafone, Cable & Wireless and British Telecom). However, the performance of Mmo2 is still extreme enough for it to be classified as an outlier.
Outliers and skew distributions
An extreme data value that stands out from the rest of the data does not necessarily indicate that there is a mistake in the data or something unusual about the individual. Our interpretation of the extreme value should also take into account the shape of the distribution of values for the rest of the data.
Storm duration
The stem and leaf plot below shows the durations (in minutes) of the first 50 storms in the 1983/4 rainy season in the Bvumbwe catchment in Malawi.
One storm lasted much longer than the others (880 minutes). It is certainly worth checking the records for this storm (was the duration perhaps really 88 minutes?). However, the value is not necessarily a mistake.
Most storms are short, with durations of less than 100 minutes, so the longest rows of leaves are at the bottom of the stem and leaf plot. There are fewer storms lasting 100-200 minutes and fewer still lasting 200-300 minutes. This pattern continues, with the frequency of storms decreasing steadily up the stem and leaf plot. A distribution with this shape is called a skew distribution, as opposed to a symmetric distribution, whose tails fall away at a similar rate on both sides of the peak density.
Perhaps this 'outlier' is a continuation of the pattern into the tail of the distribution and is just a long storm that could be expected once every hundred or so storms.
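To make the idea of a value that "could be expected once every hundred or so storms" concrete, the sketch below assumes that durations follow an exponential distribution, one simple skew model. Both this model and the mean of 100 minutes are assumptions made for illustration; neither has been fitted to the Bvumbwe records.

```python
import math

def tail_probability(duration, mean_duration):
    """P(a storm lasts longer than `duration`) under an exponential model."""
    return math.exp(-duration / mean_duration)

def duration_exceeded_once_in(n_storms, mean_duration):
    """Duration exceeded, on average, by one storm in every n_storms."""
    return mean_duration * math.log(n_storms)

MEAN_MINUTES = 100.0  # hypothetical mean, not estimated from the data

# Duration exceeded by about one storm in a hundred (ln 100 is about 4.6)
once_in_100 = duration_exceeded_once_in(100, MEAN_MINUTES)
print(round(once_in_100, 1))

# Chance that a single storm lasts more than 500 minutes under this model
print(tail_probability(500, MEAN_MINUTES))
```

Under this assumed model, about one storm in a hundred lasts more than 4.6 times the mean duration, so an isolated long storm can be consistent with the pattern in the rest of the data. Whether 880 minutes is plausible for the actual storms depends on a model and mean estimated from the real records.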