'Missing' responses
The first two types of non-sampling error are caused by failure to obtain information from some members of the target population.
Coverage error
Coverage error occurs when the sample is selected from only part of the target population rather than from the whole of it. As a result, the estimates that are obtained describe only that subgroup, not the whole target population.
A researcher is interested in irrigation practice among wheat growers in a region. There is no database containing names and addresses of all farmers growing wheat, so questionnaires are sent to members of a local wheat-growers association. Depending on the number and characteristics of farmers who grow wheat but are not members of the association, there is potential for considerable coverage error.
An ecologist studying the wildlife in a reserve makes a daily collection of the carcasses of possums killed on the road that runs through the reserve. Various physiological measurements are made on each of the 20 possums obtained in this manner.
There is potential for considerable coverage error if the possums that venture onto the road are not 'typical' of all possums in the reserve.
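The wheat-grower example can be illustrated with a small simulation. This is only a sketch under invented assumptions: the population size, the 40% membership rate and the irrigation figures are hypothetical numbers chosen for illustration, not data from any survey. It shows that a perfectly random sample drawn from the association's members estimates the members' mean, not the mean for all wheat growers.

```python
import random

random.seed(1)

# Hypothetical target population: 1,000 wheat growers.
# Assumption (illustration only): about 40% are association members,
# and members irrigate a larger share of their crop than non-members.
population = []
for _ in range(1000):
    is_member = random.random() < 0.4
    assumed_mean = 60 if is_member else 40  # percent of crop area irrigated
    population.append((is_member, random.gauss(assumed_mean, 10)))


def mean(values):
    return sum(values) / len(values)


# Mean for the whole target population (unknown to the researcher).
target_mean = mean([x for _, x in population])

# The sampling frame covers only association members.
frame = [x for is_member, x in population if is_member]

# A simple random sample of 100 growers from the frame.
sample = random.sample(frame, 100)
sample_mean = mean(sample)

# Coverage bias: what the frame describes minus what we wanted to describe.
coverage_bias = mean(frame) - target_mean

print(f"target population mean: {target_mean:.1f}")
print(f"sample mean (members):  {sample_mean:.1f}")
print(f"coverage bias:          {coverage_bias:.1f}")
```

Increasing the sample size would bring the sample mean closer to the frame mean, but it would not reduce the coverage bias.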
Non-response error
In many surveys, some selected individuals do not respond. This may be caused by ...
If non-response is related to the questions being asked, estimates from the survey are likely to be biased.
A survey is conducted to assess the number of times that residents in a city visited their doctor in the previous year. Phone numbers are randomly selected from a telephone directory and these numbers are phoned on weekday evenings.
People who are not at home (and therefore do not respond) are more likely to be healthy than those who do respond, so the sample responding will tend to overestimate the average number of doctor visits per resident. The estimate would therefore be biased.
There are several other flaws in this survey that introduce further non-sampling errors. In particular, there is also coverage error since residents whose numbers are not listed in the telephone directory cannot be sampled.
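The non-response bias in the doctor-visit survey can be sketched with a simulation. All numbers here are hypothetical assumptions made for illustration: 70% of residents are "healthy", healthy residents make 0 to 3 visits a year and others 4 to 10, and healthy residents are at home to answer the phone only 40% of the time compared with 80% for the rest. The sketch models non-response only and ignores the coverage problem with the telephone directory.

```python
import random

rng = random.Random(2024)


def simulate_resident():
    """Return (doctor visits last year, whether the resident responds)."""
    healthy = rng.random() < 0.7                      # assumed share of healthy residents
    visits = rng.randint(0, 3) if healthy else rng.randint(4, 10)
    p_at_home = 0.4 if healthy else 0.8               # assumed chance of answering the phone
    responds = rng.random() < p_at_home
    return visits, responds


residents = [simulate_resident() for _ in range(10_000)]

all_visits = [visits for visits, _ in residents]
respondent_visits = [visits for visits, responds in residents if responds]

true_mean = sum(all_visits) / len(all_visits)
respondent_mean = sum(respondent_visits) / len(respondent_visits)

print(f"mean visits, all residents:    {true_mean:.2f}")
print(f"mean visits, respondents only: {respondent_mean:.2f}")
```

Because response is related to the quantity being measured, the respondents' mean sits above the mean for all residents, however many numbers are phoned.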
Real example
In the 1936 American presidential election, there were two candidates, Roosevelt and Landon. The Literary Digest conducted a poll, aiming to predict the result of the election; its procedure was to mail questionnaires to 10 million Americans (using names from telephone books and club membership lists). From the 2.4 million replies, it made the following prediction:
Percentage of vote | Landon | Roosevelt |
---|---|---|
Literary Digest's prediction | 57 | 43 |
Actual result | 38 | 62 |
Despite the large sample size (and resulting small sampling error), the non-sampling errors were extremely large in the poll.
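Only about a quarter of those who were mailed a questionnaire replied, and the people listed in telephone books and club membership lists in 1936 were not typical of all voters. The group who responded therefore had different characteristics from the whole population, hence the large difference between the Literary Digest prediction and the actual election result.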
Incidentally, another pollster, George Gallup, also conducted a survey before this election. Although he sampled only 50,000 people, he put more effort into making his sample representative. His poll predicted that Roosevelt would win the election with 56 percent of the vote, much closer to the actual result.
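To put the two sample sizes in perspective, the sketch below computes the approximate 95% margin of error that sampling error alone would give for each poll. It assumes simple random sampling and uses the normal approximation for a proportion; neither poll was actually a simple random sample, so the figures only indicate the scale of sampling error. The function name `margin_of_error` is introduced here for illustration.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated
    from a simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)


# Literary Digest: 2.4 million replies, 57% predicted for Landon.
digest_moe = margin_of_error(0.57, 2_400_000)

# Gallup: 50,000 people sampled, 56% predicted for Roosevelt.
gallup_moe = margin_of_error(0.56, 50_000)

print(f"Literary Digest: +/- {100 * digest_moe:.2f} percentage points")
print(f"Gallup:          +/- {100 * gallup_moe:.2f} percentage points")
```

Under these assumptions the Literary Digest margin is well under 0.1 percentage points, yet its prediction for Roosevelt was 19 points too low. An error of that size cannot be attributed to sampling error, so it must come from non-sampling errors.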