Multivariate and bivariate data

Many datasets contain several measurements from each individual (or plant, item or other unit). For example, a market researcher might conduct a survey of households to determine the factors affecting readership of magazines. Measurements from each individual would include expenditure on magazines in the previous week, income, age, years of education and several other characteristics. Each measurement type is called a variable.

A data set with two variables is called bivariate. We initially restrict attention to bivariate numerical data.

Body fat

Percentage body fat of individuals is an important measure of their health, but is a difficult quantity to measure. Accurate measurement of a person's body fat involves weighing the individual submersed in water, so is rarely done in fitness checks.

Scientists recorded the percentage body fat of a group of 252 men, plus several other easily obtained measurements. Body fat and various other measurements are displayed in the table below.

The dot plot shows the distribution of body fat — it clearly varies considerably from person to person.

One value stands out as a possible outlier — one man had 45.1% body fat. Could this be a measurement or transcription error?

Click on the cross for this individual to see his other measurements in the list. His weight, chest and abdomen measurements are also large but he is short, so this is probably not a measurement error.

Also check whether the lowest value (0% body fat) seems correct in view of this person's other measurements.