What does a scatterplot tell you about a relationship?

The most important information that a scatterplot shows is the direction and strength of the relationship between the variables.

If higher values of one variable tend to be associated with higher values of the other variable, the crosses on the scatterplot will be in a band with positive slope. The relationship is then said to be positive.

If high values of one variable tend to be associated with low values of the other variable, the crosses will be in a band with negative slope, and we say that there is a negative relationship.
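
The direction of a relationship can also be checked numerically. The sketch below is only an illustration: it uses the Pearson correlation coefficient, which is not introduced in the text above, and the function names (pearson_r, direction) and the small data lists are made up for this example. A positive coefficient corresponds to a band of crosses with positive slope, and a negative coefficient to a band with negative slope.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def direction(xs, ys):
    """Classify a relationship by the sign of its correlation."""
    r = pearson_r(xs, ys)
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"


# Made-up values, for illustration only.
x = [1.2, 1.6, 2.0, 2.4, 3.0]
y_rising = [5, 7, 8, 11, 12]    # high x goes with high y
y_falling = [12, 11, 8, 7, 5]   # high x goes with low y

print(direction(x, y_rising))
print(direction(x, y_falling))
```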

Prices of second-hand cars

The example below is an artificial one and is only intended to help explain the differences between strong and weak relationships, and between positive and negative ones.

Imagine that someone collects the prices of 200 mid-sized second-hand cars in a city and also records the ages of the cars and their engine sizes. We are interested in how the price depends on car age and engine size.

The diagram initially shows the relationship between price and engine size. The bigger the engine, the higher the price — the cloud of crosses extends from bottom left to top right. This is therefore a positive relationship.

Use the slider to change the strength of the relationship. If the relationship is weak, there is little difference between the prices of cars with small engines and of those with larger engines. (Click on the left side of the scatterplot to see the range of prices for cars with small engines, then drag to the right to see the prices of those with larger engines.) If the relationship is strong, you will be able to predict price fairly accurately from engine size.
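
The effect of the slider can be imitated with a small simulation. This is a rough sketch under stated assumptions, not the method used by the diagram: the function names (simulate_cars, price_gap), the engine-size range of 1.0 to 3.0 litres, the price scale and the way the strength setting mixes engine size with random noise are all invented for illustration. It compares the average price of cars with small engines to that of cars with large engines at a weak and a strong setting.

```python
import random
from statistics import mean


def simulate_cars(strength, n=200, seed=1):
    """Return n artificial (engine_litres, price) pairs.

    strength runs from 0 (no relationship) to 1 (perfect straight line).
    All numbers here are hypothetical.
    """
    rng = random.Random(seed)
    cars = []
    for _ in range(n):
        engine = rng.uniform(1.0, 3.0)
        # Standardise engine size: uniform(1, 3) has mean 2 and sd 2/sqrt(12).
        z_engine = (engine - 2.0) / (2.0 / 12 ** 0.5)
        noise = rng.gauss(0.0, 1.0)
        # Mix engine size and noise so that a higher strength means less scatter.
        z_price = strength * z_engine + (1.0 - strength ** 2) ** 0.5 * noise
        cars.append((engine, 10000 + 3000 * z_price))
    return cars


def price_gap(cars):
    """Mean price of large-engine cars minus mean price of small-engine cars."""
    small = [price for engine, price in cars if engine < 1.5]
    large = [price for engine, price in cars if engine > 2.5]
    return mean(large) - mean(small)


weak_gap = price_gap(simulate_cars(strength=0.2))
strong_gap = price_gap(simulate_cars(strength=0.9))
print(round(weak_gap), round(strong_gap))
```

With a weak setting the gap between the two groups is small compared with the scatter of prices within each group, so knowing the engine size says little about the price. With a strong setting the gap is much larger and the scatter is smaller, which is why price can be predicted fairly accurately.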

Now select Age (years) from the pop-up menu. The relationship between price and age is negative — the older the car, the lower the price. The slider can again be used to adjust the strength of the relationship.