Definition of the correlation coefficient
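The correlation coefficient is usually defined by the formula

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} $$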
It is, however, easier to understand in an alternative form. To avoid dependence on the units of measurement of X and Y, we express the correlation coefficient in terms of the z-scores of the two variables.
standardised X: $z_x = \dfrac{x - \bar{x}}{s_x}$

standardised Y: $z_y = \dfrac{y - \bar{y}}{s_y}$
In terms of them,

$$ r = \frac{\sum z_x z_y}{n - 1} $$

where n is the number of individuals from whom values of X and Y have been recorded.
The correlation coefficient is a kind of average of the products of the z-scores.
(The divisor (n - 1) is used instead of n since it is used in the formulae for the standard deviations of X and Y.)
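The z-score form can be checked numerically. The sketch below is only an illustration, not part of the original material: the function names `z_scores` and `correlation` and the height and weight values are made up for the example. It standardises each variable with the sample standard deviation (divisor n - 1), then divides the sum of the products of z-scores by n - 1. It also converts the data to other units to show that r does not depend on the units of measurement.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardise values using the sample standard deviation (divisor n - 1)."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def correlation(xs, ys):
    """Correlation as the sum of z-score products divided by (n - 1)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    zx = z_scores(xs)
    zy = z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


# Made-up illustrative data: heights in cm and weights in kg.
heights_cm = [150.0, 160.0, 165.0, 172.0, 181.0]
weights_kg = [52.0, 58.0, 63.0, 70.0, 79.0]
r_metric = correlation(heights_cm, weights_kg)

# Changing units rescales X and Y but leaves the z-scores unchanged,
# so the correlation coefficient should be the same (up to rounding error).
heights_in = [h / 2.54 for h in heights_cm]
weights_lb = [w * 2.20462 for w in weights_kg]
r_imperial = correlation(heights_in, weights_lb)

print(round(r_metric, 4), round(r_imperial, 4))
```

The two printed values should agree, since multiplying a variable by a positive constant changes its mean and standard deviation by the same factor.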
How does r relate to the shape of a scatterplot?
The following properties of r explain in general terms how its value is related to the strength of a relationship in any particular scatterplot. You will not be able to interpret a correlation coefficient unless you know these properties.
[Images illustrating the individual properties of r are not reproduced here.]

−1 ≤ r ≤ +1
Although it is difficult to prove some of these properties, it is possible to get a feel for how the value of r reflects the strength of a relationship from its definition. The diagram below explains.
Strength of the relationship and r
The diagram below shows a scatterplot of the z-scores for X and Y, denoted by z(x) and z(y) in the diagram.
The shading in the diagram represents the product of z(x) and z(y): it is blue when the product is positive and red when the product is negative. The deeper the shade, the further z(x)*z(y) is from zero.
The correlation coefficient, r, is a kind of average of these products, z(x) * z(y).
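As a rough numerical counterpart to the shaded diagram, the sketch below counts how many points have a positive product (the blue regions) and how many have a negative product (the red regions), and then averages the products with divisor n - 1. The helper names `standardise` and `product_summary` and the two small data sets are assumptions made up for this illustration.

```python
from statistics import mean, stdev


def standardise(values):
    """Return z-scores based on the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def product_summary(xs, ys):
    """Count positive and negative z-score products and compute r from them."""
    zx, zy = standardise(xs), standardise(ys)
    products = [a * b for a, b in zip(zx, zy)]
    return {
        "positive": sum(1 for p in products if p > 0),  # points in blue regions
        "negative": sum(1 for p in products if p < 0),  # points in red regions
        "r": sum(products) / (len(products) - 1),
    }


# Made-up data: one increasing and one decreasing relationship.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
increasing = product_summary(xs, [1.2, 1.9, 3.4, 3.8, 5.1, 6.3])
decreasing = product_summary(xs, [6.1, 5.2, 3.9, 3.1, 2.2, 0.8])

print(increasing)
print(decreasing)
```

When most points fall where z(x) and z(y) have the same sign, the products are mostly positive and r is positive; when most fall where the signs differ, r is negative.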
Use the slider to show data with different correlation coefficients.
The diagram is actually 3-dimensional. Click the button on the top right to rotate it (or drag the centre of the diagram with the mouse towards the top left). The coloured surface shows the product z(x)*z(y) for all z(x) and z(y). Adjust the correlation coefficient again with the slider.
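The coloured surface can also be tabulated. The following sketch evaluates z(x)*z(y) on a small grid, which is an arbitrary choice for illustration and not taken from the diagram. The function name `product_surface` is hypothetical.

```python
def product_surface(grid):
    """Return rows of z(x)*z(y); rows run from the largest z(y) down to the smallest."""
    # Adding 0.0 turns any -0.0 into 0.0 so the printed table is tidy.
    return [[zx * zy + 0.0 for zx in grid] for zy in reversed(grid)]


# Arbitrary grid of z-score values for both axes.
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
surface = product_surface(grid)

for row in surface:
    print(" ".join(f"{value:5.1f}" for value in row))
```

The table is zero along both axes, positive where z(x) and z(y) share a sign and negative where they differ, which gives the surface its saddle shape.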