Correlation

The correlation coefficient between two random variables is closely associated with their covariance.

Definition

The correlation coefficient between two random variables, \(X\) and \(Y\), is

\[ \Corr(X,Y) \;=\; \frac{\Covar(X,Y)}{\sqrt{\Var(X)\Var(Y)}} \]

The correlation coefficient between two variables is often denoted by the Greek letter \(\rho\).
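
As a small illustration of the definition, the sketch below computes the correlation coefficient for a discrete joint distribution in which each listed \((x, y)\) pair is assumed to be equally likely. The function name `correlation` and the data values are hypothetical, chosen only for this example.

```python
import math

def correlation(xs, ys):
    """Corr(X, Y) = Covar(X, Y) / sqrt(Var(X) * Var(Y)).

    Assumes the pairs (xs[i], ys[i]) are equally likely outcomes.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance: expected product of deviations from the means
    covar = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # Variances: expected squared deviation from the mean
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return covar / math.sqrt(var_x * var_y)

# Illustrative data (not from the text)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
rho = correlation(xs, ys)
print(rho)  # approximately 0.8
```

Here the covariance is 1.6 and both variances are 2, so the ratio is \(1.6 / \sqrt{2 \times 2} = 0.8\).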

The correlation coefficient summarises the strength of the relationship between the variables in a way that is not affected by linear scaling, as shown by the following result.

Correlation of linear functions of X and Y

For any random variables, \(X\) and \(Y\), and constants \(a\), \(b\), \(c\) and \(d\) with \(b \ne 0\) and \(d \ne 0\),

\[ \Corr(a + bX, c+dY) \;=\; \begin{cases} \Corr(X, Y) & \quad\text{if }bd > 0 \\[0.3em] -\Corr(X, Y) & \quad\text{if }bd < 0 \end{cases} \]
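
The result can be checked numerically. The sketch below is only an illustration, again assuming a discrete joint distribution with equally likely pairs; the helper `correlation`, the data and the constants are all hypothetical choices. It compares the correlation before and after linear transformations whose slopes have the same sign and opposite signs.

```python
import math

def correlation(xs, ys):
    """Corr(X, Y) for equally likely pairs (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covar = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return covar / math.sqrt(var_x * var_y)

# Illustrative data (not from the text)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
rho = correlation(xs, ys)

# Slopes b = 2 and d = 0.5 have the same sign, so bd > 0
same_sign = correlation([3 + 2 * x for x in xs], [-1 + 0.5 * y for y in ys])

# Slopes b = -2 and d = 0.5 have opposite signs, so bd < 0
opposite_sign = correlation([3 - 2 * x for x in xs], [-1 + 0.5 * y for y in ys])

print(math.isclose(same_sign, rho))       # correlation unchanged when bd > 0
print(math.isclose(opposite_sign, -rho))  # sign flips when bd < 0
```

The additive constants \(a\) and \(c\) have no effect, and the slopes \(b\) and \(d\) affect only the sign of the correlation, not its magnitude.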

The proof follows from substituting the following three identities into the definition of the correlation coefficient:

\[ \begin{align} \Covar(a + bX, c+dY) \;&=\; bd \Covar(X, Y) \\ \Var(a + bX) \;&=\; b^2 \Var(X) \\ \Var(c + dY) \;&=\; d^2 \Var(Y) \end{align} \]
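
To make the final step explicit, here is a sketch of the substitution, assuming \(b \ne 0\), \(d \ne 0\) and that both variances are positive so the ratio is defined:

\[ \Corr(a + bX, c+dY) \;=\; \frac{bd \Covar(X, Y)}{\sqrt{b^2 \Var(X) \, d^2 \Var(Y)}} \;=\; \frac{bd}{|bd|} \times \frac{\Covar(X,Y)}{\sqrt{\Var(X)\Var(Y)}} \;=\; \frac{bd}{|bd|} \Corr(X, Y) \]

The factor \(bd / |bd|\) equals \(+1\) when \(bd > 0\) and \(-1\) when \(bd < 0\), which gives the two cases of the result.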