Weak relationships

A perfect linear relationship corresponds to a correlation coefficient of exactly +1 or –1, and strong linear relationships give coefficients close to those values. Weak relationships usually (but not always) result in correlation coefficients near zero.
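These patterns can be checked numerically. The sketch below (with made-up data; the helper `corr` and the noise level are illustrative choices, not from the text) computes the sample correlation coefficient for an exact linear relationship and for a weak, noise-dominated one:

```python
import math
import random

def corr(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

random.seed(1)
xs = [random.uniform(0, 10) for _ in range(1000)]

exact = [2 * x + 1 for x in xs]                 # perfect linear relationship
noisy = [x + random.gauss(0, 50) for x in xs]   # weak relationship: noise dominates

print(corr(xs, exact))  # 1.0 (up to floating-point rounding)
print(corr(xs, noisy))  # close to zero
```

The exact linear relationship yields a coefficient of exactly 1, while the noisy data gives a value near zero.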

Correlation of independent variables

If two random variables, \(X\) and \(Y\), are independent, then

\[ \Corr(X,Y) \;=\; 0 \]

The proof is based on the earlier result that

\[ \Covar(X, Y) \;=\; E[XY] - E[X]E[Y] \]

For independent discrete random variables,

\[ \begin{align} E[XY] \;&=\; \sum_{\text{all }x} \sum_{\text{all }y} {xy \; p(x,y)} \\ &=\; \sum_{\text{all }x} \sum_{\text{all }y} {xy \; p_X(x)p_Y(y)} \\ &=\; \sum_{\text{all }x} {x\;p_X(x)} \times \sum_{\text{all }y} {y\;p_Y(y)} \\ &=\; E[X] E[Y] \end{align} \]

A similar result holds for continuous random variables, with summation replaced by integration.

This proves that the covariance of \(X\) and \(Y\) is zero, so their correlation is zero too.
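This result can be verified directly for a small discrete example. The marginal distributions below are invented for illustration; under independence the joint probability factorises as \(p(x,y) = p_X(x)\,p_Y(y)\), and the covariance comes out as zero:

```python
from itertools import product

# Hypothetical marginal pmfs for two independent discrete variables
p_X = {1: 0.2, 2: 0.5, 3: 0.3}
p_Y = {0: 0.6, 4: 0.4}

# Under independence the joint pmf factorises: p(x, y) = pX(x) * pY(y)
E_XY = sum(x * y * p_X[x] * p_Y[y] for x, y in product(p_X, p_Y))
E_X = sum(x * px for x, px in p_X.items())
E_Y = sum(y * py for y, py in p_Y.items())

cov = E_XY - E_X * E_Y
print(cov)  # 0.0 (up to floating-point rounding)
```

Any choice of marginal pmfs gives the same outcome, since the factorisation of \(E[XY]\) into \(E[X]E[Y]\) holds in general.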

Note that we have not proved the converse: zero correlation does not imply independence. It is possible to define joint distributions in which the variables are strongly related yet their correlation is zero.

Nonlinear relationship

Consider a discrete random variable \(X\) whose distribution is symmetric around zero. We now define a second random variable as \(Y = X^2\). The variables \(X\) and \(Y\) are strongly related — knowing the value of \(X\) tells you the exact value of \(Y\) — but

\[ \Corr(X,Y) \;=\; 0 \]

By symmetry, \(E[X] = 0\), so

\[ \Covar(X, Y) \;=\; E[XY] - E[X]E[Y] \;=\; E[XY] \;=\; E[X^3] \;=\; \sum_{\text{all }x} x^3 p_X(x) \]

Now \(x^3 p_X(x) = -(-x)^3 p_X(-x)\) since the distribution is symmetric around zero. Each positive term in the summation (at value \(x\)) is therefore cancelled by a negative term of the same magnitude (at value \(-x\)), so the sum is zero. The covariance between \(X\) and \(Y\) is zero and the variables are uncorrelated.
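The cancellation can be confirmed with a concrete symmetric pmf (the particular support and probabilities below are an illustrative choice):

```python
# X symmetric around zero: hypothetical pmf on {-2, -1, 1, 2}
p_X = {-2: 0.1, -1: 0.4, 1: 0.4, 2: 0.1}

E_X  = sum(x * p for x, p in p_X.items())         # 0 by symmetry
E_Y  = sum(x**2 * p for x, p in p_X.items())      # E[Y] = E[X^2]
E_XY = sum(x * x**2 * p for x, p in p_X.items())  # E[XY] = E[X^3]

cov = E_XY - E_X * E_Y
print(cov)  # 0.0: X and Y = X^2 are uncorrelated, although Y is a function of X
```

Here \(Y\) is completely determined by \(X\), yet the covariance, and hence the correlation, is zero.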

It is important to remember that:

Independent variables have zero correlation, but variables with zero correlation are not necessarily independent.