5.3.1 Covariance and Correlation

Consider two random variables $X$ and $Y$. Here, we define the covariance between $X$ and $Y$, written $\textrm{Cov}(X,Y)$. The covariance quantifies how $X$ and $Y$ are statistically related. Let us first provide the definition, and then discuss the properties and applications of covariance.
The covariance between $X$ and $Y$ is defined as \begin{align}%\label{} \nonumber \textrm{Cov}(X,Y)&=E\big[(X-EX)(Y-EY)\big]=E[XY]-(EX)(EY). \end{align}
Note that \begin{align}%\label{} \nonumber E\big[(X-EX)(Y-EY)\big]&=E\big[XY-X(EY)-(EX)Y+(EX)(EY)\big]\\ \nonumber &=E[XY]-(EX)(EY)-(EX)(EY)+(EX)(EY)\\ \nonumber &=E[XY]-(EX)(EY). \end{align} Intuitively, the covariance between $X$ and $Y$ indicates how the values of $X$ and $Y$ move relative to each other. If large values of $X$ tend to happen with large values of $Y$, then $(X-EX)(Y-EY)$ is positive on average. In this case, the covariance is positive and we say $X$ and $Y$ are positively correlated. On the other hand, if $X$ tends to be small when $Y$ is large, then $(X-EX)(Y-EY)$ is negative on average. In this case, the covariance is negative and we say $X$ and $Y$ are negatively correlated.
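To make this intuition concrete, covariance can be estimated from simulated data. The following Python/NumPy sketch (the noise level, sample size, and seed are arbitrary illustrative choices) computes the sample analogue of $E[XY]-(EX)(EY)$ for a positively and a negatively correlated pair:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x = rng.standard_normal(n)
y_pos = x + 0.5 * rng.standard_normal(n)    # tends to move with x
y_neg = -x + 0.5 * rng.standard_normal(n)   # tends to move against x

# Sample analogue of Cov(X,Y) = E[XY] - (EX)(EY)
def sample_cov(a, b):
    return np.mean(a * b) - np.mean(a) * np.mean(b)

print(sample_cov(x, y_pos))  # close to +1, since Cov(X, X + noise) = Var(X) = 1
print(sample_cov(x, y_neg))  # close to -1
```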

Example
Suppose $X \sim Uniform(1,2)$, and given $X=x$, $Y$ is exponential with parameter $\lambda=x$. Find $\textrm{Cov}(X,Y)$.
  • Solution
    • We can use $\textrm{Cov}(X,Y)=E[XY]-(EX)(EY)$. We have $EX=\frac{3}{2}$ and \begin{align}%\label{} \nonumber EY &=E[E[Y|X]] &\big(\textrm{law of iterated expectations (Equation 5.17)}\big)\\ \nonumber &=E\left[\frac{1}{X}\right] &\big(\textrm{since }Y|X \sim Exponential(X)\big)\\ \nonumber &=\int_{1}^{2} \frac{1}{x} dx &\big(\textrm{since }f_X(x)=1 \textrm{ on } [1,2]\big)\\ \nonumber &=\ln 2. \end{align} We also have \begin{align}%\label{} \nonumber E[XY] &=E\big[E[XY|X]\big] &\big(\textrm{law of iterated expectations}\big)\\ \nonumber &=E\big[XE[Y|X]\big] &\big(\textrm{given }X\textrm{, }X \textrm{ is a constant}\big)\\ \nonumber &=E\left[X \cdot \frac{1}{X}\right] &\big(\textrm{since }Y|X \sim Exponential(X)\big)\\ \nonumber &=1. \end{align} Thus, \begin{align}%\label{} \nonumber \textrm{Cov}(X,Y)=E[XY]-(EX)(EY)=1-\frac{3}{2} \ln 2. \end{align}
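As a sanity check, the value $1-\frac{3}{2}\ln 2 \approx -0.0397$ can be verified by Monte Carlo simulation. A minimal Python/NumPy sketch follows (the sample size and seed are arbitrary; note that NumPy's exponential sampler is parameterized by the scale $1/\lambda$):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

x = rng.uniform(1, 2, size=n)        # X ~ Uniform(1,2)
y = rng.exponential(scale=1 / x)     # Y | X = x ~ Exponential(lambda = x)

cov_mc = np.mean(x * y) - np.mean(x) * np.mean(y)
print(cov_mc, 1 - 1.5 * np.log(2))   # both approximately -0.0397
```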


Now we discuss the properties of covariance.
Lemma
The covariance has the following properties:
  1. $\textrm{Cov}(X,X)=\textrm{Var}(X)$;
  2. if $X$ and $Y$ are independent then $\textrm{Cov}(X,Y)=0$;
  3. $\textrm{Cov}(X,Y)=\textrm{Cov}(Y,X)$;
  4. $\textrm{Cov}(aX,Y)=a\textrm{Cov}(X,Y)$;
  5. $\textrm{Cov}(X+c,Y)=\textrm{Cov}(X,Y)$;
  6. $\textrm{Cov}(X+Y,Z)=\textrm{Cov}(X,Z)+\textrm{Cov}(Y,Z)$;
  7. more generally,
\begin{align}%\label{} \nonumber \textrm{Cov}\left(\sum_{i=1}^{m}a_iX_i, \sum_{j=1}^{n}b_jY_j\right)=\sum_{i=1}^{m} \sum_{j=1}^{n} a_ib_j \textrm{Cov}(X_i,Y_j). \end{align}
All of the above results can be proven directly from the definition of covariance. For example, if $X$ and $Y$ are independent, then, as we have seen before, $E[XY]=(EX)(EY)$, so \begin{align}%\label{} \nonumber \textrm{Cov}(X,Y)=E[XY]-(EX)(EY)=0. \end{align} Note that the converse is not necessarily true: if $\textrm{Cov}(X,Y)=0$, then $X$ and $Y$ may or may not be independent. Let us prove Item 6 in Lemma 5.3, $\textrm{Cov}(X+Y,Z)=\textrm{Cov}(X,Z)+\textrm{Cov}(Y,Z)$. We have \begin{align}%\label{} \nonumber \textrm{Cov}(X+Y,Z)&=E[(X+Y)Z]-E[X+Y]\,EZ\\ \nonumber &=E[XZ+YZ]-(EX+EY)EZ\\ \nonumber &=E[XZ]-(EX)(EZ)+E[YZ]-(EY)(EZ)\\ \nonumber &=\textrm{Cov}(X,Z)+\textrm{Cov}(Y,Z). \end{align} You can prove the rest of the items in Lemma 5.3 similarly.
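The fact that zero covariance does not imply independence is easy to see in a simulation. In the sketch below (a standard illustrative example; the seed and sample size are arbitrary), $Y=X^2$ is completely determined by $X$, yet $\textrm{Cov}(X,Y)=E[X^3]-E[X]E[X^2]=0$ by the symmetry of the standard normal distribution:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(1_000_000)   # X ~ N(0,1)
y = x**2                             # Y is a deterministic function of X

# Cov(X, Y) = E[X^3] - E[X] E[X^2] = 0 by symmetry, yet X and Y are dependent.
print(np.mean(x * y) - np.mean(x) * np.mean(y))         # close to 0

# The dependence shows up elsewhere, e.g. Cov(X^2, Y) = Var(X^2) = 2.
print(np.mean(x**2 * y) - np.mean(x**2) * np.mean(y))   # close to 2
```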
Example
Let $X$ and $Y$ be two independent $N(0,1)$ random variables and \begin{align}%\label{} \nonumber &Z=1+X+XY^2, \\ \nonumber &W=1+X. \end{align} Find Cov$(Z,W)$.
  • Solution
    • \begin{align}%\label{} \nonumber \textrm{Cov}(Z,W)&=\textrm{Cov}(1+X+XY^2,1+X) \\ \nonumber &=\textrm{Cov}(X+XY^2,X) \hspace{80pt}(\textrm{by part 5 of Lemma 5.3}) \\ \nonumber &=\textrm{Cov}(X,X)+\textrm{Cov}(XY^2,X) \hspace{44pt}(\textrm{by part 6 of Lemma 5.3}) \\ \nonumber &=\textrm{Var}(X)+E[X^2Y^2]-E[XY^2]EX \hspace{12pt}(\textrm{by part 1 of Lemma 5.3 $\&$ definition of Cov}) \\ \nonumber &=1+E[X^2]E[Y^2]-E[X]^2E[Y^2] \hspace{24pt}(\textrm{since $X$ and $Y$ are independent})\\ \nonumber &=1+1-0=2. \end{align}
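A quick Monte Carlo check of this answer (sketch only; the sample size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

x = rng.standard_normal(n)           # X ~ N(0,1)
y = rng.standard_normal(n)           # Y ~ N(0,1), independent of X
z = 1 + x + x * y**2
w = 1 + x

print(np.mean(z * w) - np.mean(z) * np.mean(w))   # close to 2
```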


Variance of a sum:

One of the applications of covariance is finding the variance of a sum of several random variables. In particular, if $Z=X+Y$, then \begin{align}%\label{} \nonumber \textrm{Var}(Z)&=\textrm{Cov}(Z,Z)\\ \nonumber &=\textrm{Cov}(X+Y,X+Y)\\ \nonumber &=\textrm{Cov}(X,X)+\textrm{Cov}(X,Y)+ \textrm{Cov}(Y,X)+\textrm{Cov}(Y,Y)\\ \nonumber &=\textrm{Var}(X)+\textrm{Var}(Y)+2 \textrm{Cov}(X,Y). \end{align} More generally, for $a,b \in \mathbb{R}$, we conclude:
\begin{align}\label{eq:var-aX+bY} \textrm{Var}(aX+bY)=a^2\textrm{Var}(X)+b^2\textrm{Var}(Y)+2ab \textrm{Cov}(X,Y) \hspace{20pt} (5.21) \end{align}
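Equation 5.21 is easy to verify numerically. In the sketch below (the choices of $a$, $b$, and the joint distribution are illustrative), $Y=X+N$ for independent standard normal noise $N$, so $\textrm{Cov}(X,Y)=\textrm{Var}(X)=1$ and both sides come out near $34$:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000

x = rng.standard_normal(n)
y = x + rng.standard_normal(n)       # correlated with x: Cov(X,Y) = Var(X) = 1
a, b = 2.0, 3.0

cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
lhs = np.var(a * x + b * y)
rhs = a**2 * np.var(x) + b**2 * np.var(y) + 2 * a * b * cov_xy
print(lhs, rhs)                      # both close to 34
```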

Correlation Coefficient:

The correlation coefficient, denoted by $\rho_{XY}$ or $\rho(X,Y)$, is obtained by normalizing the covariance. In particular, we define the correlation coefficient of two random variables $X$ and $Y$ as the covariance of the standardized versions of $X$ and $Y$. Define the standardized versions of $X$ and $Y$ as \begin{align}\label{eq:normalize} U=\frac{X-EX}{\sigma_X}, \hspace{10pt} V=\frac{Y-EY}{\sigma_Y} \hspace{20pt} (5.22) \end{align} Then, \begin{align}%\label{} \nonumber \rho_{XY}=\textrm{Cov}(U,V)&=\textrm{Cov}\left(\frac{X-EX}{\sigma_X},\frac{Y-EY}{\sigma_Y}\right)\\ \nonumber &=\textrm{Cov}\left(\frac{X}{\sigma_X},\frac{Y}{\sigma_Y}\right) &(\textrm{by Item 5 of Lemma 5.3})\\ \nonumber &=\frac{\textrm{Cov}(X,Y)}{\sigma_X \sigma_Y}. &(\textrm{by Items 3 and 4 of Lemma 5.3}) \end{align}
\begin{align}%\label{} \nonumber \rho_{XY}=\rho(X,Y)=\frac{\textrm{Cov}(X,Y)}{\sqrt{\textrm{Var}(X)\,\textrm{Var}(Y)}}=\frac{\textrm{Cov}(X,Y)}{\sigma_X \sigma_Y} \end{align}

A nice property of the correlation coefficient is that it is always between $-1$ and $1$. This is an immediate result of the Cauchy-Schwarz inequality, which is discussed in Section 6.2.4. One way to prove that $-1 \leq \rho \leq 1$ is to use the following inequality: \begin{align}%\label{} \alpha \beta \leq \frac{\alpha^2+\beta^2}{2}, \quad \textrm{for all }\alpha,\beta \in \mathbb{R}. \end{align} This holds because $(\alpha-\beta)^2 \geq 0$, with equality only if $\alpha=\beta$. From this, we can conclude that for any two random variables $U$ and $V$, \begin{align}%\label{} E[UV] \leq \frac{E[U^2]+E[V^2]}{2}, \end{align} with equality only if $U=V$ with probability one. Now, let $U$ and $V$ be the standardized versions of $X$ and $Y$ as defined in Equation 5.22. Then, by definition, $\rho_{XY}=\textrm{Cov}(U,V)=E[UV]$. But since $E[U^2]=E[V^2]=1$, we conclude \begin{align}%\label{} \rho_{XY}=E[UV] & \leq \frac{E[U^2]+E[V^2]}{2}=1, \end{align} with equality only if $U=V$. That is, \begin{align}%\label{} \frac{Y-EY}{\sigma_Y}=\frac{X-EX}{\sigma_X}, \end{align} which implies \begin{align}%\label{} Y&=\frac{\sigma_Y}{\sigma_X} X+ \left(EY-\frac{\sigma_Y}{\sigma_X} EX\right)\\ &=aX+b, \hspace{3pt} \textrm{where } a=\frac{\sigma_Y}{\sigma_X}>0 \textrm{ and } b \textrm{ are constants.} \end{align} Replacing $X$ by $-X$, we conclude that \begin{align}%\label{} \nonumber \rho(-X,Y) \leq 1. \end{align} But $\rho(-X,Y)=-\rho(X,Y)$, so we conclude $\rho(X,Y) \geq -1$. We can now summarize some properties of the correlation coefficient as follows.
Properties of the correlation coefficient:
  1. $-1 \leq \rho(X,Y) \leq 1$;
  2. if $\rho(X,Y)=1$, then $Y=aX+b$, where $a>0$;
  3. if $\rho(X,Y)=-1$, then $Y=aX+b$, where $a<0$;
  4. $\rho(aX+b,cY+d)=\rho(X,Y)$ for $a,c>0$.
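All four properties can be observed in a short simulation. A minimal Python/NumPy sketch (the distributions and constants are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000

x = rng.standard_normal(n)
y = 0.3 * x + rng.standard_normal(n)

def rho(a, b):
    # Cov(A,B) / (sigma_A * sigma_B)
    return (np.mean(a * b) - np.mean(a) * np.mean(b)) / (np.std(a) * np.std(b))

print(rho(x, y))                  # strictly between -1 and 1   (Property 1)
print(rho(x, 5 * x + 2))          # 1: here Y = aX + b, a > 0   (Property 2)
print(rho(x, -5 * x + 2))         # -1: here a < 0              (Property 3)
print(rho(2 * x + 7, 3 * y - 1))  # equal to rho(x, y)          (Property 4)
```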

Definition
Consider two random variables $X$ and $Y$:
- If $\rho(X,Y)=0$, we say that $X$ and $Y$ are uncorrelated.
- If $\rho(X,Y)>0$, we say that $X$ and $Y$ are positively correlated.
- If $\rho(X,Y)<0$, we say that $X$ and $Y$ are negatively correlated.


Note that, as we discussed previously, two independent random variables are always uncorrelated, but the converse is not necessarily true. That is, if $X$ and $Y$ are uncorrelated, then $X$ and $Y$ may or may not be independent. Also, note that if $X$ and $Y$ are uncorrelated, then from Equation 5.21 we conclude that $\textrm{Var}(X+Y)=\textrm{Var}(X)+\textrm{Var}(Y)$.
If $X$ and $Y$ are uncorrelated, then \begin{align}%\label{} \nonumber \textrm{Var}(X+Y)=\textrm{Var}(X)+\textrm{Var}(Y). \end{align} More generally, if $X_1,X_2,...,X_n$ are pairwise uncorrelated, i.e., $\rho(X_i,X_j)=0$ when $i \neq j$, then \begin{align}%\label{} \nonumber \textrm{Var}(X_1+X_2+...+X_n)=\textrm{Var}(X_1)+\textrm{Var}(X_2)+...+\textrm{Var}(X_n). \end{align}
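A numerical sketch of the general statement, using three independent (hence pairwise uncorrelated) random variables with different variances (the specific distributions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000

x1 = rng.normal(0, 1, n)        # Var = 1
x2 = rng.normal(0, 2, n)        # Var = 4
x3 = rng.uniform(-1, 1, n)      # Var = 1/3

print(np.var(x1 + x2 + x3))                      # close to 1 + 4 + 1/3
print(np.var(x1) + np.var(x2) + np.var(x3))
```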

Note that if $X$ and $Y$ are independent, then they are uncorrelated, and so $\textrm{Var}(X+Y)=\textrm{Var}(X)+\textrm{Var}(Y)$. This is a fact that we stated previously in Chapter 3, and we can now easily prove it using covariance.
Example
Let $X$ and $Y$ be as in Example 5.24 in Section 5.2.3, i.e., suppose that we choose a point $(X,Y)$ uniformly at random in the unit disc \begin{align}%\label{} \nonumber D=\{(x,y)|x^2+y^2 \leq 1\}. \end{align} Are $X$ and $Y$ uncorrelated?
  • Solution
    • We need to check whether $\textrm{Cov}(X,Y)=0$. First note that, in Example 5.24 of Section 5.2.3, we showed that $X$ and $Y$ are not independent; in fact, we found that \begin{align}%\label{} \nonumber X|Y \hspace{5pt} \sim \hspace{5pt} Uniform(-\sqrt{1-Y^2},\sqrt{1-Y^2}). \end{align} Now let us find $\textrm{Cov}(X,Y)=E[XY]-(EX)(EY)$. We have \begin{align}%\label{} \nonumber EX &=E[E[X|Y]] &\big(\textrm{law of iterated expectations (Equation 5.17)}\big) \\ \nonumber &=E[0]=0 &\big(\textrm{since }X|Y \hspace{5pt} \sim \hspace{5pt} Uniform(-\sqrt{1-Y^2},\sqrt{1-Y^2})\big). \end{align} Also, we have \begin{align}%\label{} \nonumber E[XY] &=E[E[XY|Y]] &\big(\textrm{law of iterated expectations (Equation 5.17)}\big)\\ \nonumber &=E[YE[X|Y]] &\big( \textrm{Equation 5.6}\big) \\ \nonumber &=E[Y \cdot 0]=0. \end{align} Thus, \begin{align}%\label{} \nonumber \textrm{Cov}(X,Y)=E[XY]-(EX)(EY)=0. \end{align} Therefore, $X$ and $Y$ are uncorrelated.
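This example, too, can be checked by simulation. The sketch below samples uniformly from the disc by rejection from the square $[-1,1]^2$; it confirms $\textrm{Cov}(X,Y)\approx 0$ while also exposing the dependence, since a short calculation in polar coordinates (not carried out here) gives $\textrm{Cov}(X^2,Y^2)=-\frac{1}{48}\neq 0$:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2_000_000

# Rejection sampling: keep points from the square [-1,1]^2 that fall in the disc
pts = rng.uniform(-1, 1, size=(n, 2))
pts = pts[pts[:, 0]**2 + pts[:, 1]**2 <= 1]
x, y = pts[:, 0], pts[:, 1]

def sample_cov(a, b):
    return np.mean(a * b) - np.mean(a) * np.mean(b)

print(sample_cov(x, y))               # close to 0: X and Y are uncorrelated
print(sample_cov(x**2, y**2), -1/48)  # nonzero: X and Y are not independent
```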



