5.1.5 Conditional Expectation (Revisited) and Conditional Variance
Conditional Expectation as a Function of a Random Variable:
Remember that the conditional expectation of $X$ given that $Y=y$ is given by \begin{align}%\label{} \nonumber E[X|Y=y]=\sum_{x_i \in R_{X}} x_i P_{X|Y}(x_i|y). \end{align} Note that $E[X|Y=y]$ depends on the value of $y$. In other words, by changing $y$, $E[X|Y=y]$ can also change. Thus, we can say $E[X|Y=y]$ is a function of $y$, so let's write \begin{align}%\label{} \nonumber g(y)=E[X|Y=y]. \end{align} Thus, we can think of $g(y)=E[X|Y=y]$ as a function of the value of random variable $Y$. We then write \begin{align}%\label{} \nonumber g(Y)=E[X|Y]. \end{align} We use this notation to indicate that $E[X|Y]$ is a random variable whose value equals $g(y)=E[X|Y=y]$ when $Y=y$. Thus, if $Y$ is a random variable with range $R_Y=\{y_1, y_2, \cdots \}$, then $E[X|Y]$ is also a random variable with \begin{equation} \nonumber E[X|Y] = \left\{ \begin{array}{l l} E[X|Y=y_1] & \quad \textrm{with probability } P(Y=y_1) \\ E[X|Y=y_2] & \quad \textrm{with probability } P(Y=y_2) \\ \vdots & \quad \vdots \end{array} \right. \end{equation} Let's look at an example.

Example
Let $X=aY+b$. Then $E[X|Y=y]=E[aY+b|Y=y]=ay+b$. Here, we have $g(y)=ay+b$, and therefore, \begin{align}%\label{} \nonumber E[X|Y]=aY+b, \end{align} which is a function of the random variable $Y$.
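To make the idea concrete, here is a minimal Python sketch. Python is our choice; the text itself contains no code. The sketch assumes a finite joint PMF stored as a dict that maps $(x, y)$ pairs to exact `Fraction` probabilities. For each $y \in R_Y$ it tabulates the pair $(E[X|Y=y], P(Y=y))$, which is exactly the distribution of the random variable $E[X|Y]$. The helper name `conditional_expectation_rv` is hypothetical. The values $a=2$, $b=1$ and $Y$ uniform on $\{0,1,2\}$ are illustrative assumptions.

```python
from collections import defaultdict
from fractions import Fraction

def conditional_expectation_rv(joint):
    """Return {y: (E[X|Y=y], P(Y=y))} for a finite joint PMF {(x, y): prob}."""
    p_y = defaultdict(Fraction)       # marginal P_Y(y)
    weighted = defaultdict(Fraction)  # sum over x of x * P_XY(x, y)
    for (x, y), p in joint.items():
        p_y[y] += p
        weighted[y] += x * p
    # E[X|Y=y] = (sum_x x P_XY(x, y)) / P_Y(y), defined only where P_Y(y) > 0
    return {y: (weighted[y] / p_y[y], p_y[y]) for y in p_y if p_y[y] > 0}

# The example X = aY + b, with a = 2, b = 1 and Y uniform on {0, 1, 2} (assumed values)
a, b = 2, 1
joint = {(a * y + b, y): Fraction(1, 3) for y in (0, 1, 2)}
for y, (g, p) in sorted(conditional_expectation_rv(joint).items()):
    print(f"Y={y}: E[X|Y=y]={g} with probability {p}")  # g equals a*y + b
```

Each printed line is one row of the case table for $E[X|Y]$. Here the values $1, 3, 5$ match $g(y)=2y+1$.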
Since $E[X|Y]$ is a random variable, we can find its PMF, CDF, variance, etc. Let's look at an example to better understand $E[X|Y]$.
Example
Consider two random variables $X$ and $Y$ with joint PMF given in Table 5.2. Let $Z=E[X|Y]$.
 Find the marginal PMFs of $X$ and $Y$.
 Find the conditional PMF of $X$ given $Y=0$ and $Y=1$, i.e., find $P_{X|Y}(x|0)$ and $P_{X|Y}(x|1)$.
 Find the PMF of $Z$.
 Find $EZ$, and check that $EZ=EX$.
 Find Var$(Z)$.
Table 5.2: Joint PMF of $X$ and $Y$ in Example 5.11
\begin{equation} \nonumber \begin{array}{c|cc} & Y=0 & Y=1 \\ \hline X=0 & \frac{1}{5} & \frac{2}{5} \\ X=1 & \frac{2}{5} & 0 \end{array} \end{equation}
 Solution

 Using the table, we find \begin{align}%\label{} \nonumber &P_X(0)=\frac{1}{5}+\frac{2}{5}=\frac{3}{5}, \\ \nonumber &P_X(1)=\frac{2}{5}+0=\frac{2}{5}, \\ \nonumber &P_Y(0)=\frac{1}{5}+\frac{2}{5}=\frac{3}{5}, \\ \nonumber &P_Y(1)=\frac{2}{5}+0=\frac{2}{5}. \end{align} Thus, the marginal distributions of $X$ and $Y$ are both $Bernoulli(\frac{2}{5})$. However, note that $X$ and $Y$ are not independent. For example, $P_{XY}(1,1)=0$, while $P_X(1)P_Y(1)=\frac{4}{25}$.
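The marginal sums can be checked mechanically. The Python sketch below uses exact `fractions.Fraction` arithmetic; the dict layout and the helper name `marginals` are illustrative assumptions. It encodes Table 5.2, sums out one variable at a time, and then tests the product rule $P_{XY}(x,y)=P_X(x)P_Y(y)$ on every cell.

```python
from fractions import Fraction as F

# Joint PMF from Table 5.2, keyed by (x, y)
joint = {(0, 0): F(1, 5), (0, 1): F(2, 5),
         (1, 0): F(2, 5), (1, 1): F(0)}

def marginals(joint):
    """Sum the joint PMF over y (giving P_X) and over x (giving P_Y)."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0) + p
        p_y[y] = p_y.get(y, 0) + p
    return p_x, p_y

p_x, p_y = marginals(joint)
print(p_x, p_y)
# Independence would require P_XY(x, y) = P_X(x) P_Y(y) for every pair
independent = all(joint[(x, y)] == p_x[x] * p_y[y] for (x, y) in joint)
print("independent:", independent)
```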
 We have \begin{align}%\label{} \nonumber P_{X|Y}(0|0)&=\frac{P_{XY}(0,0)}{P_{Y}(0)}\\ \nonumber &= \frac{\frac{1}{5}}{\frac{3}{5}}=\frac{1}{3}. \end{align} Thus, \begin{align}%\label{} \nonumber P_{X|Y}(1|0)=1-\frac{1}{3}=\frac{2}{3}. \end{align} We conclude \begin{align}%\label{} \nonumber X|Y=0 \hspace{5pt} \sim \hspace{5pt} Bernoulli \left(\frac{2}{3}\right). \end{align} Similarly, we find \begin{align}%\label{} \nonumber &P_{X|Y}(0|1)=1,\\ \nonumber &P_{X|Y}(1|1)=0. \end{align} Thus, given $Y=1$, we always have $X=0$.
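A short Python sketch applies $P_{X|Y}(x|y)=P_{XY}(x,y)/P_Y(y)$ to each column of Table 5.2. It uses exact `Fraction` arithmetic. The dict encoding of the table and the helper name `conditional_pmf_x_given_y` are illustrative assumptions.

```python
from fractions import Fraction as F

# Joint PMF from Table 5.2, keyed by (x, y)
joint = {(0, 0): F(1, 5), (0, 1): F(2, 5),
         (1, 0): F(2, 5), (1, 1): F(0)}

def conditional_pmf_x_given_y(joint, y):
    """P_{X|Y}(x|y) = P_XY(x, y) / P_Y(y), assuming P_Y(y) > 0."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    return {x: p / p_y for (x, yy), p in joint.items() if yy == y}

given0 = conditional_pmf_x_given_y(joint, 0)  # should be Bernoulli(2/3)
given1 = conditional_pmf_x_given_y(joint, 1)  # all mass on x = 0
print(given0, given1)
```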
 We note that the random variable $Y$ can take two values: $0$ and $1$. Since $Z=E[X|Y]$ is a function of $Y$, it can also take (at most) two values. Specifically, \begin{equation} \nonumber Z = E[X|Y]= \left\{ \begin{array}{l l} E[X|Y=0] & \quad \textrm{if } Y=0 \\ & \quad \\ E[X|Y=1] & \quad \textrm{if } Y=1 \end{array} \right. \end{equation} Now, using the previous part, we have \begin{align}%\label{} \nonumber E[X|Y=0]=\frac{2}{3}, \hspace{15pt} E[X|Y=1]=0, \end{align} and since $P(Y=0)=\frac{3}{5}$ and $P(Y=1)=\frac{2}{5}$, we conclude that \begin{equation} \nonumber Z = E[X|Y]= \left\{ \begin{array}{l l} \frac{2}{3} & \quad \textrm{with probability } \frac{3}{5} \\ & \quad \\ 0 & \quad \textrm{with probability } \frac{2}{5} \end{array} \right. \end{equation} So we can write \begin{equation} \nonumber P_Z(z) = \left\{ \begin{array}{l l} \frac{3}{5} & \quad \textrm{if } z=\frac{2}{3} \\ & \quad \\ \frac{2}{5} & \quad \textrm{if } z=0\\ & \quad \\ 0 & \quad \text{otherwise} \end{array} \right. \end{equation}
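Building the PMF of $Z$ amounts to mapping each $y$ to the value $E[X|Y=y]$ and carrying along the probability $P(Y=y)$. If two values of $y$ happened to give the same conditional mean, their probabilities would add. Below is a minimal Python sketch of this step for Table 5.2; the variable names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction as F

# Joint PMF from Table 5.2, keyed by (x, y)
joint = {(0, 0): F(1, 5), (0, 1): F(2, 5),
         (1, 0): F(2, 5), (1, 1): F(0)}

p_y = defaultdict(F)    # P_Y(y)
sum_x = defaultdict(F)  # sum over x of x * P_XY(x, y)
for (x, y), p in joint.items():
    p_y[y] += p
    sum_x[y] += x * p

# Z = E[X|Y] takes the value E[X|Y=y] with probability P(Y=y); equal values merge
pmf_z = defaultdict(F)
for y in p_y:
    pmf_z[sum_x[y] / p_y[y]] += p_y[y]
print(dict(pmf_z))
```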
 Now that we have found the PMF of $Z$, we can find its mean and variance. Specifically, \begin{align}%\label{} \nonumber E[Z]=\frac{2}{3} \cdot \frac{3}{5}+ 0 \cdot \frac{2}{5} =\frac{2}{5}. \end{align} We also note that $EX=\frac{2}{5}$. Thus, here we have \begin{align}%\label{} \nonumber E[X]=E[Z]=E[E[X|Y]]. \end{align} In fact, as we will prove shortly, the above equality always holds. It is called the law of iterated expectations.
 To find Var$(Z)$, we write \begin{align}%\label{} \nonumber \textrm{Var}(Z)&=E[Z^2]-(EZ)^2\\ \nonumber &=E[Z^2]-\frac{4}{25}, \end{align} where \begin{align}%\label{} \nonumber E[Z^2]=\frac{4}{9} \cdot \frac{3}{5}+0 \cdot \frac{2}{5}=\frac{4}{15}. \end{align} Thus, \begin{align}%\label{} \nonumber \textrm{Var}(Z)&=\frac{4}{15}-\frac{4}{25}\\ \nonumber &=\frac{8}{75}. \end{align}
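The mean and variance computations above can be reproduced from the two PMFs alone. This Python sketch takes $P_Z$ and $P_X$ as found above and computes means as $\sum v\,P(v)$. It computes the variance as $E[W^2]-(E[W])^2$ for a generic PMF of $W$. The helper names `mean` and `variance` are illustrative.

```python
from fractions import Fraction as F

# PMF of Z = E[X|Y] and marginal PMF of X, both from the solution above
pmf_z = {F(2, 3): F(3, 5), F(0): F(2, 5)}
pmf_x = {0: F(3, 5), 1: F(2, 5)}

def mean(pmf):
    """E[W] = sum of w * P(W = w)."""
    return sum(v * p for v, p in pmf.items())

def variance(pmf):
    """Var(W) = E[W^2] - (E[W])^2."""
    return sum(v * v * p for v, p in pmf.items()) - mean(pmf) ** 2

print(mean(pmf_z), mean(pmf_x))  # law of iterated expectations: these agree
print(variance(pmf_z))
```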

Example
Let $X$ and $Y$ be two random variables and $g$ and $h$ be two functions. Show that \begin{align}%\label{} \nonumber E[g(X)h(Y)|X]=g(X)E[h(Y)|X]. \end{align}
 Solution
 Note that $E[g(X)h(Y)|X]$ is a random variable that is a function of $X$. In particular, when $X=x$, its value is $E[g(X)h(Y)|X=x]$. Now, we can write \begin{align}%\label{} \nonumber E[g(X)h(Y)|X=x]&=E[g(x)h(Y)|X=x] \hspace{30pt} \textrm{(since $X=x$ in the conditional space)}\\ \nonumber &=g(x)E[h(Y)|X=x] \hspace{30pt} \textrm{(since $g(x)$ is a constant)}. \end{align} Thinking of this as a function of the random variable $X$, we can rewrite it as $E[g(X)h(Y)|X]=g(X)E[h(Y)|X]$. This rule is sometimes called "taking out what is known." The idea is that, given $X$, $g(X)$ is a known quantity, so it can be taken out of the conditional expectation.
\begin{align}\label{eq:EGHX} E[g(X)h(Y)|X]=g(X)E[h(Y)|X] \hspace{30pt} (5.6) \end{align}
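Equation 5.6 can be spot-checked numerically. For each value $x$, the sketch computes $E[g(X)h(Y)|X=x]$ and $g(x)E[h(Y)|X=x]$ directly from a joint PMF and compares them. It uses an arbitrary joint PMF and the hypothetical functions $g(x)=x^2+1$ and $h(y)=3y-1$; none of these come from the text.

```python
from fractions import Fraction as F

# An arbitrary joint PMF on X in {1, 2}, Y in {0, 1, 2}, chosen only for illustration
joint = {(1, 0): F(1, 10), (1, 1): F(2, 10), (1, 2): F(1, 10),
         (2, 0): F(3, 10), (2, 1): F(1, 10), (2, 2): F(2, 10)}

def g(x):  # hypothetical function of X
    return x * x + 1

def h(y):  # hypothetical function of Y
    return 3 * y - 1

def cond_exp_given_x(f, x):
    """E[f(X, Y) | X = x], computed from the joint PMF."""
    cell = {y: p for (xx, y), p in joint.items() if xx == x}
    p_x = sum(cell.values())
    return sum(f(x, y) * p for y, p in cell.items()) / p_x

results = {}
for x in (1, 2):
    lhs = cond_exp_given_x(lambda a, b: g(a) * h(b), x)  # E[g(X)h(Y) | X=x]
    rhs = g(x) * cond_exp_given_x(lambda a, b: h(b), x)  # g(x) E[h(Y) | X=x]
    results[x] = (lhs, rhs)
    print(f"x={x}: lhs={lhs}, rhs={rhs}")
```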
Iterated Expectations:
Let us look again at the law of total probability for expectation. Assuming $g(Y)=E[X|Y]$, we have \begin{align} \nonumber E[X]&=\sum_{y_j \in R_Y} E[X|Y=y_j]P_Y(y_j)\\ \nonumber &=\sum_{y_j \in R_Y} g(y_j)P_Y(y_j)\\ \nonumber &=E[g(Y)] \hspace{30pt} \textrm{by LOTUS (Equation 5.2)}\\ \nonumber &=E[E[X|Y]]. \end{align} Thus, we conclude \begin{align} \label{eq:iteratedE} E[X]=E[E[X|Y]]. \hspace{40pt} (5.7) \end{align} This equation might look a little confusing at first, but it is just another way of writing the law of total expectation (Equation 5.4). To better understand it, let's solve Example 5.7 using this terminology. In that example, we want to find $EX$. We can write \begin{align} \nonumber E[X]&=E[E[X|N]] \\ \nonumber &=E[Np] \hspace{30pt} (\textrm{since }X|N \sim Binomial(N,p)) \\ \nonumber &=pE[N]=p\lambda. \end{align} Equation 5.7 is called the law of iterated expectations. Since it is basically the same as Equation 5.4, it is also called the law of total expectation [3].
Expectation for Independent Random Variables:
Note that if two random variables $X$ and $Y$ are independent, then the conditional PMF of $X$ given $Y$ will be the same as the marginal PMF of $X$, i.e., for any $x \in R_X$ and $y \in R_Y$, we have \begin{align}%\label{} \nonumber P_{X|Y}(x|y)=P_X(x). \end{align} Thus, for independent random variables, we have \begin{align}%\label{} \nonumber E[X|Y=y]&= \sum_{x \in R_{X}} x P_{X|Y}(x|y)\\ \nonumber &= \sum_{x \in R_{X}} x P_{X}(x)\\ \nonumber &=E[X]. \end{align} Again, thinking of this as a random variable depending on $Y$, we obtain \begin{align}%\label{} \nonumber E[X|Y]=E[X], \textrm{ when $X$ and $Y$ are independent.} \end{align} More generally, if $X$ and $Y$ are independent, then any function of $X$, say $g(X)$, and $Y$ are independent; thus \begin{align}%\label{} \nonumber E[g(X)|Y]=E[g(X)]. \end{align} Remember that for independent random variables, $P_{XY}(x,y)=P_X(x)P_Y(y)$. From this, we can show that $E[XY]=EX EY$.
Lemma
If $X$ and $Y$ are independent, then $E[XY]=EX EY$.
 Proof
Using LOTUS, we have \begin{align}%\label{} \nonumber E[XY] &=\sum_{x \in R_X} \sum_{y \in R_Y} xy P_{XY}(x,y)\\ \nonumber &=\sum_{x \in R_X} \sum_{y \in R_Y} xy P_X(x)P_Y(y)\\ \nonumber &=\bigg(\sum_{x \in R_X} x P_X(x) \bigg) \bigg(\sum_{y \in R_Y} yP_Y(y)\bigg)\\ \nonumber &=EX EY. \end{align} Note that the converse is not true. That is, if the only thing that we know about $X$ and $Y$ is that $E[XY]=EX EY$, then $X$ and $Y$ may or may not be independent. Using essentially the same proof as above, we can show that if $X$ and $Y$ are independent, then $E[g(X)h(Y)]=E[g(X)]E[h(Y)]$ for any functions $g:\mathbb{R} \to \mathbb{R}$ and $h:\mathbb{R} \to \mathbb{R}$. In summary, if $X$ and $Y$ are independent random variables, then
 $E[X|Y]=EX$;
 $E[g(X)|Y]=E[g(X)]$;
 $E[XY]=EX EY$;
 $E[g(X)h(Y)]=E[g(X)] E[h(Y)]$.
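The four properties listed above, and the failure of the converse of the lemma, can be spot-checked numerically. This is a minimal Python sketch assuming finite PMFs stored as dicts with exact `Fraction` probabilities. The marginals, the hypothetical functions `g` and `h`, and the counterexample ($X$ uniform on $\{-1,0,1\}$ with $Y=X^2$) are illustrative choices, not from the text. Every printed value should be `True`.

```python
from fractions import Fraction as F

def expect(joint, f):
    """E[f(X, Y)] for a finite joint PMF {(x, y): prob}, via LOTUS."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

def cond_expect(joint, f, y):
    """E[f(X) | Y = y] for a finite joint PMF."""
    cell = {x: p for (x, yy), p in joint.items() if yy == y}
    return sum(f(x) * p for x, p in cell.items()) / sum(cell.values())

# Independent X and Y, built as a product of (arbitrary) marginals
p_x = {0: F(1, 2), 1: F(1, 3), 5: F(1, 6)}
p_y = {-1: F(1, 4), 2: F(3, 4)}
indep = {(x, y): px * py for x, px in p_x.items() for y, py in p_y.items()}

def g(x):  # hypothetical g
    return x * x

def h(y):  # hypothetical h
    return y + 3

ex = expect(indep, lambda x, y: x)
ey = expect(indep, lambda x, y: y)
for y in p_y:
    print(cond_expect(indep, lambda x: x, y) == ex,                      # property 1
          cond_expect(indep, g, y) == expect(indep, lambda x, _: g(x)))  # property 2
print(expect(indep, lambda x, y: x * y) == ex * ey)                    # property 3
print(expect(indep, lambda x, y: g(x) * h(y))
      == expect(indep, lambda x, _: g(x)) * expect(indep, lambda _, y: h(y)))  # property 4

# The converse of property 3 fails: X uniform on {-1, 0, 1} and Y = X**2
# are dependent, yet E[XY] = E[X^3] = 0 = E[X] E[Y].
dep = {(x, x * x): F(1, 3) for x in (-1, 0, 1)}
print(expect(dep, lambda x, y: x * y)
      == expect(dep, lambda x, y: x) * expect(dep, lambda x, y: y))
```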
Conditional Variance:
Similar to the conditional expectation, we can define the conditional variance of $X$, Var$(X|Y=y)$, which is the variance of $X$ in the conditional space where we know $Y=y$. If we let $\mu_{X|Y}(y)=E[X|Y=y]$, then \begin{align}%\label{} \nonumber \textrm{Var}(X|Y=y) &=E\big[(X-\mu_{X|Y}(y))^2|Y=y \big] \\ \nonumber &=\sum_{x_i \in R_X} \big(x_i-\mu_{X|Y}(y)\big)^2 P_{X|Y}(x_i|y)\\ \nonumber &=E\big[X^2|Y=y\big]-\mu_{X|Y}(y)^2. \end{align} Note that Var$(X|Y=y)$ is a function of $y$. Similar to our discussion on $E[X|Y=y]$ and $E[X|Y]$, we define Var$(X|Y)$ as a function of the random variable $Y$. That is, Var$(X|Y)$ is a random variable whose value equals Var$(X|Y=y)$ whenever $Y=y$. Let us look at an example.
Example
Let $X$, $Y$, and $Z=E[X|Y]$ be as in Example 5.11. Also, let $V=$Var$(X|Y)$.
 Find the PMF of $V$.
 Find $EV$.
 Check that Var$(X)=E(V)+$Var$(Z)$.
 Solution

In Example 5.11, we found out that $X,Y \sim Bernoulli(\frac{2}{5})$. We also obtained
\begin{align}%\label{}
\nonumber &X|Y=0 \hspace{5pt} \sim \hspace{5pt} Bernoulli \left(\frac{2}{3}\right), \\
\nonumber &P(X=0|Y=1)=1,\\
\nonumber &\textrm{Var}(Z)=\frac{8}{75}.
\end{align}
 To find the PMF of $V$, we note that $V$ is a function of $Y$. Specifically, \begin{equation} \nonumber V = \textrm{Var}(X|Y)= \left\{ \begin{array}{l l} \textrm{Var}(X|Y=0) & \quad \textrm{if } Y=0 \\ & \quad \\ \textrm{Var}(X|Y=1)& \quad \textrm{if } Y=1 \end{array} \right. \end{equation} Therefore, \begin{equation} \nonumber V = \textrm{Var}(X|Y)= \left\{ \begin{array}{l l} \textrm{Var}(X|Y=0) & \quad \textrm{with probability } \frac{3}{5} \\ & \quad \\ \textrm{Var}(X|Y=1)& \quad \textrm{with probability } \frac{2}{5} \end{array} \right. \end{equation} Now, since $X|Y=0 \hspace{5pt} \sim \hspace{5pt} Bernoulli \left(\frac{2}{3}\right)$, we have \begin{align}%\label{} \nonumber \textrm{Var}(X|Y=0)=\frac{2}{3} \cdot \frac{1}{3}=\frac{2}{9}, \end{align} and since $X=0$ whenever $Y=1$, we have \begin{align}%\label{} \nonumber \textrm{Var}(X|Y=1)=0. \end{align} Thus, \begin{equation} \nonumber V = \textrm{Var}(X|Y)= \left\{ \begin{array}{l l} \frac{2}{9} & \quad \textrm{with probability } \frac{3}{5} \\ & \quad \\ 0 & \quad \textrm{with probability } \frac{2}{5} \end{array} \right. \end{equation} So we can write \begin{equation} \nonumber P_V(v) = \left\{ \begin{array}{l l} \frac{3}{5} & \quad \textrm{if } v=\frac{2}{9} \\ & \quad \\ \frac{2}{5} & \quad \textrm{if } v=0\\ & \quad \\ 0 & \quad \text{otherwise} \end{array} \right. \end{equation}
 To find $EV$, we write \begin{align}%\label{} \nonumber EV=\frac{2}{9} \cdot \frac{3}{5}+0 \cdot \frac{2}{5}=\frac{2}{15}. \end{align}
 To check that Var$(X)=E(V)+$Var$(Z)$, we note that \begin{align}%\label{} \nonumber &\textrm{Var}(X)=\frac{2}{5} \cdot \frac{3}{5}=\frac{6}{25},\\ \nonumber &EV=\frac{2}{15},\\ \nonumber &\textrm{Var}(Z)=\frac{8}{75}. \end{align} Indeed, $EV+\textrm{Var}(Z)=\frac{10}{75}+\frac{8}{75}=\frac{18}{75}=\frac{6}{25}=\textrm{Var}(X)$.
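The whole solution can be reproduced from Table 5.2 in one pass. The Python sketch below computes $\textrm{Var}(X|Y=y)$ both by the definition $E[(X-\mu_{X|Y}(y))^2|Y=y]$ and by the shortcut $E[X^2|Y=y]-\mu_{X|Y}(y)^2$, asserting that the two agree. It then builds the PMF of $V$, finds $EV$ and $\textrm{Var}(Z)$, and compares their sum with $\textrm{Var}(X)$. The helper name `conditional_moments` is illustrative.

```python
from fractions import Fraction as F

# Joint PMF of Table 5.2, keyed by (x, y)
joint = {(0, 0): F(1, 5), (0, 1): F(2, 5),
         (1, 0): F(2, 5), (1, 1): F(0)}

def conditional_moments(joint, y):
    """Return (P(Y=y), E[X|Y=y], Var(X|Y=y)), computing Var both ways."""
    cell = {x: p for (x, yy), p in joint.items() if yy == y}
    p_y = sum(cell.values())
    mu = sum(x * p for x, p in cell.items()) / p_y
    by_definition = sum((x - mu) ** 2 * p for x, p in cell.items()) / p_y
    by_shortcut = sum(x * x * p for x, p in cell.items()) / p_y - mu ** 2
    assert by_definition == by_shortcut  # E[(X-mu)^2|Y=y] = E[X^2|Y=y] - mu^2
    return p_y, mu, by_definition

rows = [conditional_moments(joint, y) for y in (0, 1)]

pmf_v = {}                                # PMF of V = Var(X|Y)
for p_y, _, v in rows:
    pmf_v[v] = pmf_v.get(v, 0) + p_y
e_v = sum(p_y * v for p_y, _, v in rows)  # E[V]
e_z = sum(p_y * z for p_y, z, _ in rows)  # E[Z], where Z = E[X|Y]
var_z = sum(p_y * z * z for p_y, z, _ in rows) - e_z ** 2
e_x = sum(x * p for (x, _), p in joint.items())
var_x = sum(x * x * p for (x, _), p in joint.items()) - e_x ** 2

print(pmf_v, e_v, var_z, var_x)
print(var_x == e_v + var_z)  # law of total variance on this example
```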
In the above example, we checked that Var$(X)=E(V)+$Var$(Z)$, which says \begin{align}%\label{} \nonumber \textrm{Var}(X)=E(\textrm{Var}(X|Y))+\textrm{Var}(E[X|Y]). \end{align} It turns out this is true in general, and it is called the law of total variance, or the variance decomposition formula [3]. Let us first prove the law of total variance, and then explain it intuitively. Note that if $V=$Var$(X|Y)$ and $Z=E[X|Y]$, then \begin{align}%\label{} \nonumber V&=E[X^2|Y]-(E[X|Y])^2\\ \nonumber &=E[X^2|Y]-Z^2. \end{align} Thus, \begin{align}\label{eq:1of2} \nonumber EV&=E[E[X^2|Y]]-E[Z^2]\\ &=E[X^2]-E[Z^2] &\big(\textrm{law of iterated expectations (Equation 5.7)}\big) \hspace{20pt} (5.8) \end{align} Next, we have \begin{align}\label{eq:2of2} \nonumber \textrm{Var}(Z)&=E[Z^2]-(EZ)^2\\ &=E[Z^2]-(EX)^2 &(\textrm{law of iterated expectations}) \hspace{20pt} (5.9) \end{align} Adding Equations 5.8 and 5.9, the $E[Z^2]$ terms cancel, and we obtain $EV+\textrm{Var}(Z)=E[X^2]-(EX)^2=\textrm{Var}(X)$, which is the law of total variance.
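Because the proof above uses only the law of iterated expectations, the identity should hold exactly for any finite joint PMF. This minimal Python sketch assumes integer-valued $X$ and $Y$ and a randomly generated joint PMF on a $4 \times 3$ grid. The PMF is seeded so the run is repeatable; the grid and the weights are arbitrary. The sketch computes the three terms with exact `Fraction` arithmetic, checks the decomposition, and checks that $E[\textrm{Var}(X|Y)]$ never exceeds $\textrm{Var}(X)$. The helper name `total_variance_terms` is hypothetical.

```python
import random
from fractions import Fraction as F

def total_variance_terms(joint):
    """Return (Var(X), E[Var(X|Y)], Var(E[X|Y])) for a joint PMF {(x, y): prob}."""
    ys = {y for _, y in joint}
    e_var, e_z, e_z2 = F(0), F(0), F(0)
    for y in ys:
        cell = {x: p for (x, yy), p in joint.items() if yy == y}
        p_y = sum(cell.values())
        if p_y == 0:
            continue  # conditioning on a zero-probability event is undefined
        m1 = sum(x * p for x, p in cell.items()) / p_y      # E[X|Y=y]
        m2 = sum(x * x * p for x, p in cell.items()) / p_y  # E[X^2|Y=y]
        e_var += p_y * (m2 - m1 ** 2)
        e_z += p_y * m1
        e_z2 += p_y * m1 ** 2
    e_x = sum(x * p for (x, _), p in joint.items())
    var_x = sum(x * x * p for (x, _), p in joint.items()) - e_x ** 2
    return var_x, e_var, e_z2 - e_z ** 2

# Random joint PMF on a 4 x 3 grid with integer weights (seeded for repeatability)
rng = random.Random(0)
weights = {(x, y): rng.randint(1, 9) for x in range(4) for y in range(3)}
total = sum(weights.values())
joint = {k: F(w, total) for k, w in weights.items()}

var_x, e_var, var_z = total_variance_terms(joint)
print(var_x == e_var + var_z, var_x >= e_var)
```

Exact rational arithmetic lets the check be a true equality rather than a floating-point approximation.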
Law of Total Variance:
\begin{align}\label{eq:LOTV} \textrm{Var}(X)=E[\textrm{Var}(X|Y)]+\textrm{Var}(E[X|Y]) \hspace{30pt} (5.10) \end{align}
There are several ways to look at the law of total variance to get some intuition. Let us first note that all the terms in Equation 5.10 are nonnegative (since variance is always nonnegative). Thus, we conclude \begin{align}\label{eq:condReducesVariance} \textrm{Var}(X) \geq E(\textrm{Var}(X|Y)) \hspace{30pt} (5.11) \end{align}
This states that, on average, conditioning on $Y$ does not increase the variance of $X$. To describe this intuitively, we can say that the variance of a random variable is a measure of our uncertainty about that random variable. For example, if Var$(X)=0$, we do not have any uncertainty about $X$. The above inequality simply states that if we obtain some extra information, i.e., we learn the value of $Y$, our uncertainty about the value of the random variable $X$ does not increase on average. By Equation 5.10, it stays the same on average exactly when $\textrm{Var}(E[X|Y])=0$, i.e., when $E[X|Y]$ is constant. So, the above inequality makes sense. Now, how do we explain the whole law of total variance?
To describe the law of total variance intuitively, it is often useful to look at a population divided into several groups. In particular, suppose that we have this random experiment: we pick a person in the world at random and look at his/her height. Let's call the resulting value $X$. Define another random variable $Y$ that identifies the country of the chosen person, where $Y \in \{1,2,\dots,n\}$ and $n$ is the number of countries in the world. Then, let's look at the two terms in the law of total variance.
\begin{align} \nonumber \textrm{Var}(X)=E(\textrm{Var}(X|Y))+\textrm{Var}(E[X|Y]). \end{align} Note that $\textrm{Var}(X|Y=i)$ is the variance of heights in country $i$. Thus, $E(\textrm{Var}(X|Y))$ is the average of the variances within the individual countries, weighted by population. On the other hand, $E[X|Y=i]$ is the average height in country $i$. Thus, $\textrm{Var}(E[X|Y])$ is the variance between countries, that is, the variance of the country averages. So, we can interpret the law of total variance in the following way. The variance of $X$ can be decomposed into two parts: the first is the average of the variances within the individual countries, while the second is the variance between the average heights of the countries.
Example
Let $N$ be the number of customers that visit a certain store in a given day. Suppose that we know $E[N]$ and Var$(N)$. Let $X_i$ be the amount that the $i$th customer spends. We assume the $X_i$'s are independent of each other and also independent of $N$. We further assume they have the same mean and variance \begin{align}%\label{} \nonumber &EX_i=EX, \\ \nonumber &\textrm{Var}(X_i)=\textrm{Var}(X). \end{align} Let $Y$ be the store's total sales, i.e., \begin{align}%\label{} \nonumber Y=\sum_{i=1}^{N}X_i. \end{align} Find $EY$ and Var$(Y)$.
 Solution
 To find $EY$, we cannot directly use the linearity of expectation because $N$ is random. But, conditioned on $N=n$, we can use linearity and find $E[Y|N=n]$; so, we use the law of iterated expectations: \begin{align}%\label{} \nonumber EY&=E[E[Y|N]] &(\textrm{law of iterated expectations})\\ \nonumber &=E\left[E\bigg[\sum_{i=1}^{N}X_i\bigg|N\bigg]\right]\\ \nonumber &=E\left[\sum_{i=1}^{N}E[X_i|N] \right] & (\textrm{linearity of expectation})\\ \nonumber &=E\left[\sum_{i=1}^{N}E[X_i] \right] & (\textrm{$X_i$'s and } N \textrm{ are independent})\\ \nonumber &=E[N\,E[X]] & (\textrm{since $EX_i=EX$}) \\ \nonumber &=E[X]E[N] & (\textrm{since $EX$ is not random}). \end{align} To find Var$(Y)$, we use the law of total variance: \begin{align}\label{al1} \nonumber \textrm{Var}(Y)&=E(\textrm{Var}(Y|N))+\textrm{Var}(E[Y|N])\\ \nonumber &=E(\textrm{Var}(Y|N))+\textrm{Var}(N\,EX) &(\textrm{since }E[Y|N]=N\,EX\textrm{, as above})\\ &=E(\textrm{Var}(Y|N))+(EX)^2\textrm{Var}(N) \hspace{30pt} (5.12) \end{align} To find $E(\textrm{Var}(Y|N))$, note that, given $N=n$, $Y$ is a sum of $n$ independent random variables. As we discussed before, for $n$ independent random variables, the variance of the sum is equal to the sum of the variances. This fact is officially proved in Section 5.3 and also in Chapter 6, but we have occasionally used it as it simplifies the analysis. Thus, we can write \begin{align} \nonumber \textrm{Var}(Y|N)&=\sum_{i=1}^{N} \textrm{Var}(X_i|N)\\ \nonumber &=\sum_{i=1}^{N} \textrm{Var}(X_i) &(\textrm{since }X_i\textrm{'s are independent of }N)\\ \nonumber &=N\,\textrm{Var}(X). \end{align} Thus, we have \begin{align}\label{al2} E(\textrm{Var}(Y|N))=EN\,\textrm{Var}(X) \hspace{30pt} (5.13) \end{align} Combining Equations 5.12 and 5.13, we obtain \begin{align} \nonumber \textrm{Var}(Y)= EN\,\textrm{Var}(X)+(EX)^2\,\textrm{Var}(N). \end{align}
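The two formulas can be verified exactly on a small discrete case. The Python sketch below makes assumptions purely for illustration, since the text keeps $N$ and $X_i$ general. It assumes $N$ takes the values $0,1,2$ and each $X_i$ takes the values $0,10,20$, with the listed probabilities. It builds the exact PMF of $Y=\sum_{i=1}^{N}X_i$ by repeated convolution. It then compares the mean and variance of $Y$ with $E[X]E[N]$ and $E[N]\textrm{Var}(X)+(EX)^2\textrm{Var}(N)$. The helper names `convolve`, `mean` and `variance` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction as F

def convolve(p, q):
    """PMF of the sum of two independent variables with PMFs p and q."""
    out = defaultdict(F)
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] += pa * qb
    return dict(out)

def mean(pmf):
    return sum(v * p for v, p in pmf.items())

def variance(pmf):
    return sum(v * v * p for v, p in pmf.items()) - mean(pmf) ** 2

# Illustrative distributions (not from the text)
pmf_n = {0: F(1, 6), 1: F(1, 2), 2: F(1, 3)}     # number of customers N
pmf_x = {0: F(1, 2), 10: F(1, 4), 20: F(1, 4)}  # amount spent by one customer

# Exact PMF of Y = X_1 + ... + X_N, with the X_i i.i.d. and independent of N
pmf_y = defaultdict(F)
sum_pmf = {0: F(1)}  # PMF of the empty sum (n = 0)
for n in range(max(pmf_n) + 1):
    if n > 0:
        sum_pmf = convolve(sum_pmf, pmf_x)  # PMF of X_1 + ... + X_n
    for v, p in sum_pmf.items():
        pmf_y[v] += pmf_n.get(n, 0) * p     # weight by P(N = n)

print(mean(pmf_y), mean(pmf_n) * mean(pmf_x))
print(variance(pmf_y),
      mean(pmf_n) * variance(pmf_x) + mean(pmf_x) ** 2 * variance(pmf_n))
```

Exact `Fraction` arithmetic is used instead of a Monte Carlo simulation, so the comparison is an equality rather than an approximation.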