9.1.5 Mean Squared Error (MSE)
For simplicity, let us first consider the case that we would like to estimate $X$ without observing anything. What would be our best estimate of $X$ in that case? Let $a$ be our estimate of $X$. Then, the MSE is given by
\begin{align}
h(a)&=E[(X-a)^2]\\
&=EX^2-2aEX+a^2.
\end{align}
This is a quadratic function of $a$, and we can find the minimizing value of $a$ by differentiation:
\begin{align}
h'(a)=-2EX+2a.
\end{align}
Therefore, we conclude that the minimizing value of $a$ is
\begin{align}
a=EX.
\end{align}
Now, if we have observed $Y=y$, we can repeat the above argument. The only difference is that everything is conditioned on $Y=y$. More specifically, the MSE is given by
\begin{align}
h(a)&=E[(X-a)^2|Y=y]\\
&=E[X^2|Y=y]-2aE[X|Y=y]+a^2.
\end{align}
Again, we obtain a quadratic function of $a$, and by differentiation we obtain the MMSE estimate of $X$ given $Y=y$ as
\begin{align}
\hat{x}_{M}=E[X|Y=y].
\end{align}

Suppose now that we would like to estimate the value of an unobserved random variable $X$ by observing the value of a random variable $Y$. In general, our estimate $\hat{x}$ is a function of the observed value $y$, so we can write
\begin{align}
\hat{X}=g(Y).
\end{align}
Note that, since $Y$ is a random variable, the estimator $\hat{X}=g(Y)$ is also a random variable. The error in our estimate is given by
\begin{align}
\tilde{X}&=X-\hat{X}\\
&=X-g(Y),
\end{align}
which is also a random variable. We can then define the mean squared error (MSE) of this estimator by
\begin{align}
E[(X-\hat{X})^2]=E[(X-g(Y))^2].
\end{align}
From the discussion above we can conclude that the conditional expectation $\hat{X}_M=E[X|Y]$ has the lowest MSE among all possible estimators $g(Y)$.
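To illustrate the first step, here is a minimal numerical sketch (not part of the original text) that approximates $h(a)=E[(X-a)^2]$ by a sample average over a grid of candidate values $a$ and confirms that the minimizer lands near the sample mean of $X$; the choice of an exponential distribution for $X$ is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)   # samples of X; EX = 2

# Approximate h(a) = E[(X - a)^2] on a grid of candidate estimates a.
grid = np.linspace(0.0, 4.0, 401)
h = np.array([np.mean((x - a) ** 2) for a in grid])

print("minimizer of empirical MSE:", grid[np.argmin(h)])  # close to 2.0
print("sample mean of X:          ", x.mean())            # also close to 2.0
```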
Let $\hat{X}=g(Y)$ be an estimator of the random variable $X$, given that we have observed the random variable $Y$. The mean squared error (MSE) of this estimator is defined as
\begin{align}
E[(X-\hat{X})^2]=E[(X-g(Y))^2].
\end{align}
The MMSE estimator of $X$,
\begin{align}
\hat{X}_{M}=E[X|Y],
\end{align}
has the lowest MSE among all possible estimators.
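The claim that $E[X|Y]$ beats every other function of $Y$ can be illustrated with a small simulation. The sketch below (my own toy construction, not from the text) uses $Y=X+W$ with independent Bernoulli$(1/2)$ variables $X$ and $W$, estimates $E[X|Y]$ by conditional sample means, and compares its empirical MSE with a few other choices of $g(Y)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.integers(0, 2, size=n)   # X ~ Bernoulli(1/2)
w = rng.integers(0, 2, size=n)   # W ~ Bernoulli(1/2), independent of X
y = x + w                        # observed Y = X + W, takes values 0, 1, 2

# Empirical MMSE estimator: conditional sample mean of X for each value of Y.
cond_mean = np.array([x[y == v].mean() for v in (0, 1, 2)])
x_hat_m = cond_mean[y]

def mse(est):
    return np.mean((x - est) ** 2)

print("E[X|Y] (MMSE)  :", mse(x_hat_m))          # about 0.125
print("g(Y) = Y^2/4   :", mse(y ** 2 / 4))       # about 0.156
print("g(Y) = 0.5     :", mse(np.full(n, 0.5)))  # about 0.25
print("g(Y) = Y       :", mse(y))                # about 0.5
```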
Properties of the Estimation Error:
Here, we would like to study the MSE of the conditional expectation. First, note that
\begin{align}
E[\hat{X}_M]&=E[E[X|Y]]\\
&=E[X] \quad \textrm{(by the law of iterated expectations)}.
\end{align}
Therefore, $\hat{X}_M=E[X|Y]$ is an unbiased estimator of $X$. In other words, for $\hat{X}_M=E[X|Y]$, the estimation error, $\tilde{X}$, is a zero-mean random variable:
\begin{align}
E[\tilde{X}]=EX-E[\hat{X}_M]=0.
\end{align}
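For a quick worked instance of this unbiasedness (a hypothetical example, not from the text), let $X$ be the outcome of a fair die and let $Y=1$ if $X \le 3$ and $Y=0$ otherwise. Then
\begin{align}
E[X|Y=1]=2, \qquad E[X|Y=0]=5,
\end{align}
so $\hat{X}_M=E[X|Y]$ takes the values $2$ and $5$, each with probability $\frac{1}{2}$, and
\begin{align}
E[\hat{X}_M]=\frac{1}{2}(2)+\frac{1}{2}(5)=3.5=E[X].
\end{align}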
Before going any further, let us state and prove a useful lemma.

Lemma 9.1

Let $\hat{X}_M=E[X|Y]$ be the MMSE estimator of $X$ given $Y$, and let $\tilde{X}=X-\hat{X}_M$ be the estimation error. Define the random variable $W=E[\tilde{X}|Y]$. Then, we have
 $W=0$.
 For any function $g(Y)$, we have $E[\tilde{X} \cdot g(Y)]=0$.
Proof

 We can write
\begin{align}
W&=E[\tilde{X}|Y]\\
&=E[X-\hat{X}_M|Y]\\
&=E[X|Y]-E[\hat{X}_M|Y]\\
&=\hat{X}_M-E[\hat{X}_M|Y]\\
&=\hat{X}_M-\hat{X}_M=0.
\end{align}
The last line follows because $\hat{X}_M$ is a function of $Y$, so $E[\hat{X}_M|Y]=\hat{X}_M$.
 First, note that
\begin{align}
E[\tilde{X} \cdot g(Y)|Y]&=g(Y) E[\tilde{X}|Y]\\
&=g(Y) \cdot W=0.
\end{align}
Next, by the law of iterated expectations, we have
\begin{align}
E[\tilde{X} \cdot g(Y)]=E\big[E[\tilde{X} \cdot g(Y)|Y]\big]=0.
\end{align}
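The orthogonality property in part 2 is easy to check numerically. The sketch below (my own construction, reusing the same kind of toy model as before: $Y=X+W$ with independent Bernoulli $X$ and $W$) estimates $\hat{X}_M=E[X|Y]$ from conditional sample means and verifies that the resulting error $\tilde{X}$ has mean zero and is orthogonal to several functions of $Y$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
x = rng.integers(0, 2, size=n)      # X ~ Bernoulli(1/2)
w = rng.integers(0, 2, size=n)      # W ~ Bernoulli(1/2), independent of X
y = x + w                           # observed Y

# Empirical MMSE estimator: conditional sample mean of X given each value of Y.
x_hat_m = np.array([x[y == v].mean() for v in (0, 1, 2)])[y]
x_tilde = x - x_hat_m               # estimation error

# E[X_tilde] should be ~0 (unbiasedness) and E[X_tilde * g(Y)] ~0 for any g(Y).
print("E[X_tilde]           :", x_tilde.mean())
for name, g in [("Y", y), ("Y^2", y ** 2), ("exp(Y)", np.exp(y))]:
    print(f"E[X_tilde * {name:6s}] :", np.mean(x_tilde * g))
```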
We are now ready to state a very interesting property of the estimation error for the MMSE estimator. Namely, we show that the estimation error, $\tilde{X}$, and $\hat{X}_M$ are uncorrelated. To see this, note that
\begin{align}
\textrm{Cov}(\tilde{X},\hat{X}_M)&=E[\tilde{X}\cdot \hat{X}_M]-E[\tilde{X}] E[\hat{X}_M]\\
&=E[\tilde{X} \cdot\hat{X}_M] \quad (\textrm{since } E[\tilde{X}]=0)\\
&=E[\tilde{X} \cdot g(Y)] \quad (\textrm{since $\hat{X}_M$ is a function of } Y)\\
&=0 \quad (\textrm{by Lemma 9.1}).
\end{align}
Now, let us look at $\textrm{Var}(X)$. The estimation error is $\tilde{X}=X-\hat{X}_M$, so
\begin{align}
X=\tilde{X}+\hat{X}_M.
\end{align}
Since $\textrm{Cov}(\tilde{X},\hat{X}_M)=0$, we conclude
\begin{align}\label{eq:varMSE}
\textrm{Var}(X)=\textrm{Var}(\hat{X}_M)+\textrm{Var}(\tilde{X}). \hspace{30pt} (9.3)
\end{align}
The above formula can be interpreted as follows: part of the variance of $X$ is explained by the variance in $\hat{X}_M$, and the remaining part is the variance of the estimation error. In other words, if $\hat{X}_M$ captures most of the variation in $X$, then the error will be small. Note also that we can rewrite Equation 9.3 as
\begin{align}
E[X^2]-E[X]^2=E[\hat{X}^2_M]-E[\hat{X}_M]^2+E[\tilde{X}^2]-E[\tilde{X}]^2.
\end{align}
Since
\begin{align}
E[\hat{X}_M]=E[X], \quad E[\tilde{X}]=0,
\end{align}
we conclude
\begin{align}
E[X^2]=E[\hat{X}^2_M]+E[\tilde{X}^2].
\end{align}
  The MMSE estimator, $\hat{X}_{M}=E[X|Y]$, is an unbiased estimator of $X$, i.e., \begin{align} E[\hat{X}_{M}]=EX, \quad E[\tilde{X}]=0. \end{align}
  The estimation error, $\tilde{X}$, and $\hat{X}_{M}$ are uncorrelated \begin{align} \textrm{Cov}(\tilde{X},\hat{X}_M)=0. \end{align}
  We have \begin{align} \textrm{Var}(X)&=\textrm{Var}(\hat{X}_M)+\textrm{Var}(\tilde{X}),\\ E[X^2]&=E[\hat{X}^2_M]+E[\tilde{X}^2]. \end{align}
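Continuing the fair-die illustration from the unbiasedness discussion above (again hypothetical, not from the text), these identities can be checked directly. With $X$ a fair die and $Y=1$ if $X \le 3$, $Y=0$ otherwise, we have
\begin{align}
\textrm{Var}(X)&=\frac{35}{12},\\
\textrm{Var}(\hat{X}_M)&=\frac{1}{2}(2-3.5)^2+\frac{1}{2}(5-3.5)^2=\frac{9}{4},\\
\textrm{Var}(\tilde{X})&=E[\textrm{Var}(X|Y)]=\frac{2}{3},
\end{align}
and indeed $\frac{9}{4}+\frac{2}{3}=\frac{35}{12}$. Similarly, $E[X^2]=\frac{91}{6}$, $E[\hat{X}^2_M]=\frac{4+25}{2}=\frac{29}{2}$, and $E[\tilde{X}^2]=\frac{2}{3}$, so $\frac{29}{2}+\frac{2}{3}=\frac{91}{6}$, as expected.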
Example
Let $X \sim N(0, 1)$ and \begin{align} Y=X+W, \end{align} where $W \sim N(0, 1)$ is independent of $X$.
 Find the MMSE estimator of $X$ given $Y$, ($\hat{X}_M$).
 Find the MSE of this estimator, using $MSE=E[(X-\hat{X}_M)^2]$.
 Check that $E[X^2]=E[\hat{X}^2_M]+E[\tilde{X}^2]$.
 Solution

Since $X$ and $W$ are independent and normal, $Y$ is also normal. Moreover, $X$ and $Y$ are also jointly normal, since for all $a,b \in \mathbb{R}$, we have
\begin{align}
aX+bY=(a+b)X+bW,
\end{align}
which is also a normal random variable. Note also,
\begin{align}
\textrm{Cov}(X,Y)&=\textrm{Cov}(X,X+W)\\
&=\textrm{Cov}(X,X)+\textrm{Cov}(X,W)\\
&=\textrm{Var}(X)=1.
\end{align}
Therefore,
\begin{align}
\rho(X,Y)&=\frac{\textrm{Cov}(X,Y)}{\sigma_X \sigma_Y}\\
&=\frac{1}{1 \cdot \sqrt{2}}=\frac{1}{\sqrt{2}}.
\end{align}
 Since $X$ and $Y$ are jointly normal, the MMSE estimator of $X$ given $Y$ is
\begin{align}
\hat{X}_M&=E[X|Y]\\
&=\mu_X+ \rho \sigma_X \frac{Y-\mu_Y}{\sigma_Y}\\
&=\frac{Y}{2}.
\end{align}
 The MSE of this estimator is given by
\begin{align}
E[(X-\hat{X}_M)^2]&=E\left[\left(X-\frac{Y}{2}\right)^2\right]\\
&=E\left[X^2-XY+\frac{Y^2}{4}\right]\\
&=EX^2-E[X(X+W)]+\frac{EY^2}{4}\\
&=EX^2-EX^2-EX \cdot EW+\frac{EY^2}{4}\\
&=\frac{\textrm{Var}(Y)+(EY)^2}{4}\\
&=\frac{2+0}{4}=\frac{1}{2}.
\end{align}
 Note that $E[X^2]=1$. Also, \begin{align} E[\hat{X}^2_M]=\frac{EY^2}{4}=\frac{1}{2}. \end{align} In the above, we also found $MSE=E[\tilde{X}^2]=\frac{1}{2}$. Therefore, we have \begin{align} E[X^2]=E[\hat{X}^2_M]+E[\tilde{X}^2]. \end{align}
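A short Monte Carlo sketch (my own check, under the same model: independent $X, W \sim N(0,1)$ and $Y=X+W$) can be used to confirm these numbers.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
x = rng.standard_normal(n)          # X ~ N(0, 1)
w = rng.standard_normal(n)          # W ~ N(0, 1), independent of X
y = x + w                           # Y = X + W

x_hat_m = y / 2                     # MMSE estimator derived above
x_tilde = x - x_hat_m               # estimation error

print("MSE = E[(X - Y/2)^2]  :", np.mean(x_tilde ** 2))                          # ~ 0.5
print("E[X^2]                :", np.mean(x ** 2))                                # ~ 1.0
print("E[Xhat_M^2] + E[Xt^2] :", np.mean(x_hat_m ** 2) + np.mean(x_tilde ** 2))  # ~ 1.0
print("rho(X, Y)             :", np.corrcoef(x, y)[0, 1])                        # ~ 0.707
```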
