9.1.6 Linear MMSE Estimation of Random Variables

Suppose that we would like to estimate the value of an unobserved random variable $X$, given that we have observed $Y=y$. In general, our estimate $\hat{x}$ is a function of $y$: \begin{align} \hat{x}=g(y). \end{align} For example, the MMSE estimate of $X$ given $Y=y$ is \begin{align} g(y)=E[X|Y=y]. \end{align} We might face some difficulties if we want to use the MMSE estimator in practice. First, the function $g(y)=E[X|Y=y]$ might have a complicated form. Specifically, if $X$ and $Y$ are random vectors, computing $E[X|Y=y]$ might not be easy. Moreover, to find $E[X|Y=y]$ we need to know the conditional PDF $f_{X|Y}(x|y)$, which might be hard to obtain in some problems. To address these issues, we might want to use a simpler function $g(y)$ to estimate $X$. In particular, we might want $g(y)$ to be a linear function of $y$.

Suppose that we would like to have an estimator for $X$ of the form

\begin{align} \hat{X}_L=g(Y)=aY+b, \end{align} where $a$ and $b$ are some real numbers to be determined. More specifically, our goal is to choose $a$ and $b$ such that the MSE of the above estimator \begin{align} MSE=E[(X-\hat{X}_L)^2] \end{align} is minimized. We call the resulting estimator the linear MMSE estimator. The following theorem gives us the optimal values for $a$ and $b$.

Theorem
Let $X$ and $Y$ be two random variables with finite means and variances. Also, let $\rho$ be the correlation coefficient of $X$ and $Y$. Consider the function \begin{align} h(a,b)=E[(X-aY-b)^2]. \end{align} Then,
  1. The function $h(a,b)$ is minimized if \begin{align} a=a^*=\frac{\textrm{Cov}(X,Y)}{\textrm{Var}(Y)}, \quad b=b^*=EX-a^*EY. \end{align}
  2. We have $h(a^*,b^*)=(1-\rho^2)\textrm{Var}(X)$.
  3. $E[(X-a^*Y-b^*)Y]=0$ (orthogonality principle).
Proof: We have \begin{align} h(a,b)&=E[(X-aY-b)^2]\\ &=E[X^2+a^2Y^2+b^2-2aXY-2bX+2abY]\\ &=EX^2+a^2EY^2+b^2-2aEXY-2bEX+2abEY. \end{align} Thus, $h(a,b)$ is a quadratic function of $a$ and $b$. Taking the derivatives with respect to $a$ and $b$ and setting them to zero, we obtain \begin{align} EY^2 \cdot a +EY \cdot b= EXY \hspace{30pt} (9.4) \end{align} \begin{align} EY \cdot a+ b=EX \hspace{30pt} (9.5) \end{align} Solving for $a$ and $b$, we obtain \begin{align} a^*=\frac{\textrm{Cov}(X,Y)}{\textrm{Var}(Y)}, \quad b^*=EX-a^*EY. \end{align} These values do in fact minimize $h(a,b)$, since $h$ is a convex quadratic function of $(a,b)$. Note that Equation 9.5 implies that $E[X-a^*Y-b^*]=0$. Therefore, \begin{align} h(a^*,b^*)&=E[(X-a^*Y-b^*)^2]\\ &=\textrm{Var}(X-a^*Y-b^*)\\ &=\textrm{Var}(X-a^*Y)\\ &=\textrm{Var}(X)+a^{*2}\textrm{Var}(Y)-2 a^* \textrm{Cov}(X,Y)\\ &=\textrm{Var}(X)+\frac{\textrm{Cov}(X,Y)^2}{\textrm{Var}(Y)^2}\textrm{Var}(Y)-2 \frac{\textrm{Cov}(X,Y)}{\textrm{Var}(Y)} \textrm{Cov}(X,Y)\\ &=\textrm{Var}(X)-\frac{\textrm{Cov}(X,Y)^2}{\textrm{Var}(Y)}\\ &=(1-\rho^2)\textrm{Var}(X). \end{align} Finally, note that \begin{align} E[(X-a^*Y-b^*)Y]&=EXY-a^*EY^2-b^*EY\\ &=0 \quad (\textrm{by Equation 9.4}). \end{align}
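As a side check (our own sketch, not part of the text), the normal equations (9.4) and (9.5) can be solved symbolically. Here the moments $EX$, $EY$, $EY^2$, $EXY$ are treated as free symbols:

```python
import sympy as sp

# Moments of X and Y as free symbols: EX = E[X], EY2 = E[Y^2], etc.
a, b, EX, EY, EX2, EY2, EXY = sp.symbols('a b EX EY EX2 EY2 EXY')

# h(a, b) expanded in terms of the moments, as in the proof above
h = EX2 + a**2*EY2 + b**2 - 2*a*EXY - 2*b*EX + 2*a*b*EY

# Set both partial derivatives to zero (Equations 9.4 and 9.5) and solve
sol = sp.solve([sp.Eq(sp.diff(h, a), 0), sp.Eq(sp.diff(h, b), 0)], [a, b])
print(sp.simplify(sol[a]))  # (EXY - EX*EY)/(EY2 - EY**2) = Cov(X,Y)/Var(Y)
print(sp.simplify(sol[b]))  # (EX*EY2 - EXY*EY)/(EY2 - EY**2), i.e. EX - a*.EY
```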

Note that $\tilde{X}=X-a^*Y-b^*$ is the error in the linear MMSE estimation of $X$ given $Y$. From the above theorem, we conclude that \begin{align} &E[\tilde{X}]=0,\\ &E[\tilde{X} Y]=0. \end{align} In sum, we can write the linear MMSE estimator of $X$ given $Y$ as \begin{align} \hat{X}_L=\frac{\textrm{Cov}(X,Y)}{\textrm{Var}(Y)} (Y-EY)+ EX. \end{align} If $\rho=\rho(X,Y)$ is the correlation coefficient of $X$ and $Y$, then $\textrm{Cov}(X,Y)=\rho \sigma_X \sigma_Y$, so the above formula can be written as \begin{align} \hat{X}_L=\frac{\rho \sigma_X}{\sigma_Y} (Y-EY)+ EX. \end{align}
Linear MMSE Estimator

The linear MMSE estimator of the random variable $X$, given that we have observed $Y$, is given by \begin{align} \hat{X}_L&=\frac{\textrm{Cov}(X,Y)}{\textrm{Var}(Y)} (Y-EY)+ EX\\ &=\frac{\rho \sigma_X}{\sigma_Y} (Y-EY)+ EX. \end{align} The estimation error, defined as $\tilde{X}=X-\hat{X}_L$, satisfies the orthogonality principle: \begin{align} &E[\tilde{X}]=0,\\ &\textrm{Cov}(\tilde{X},Y)=E[\tilde{X} Y]=0. \end{align} The MSE of the linear MMSE estimator is given by \begin{align} E\big[(X-\hat{X}_L)^2\big]=E[\tilde{X}^2]=(1-\rho^2)\textrm{Var}(X). \end{align}
Note that to compute the linear MMSE estimate, we only need to know expected values, variances, and the covariance; the short sketch below puts this observation into code. We then look at an example.
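Since only first and second moments are needed, the estimator is easy to fit from data. Here is a minimal Python sketch (our own; the helper name `linear_mmse` is hypothetical) that estimates $a^*$ and $b^*$ from samples of $X$ and $Y$:

```python
import numpy as np

def linear_mmse(x, y):
    """Return (a, b) for the linear MMSE estimator X_hat = a*Y + b.

    Uses only sample means, variances, and the covariance:
    a = Cov(X,Y)/Var(Y), b = E[X] - a*E[Y].
    """
    a = np.cov(x, y, ddof=0)[0, 1] / np.var(y)
    b = np.mean(x) - a * np.mean(y)
    return a, b
```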

Example
Suppose $X \sim Uniform(1,2)$, and given $X=x$, $Y$ is exponential with parameter $\lambda=\frac{1}{x}$.
  1. Find the linear MMSE estimate of $X$ given $Y$.
  2. Find the MSE of this estimator.
  3. Check that $E[\tilde{X} Y]=0$.
  • Solution
    • We have \begin{align} \hat{X}_L=\frac{\textrm{Cov}(X,Y)}{\textrm{Var}(Y)} (Y-EY)+ EX. \end{align} Therefore, we need to find $EX$, $EY$, $\textrm{Var}(Y)$, and $\textrm{Cov}(X,Y)$. First, note that we have $EX=\frac{3}{2}$, and \begin{align}%\label{} \nonumber EY &=E[E[Y|X]] &\big(\textrm{law of iterated expectations} \big)\\ \nonumber &=E\left[X\right] &\big(\textrm{since }Y|X \sim Exponential(\frac{1}{X})\big)\\ \nonumber &=\frac{3}{2}. \end{align} \begin{align}%\label{} \nonumber EY^2 &=E[E[Y^2|X]] & (\textrm{law of iterated expectations})\\ \nonumber &=E\left[2X^2\right] &\big(\textrm{since }Y|X \sim Exponential(\frac{1}{X})\big)\\ \nonumber &=\int_{1}^{2} 2x^2 dx\\ \nonumber &=\frac{14}{3}. \end{align} Therefore, \begin{align}%\label{} \textrm{Var}(Y)&=EY^2-(EY)^2\\ &=\frac{14}{3}-\frac{9}{4}\\ &=\frac{29}{12}. \end{align} We also have \begin{align}%\label{} \nonumber EXY &=E[E[XY|X]] &\big(\textrm{law of iterated expectations}\big)\\ \nonumber &=E[XE[Y|X]] &\big(\textrm{given $X$, $X$ is a constant}\big)\\ \nonumber &=E\left[X \cdot X\right] &\big(\textrm{since }Y|X \sim Exponential(\frac{1}{X})\big)\\ \nonumber &=\int_{1}^{2} x^2 dx\\ &=\frac{7}{3}. \end{align} Thus, \begin{align}%\label{} \textrm{Cov}(X,Y)&=E[XY]-(EX)(EY)\\ &=\frac{7}{3}-\frac{3}{2} \cdot \frac{3}{2}\\ &=\frac{1}{12}. \end{align}
      1. The linear MMSE estimate of $X$ given $Y$ is \begin{align} \hat{X}_L&=\frac{\textrm{Cov}(X,Y)}{\textrm{Var}(Y)} (Y-EY)+ EX\\ &=\frac{1}{29}\left(Y-\frac{3}{2}\right)+\frac{3}{2}\\ &=\frac{Y}{29}+\frac{42}{29}. \end{align}
      2. The MSE of $\hat{X}_L$ is \begin{align} MSE=(1-\rho^2)\textrm{Var}(X). \end{align} Since $X \sim Uniform(1,2)$, $\textrm{Var}(X)=\frac{1}{12}$. Also, \begin{align} \rho^2&=\frac{\textrm{Cov}^2(X,Y)}{\textrm{Var}(X) \textrm{Var}(Y)}\\ &=\frac{1}{29}. \end{align} Thus, \begin{align} MSE=\left(1-\frac{1}{29}\right)\frac{1}{12}=\frac{7}{87}. \end{align}
      3. We have \begin{align} \tilde{X}&=X-\hat{X}_L\\ &=X-\frac{Y}{29}-\frac{42}{29}. \end{align} Therefore, \begin{align} E[\tilde{X}Y]&=E\left[\left(X-\frac{Y}{29}-\frac{42}{29}\right)Y\right]\\ &=E[XY]-\frac{EY^2}{29}-\frac{42}{29}EY\\ &=\frac{7}{3}-\frac{14}{3 \cdot 29}-\frac{42}{29}\cdot \frac{3}{2}\\ &=0. \end{align}
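As a numerical sanity check (our own sketch, not part of the text; the seed and sample size are arbitrary), we can simulate this model and compare against the exact answers above:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10**6
x = rng.uniform(1.0, 2.0, n)    # X ~ Uniform(1, 2)
y = rng.exponential(scale=x)    # Y | X=x ~ Exponential(lambda = 1/x), so E[Y|X=x] = x

# Sample-based a* and b*; should approach 1/29 and 42/29
a = np.cov(x, y, ddof=0)[0, 1] / np.var(y)
b = np.mean(x) - a * np.mean(y)
print(a, 1/29)      # ~ 0.0345
print(b, 42/29)     # ~ 1.4483

x_hat = y / 29 + 42 / 29            # exact linear MMSE estimator
x_err = x - x_hat                   # estimation error X~
print(np.mean(x_err**2), 7/87)      # MSE ~ 0.0805
print(np.mean(x_err * y))           # ~ 0 (orthogonality principle)
```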



