9.1.7 Estimation for Random Vectors

The examples that we have seen so far involved only two random variables $X$ and $Y$. In practice, we often need to estimate several random variables based on observations of several other random variables. In other words, we might want to estimate the value of an unobserved random vector $\textbf{X}$: \begin{equation} \nonumber \textbf{X} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{bmatrix}, \end{equation} given that we have observed the random vector $\textbf{Y}$, \begin{equation} \nonumber \textbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}. \end{equation} Almost everything that we have discussed can be extended to the case of random vectors. For example, to find the MMSE estimate of $\textbf{X}$ given $\textbf{Y}=\textbf{y}$, we can write \begin{equation} \hat{\mathbf{X}}_M=E[\mathbf{X}|\mathbf{Y}]=\begin{bmatrix} E[X_1|Y_1,Y_2,\cdots,Y_n] \\ E[X_2|Y_1,Y_2,\cdots,Y_n] \\ \vdots \\ E[X_m|Y_1,Y_2,\cdots,Y_n] \end{bmatrix}. \end{equation} However, the above conditional expectations can be computationally demanding. Therefore, for random vectors it is very common to consider simpler estimators such as the linear MMSE. Let's now discuss the linear MMSE for random vectors.
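Before moving on, here is a small illustration of the conditional-expectation form above. It is a minimal sketch in Python/NumPy; the joint PMF and all numerical values are toy assumptions chosen only to show the componentwise computation, not something from the text:

```python
import numpy as np

# Toy joint PMF p[x1, x2, y] for X = (X1, X2) and a single observation Y,
# with X1, X2, Y each taking values in {0, 1}. Entries sum to 1 (assumed values).
p = np.array([[[0.10, 0.05],
               [0.15, 0.10]],
              [[0.05, 0.20],
               [0.10, 0.25]]])
x1_vals = np.array([0, 1])
x2_vals = np.array([0, 1])

y_obs = 1                          # suppose we observe Y = 1
slice_y = p[:, :, y_obs]           # p(x1, x2, Y = y_obs)
p_y = slice_y.sum()                # P(Y = y_obs)

# The MMSE estimate of X is the conditional expectation, computed componentwise:
E_X1 = (x1_vals[:, None] * slice_y).sum() / p_y   # E[X1 | Y = y_obs]
E_X2 = (x2_vals[None, :] * slice_y).sum() / p_y   # E[X2 | Y = y_obs]
print(E_X1, E_X2)                  # 0.75  0.5833...
```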

Linear MMSE for Random Vectors:

Suppose that we would like to have an estimator for the random vector $\mathbf{X}$ in the form of \begin{align} \hat{\mathbf{X}}_L=\mathbf{A} \mathbf{Y}+ \mathbf{b}, \end{align} where $\mathbf{A}$ is a fixed matrix and $\mathbf{b}$ is a fixed vector, both to be determined. Remember that for two random variables $X$ and $Y$, the linear MMSE estimator of $X$ given $Y$ is \begin{align} \hat{X}_L&=\frac{\textrm{Cov}(X,Y)}{\textrm{Var}(Y)} (Y-EY)+ EX\\ &=\frac{\textrm{Cov}(X,Y)}{\textrm{Cov}(Y,Y)} (Y-EY)+ EX. \end{align} We can extend this result to the case of random vectors. More specifically, we can show that the linear MMSE estimator of the random vector $\mathbf{X}$ given the random vector $\mathbf{Y}$ is given by \begin{align} \hat{\mathbf{X}}_L=\textbf{C}_{\textbf{XY}} \textbf{C}_{\textbf{Y}}^{-1} (\mathbf{Y}-E[\textbf{Y}])+ E[\textbf{X}]. \end{align} In the above equation, $\textbf{C}_\textbf{Y}$ is the covariance matrix of $\mathbf{Y}$, defined as \begin{align} \nonumber \textbf{C}_\textbf{Y}&=E[(\textbf{Y}-E\textbf{Y})(\textbf{Y}-E\textbf{Y})^{T}], \end{align} and $\textbf{C}_\textbf{XY}$ is the cross-covariance matrix of $\mathbf{X}$ and $\mathbf{Y}$, defined as \begin{align} \nonumber \textbf{C}_\textbf{XY}=E[(\textbf{X}-E\textbf{X})(\textbf{Y}-E\textbf{Y})^T]. \end{align} These calculations can easily be done using MATLAB or other packages (a short sketch appears below). However, it is sometimes easier to use the orthogonality principle to find $\hat{\mathbf{X}}_L$. We now explain how to use the orthogonality principle to find linear MMSE estimators.
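Before turning to the orthogonality principle, here is a minimal sketch in Python/NumPy (rather than MATLAB) of the formula above. The function name and the assumption that the means and covariance matrices are already available are illustrative only:

```python
import numpy as np

def linear_mmse(mu_X, mu_Y, C_XY, C_Y, y):
    """Linear MMSE estimate  X_hat_L = C_XY C_Y^{-1} (y - E[Y]) + E[X].

    mu_X : (m,)   mean vector of X        mu_Y : (n,)   mean vector of Y
    C_XY : (m, n) cross-covariance matrix C_Y  : (n, n) covariance matrix of Y
    y    : (n,)   observed value of Y
    """
    # Solve C_Y z = (y - mu_Y) instead of forming C_Y^{-1} explicitly,
    # which is numerically preferable.
    z = np.linalg.solve(C_Y, np.asarray(y) - mu_Y)
    return C_XY @ z + mu_X
```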

Using the Orthogonality Principle to Find Linear MMSE Estimators for Random Vectors:

Suppose that we are estimating a random vector $\textbf{X}$: \begin{equation} \nonumber \textbf{X} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{bmatrix} \end{equation} given that we have observed the random vector $\textbf{Y}$. Let \begin{equation} \nonumber \hat{\mathbf{X}}_L= \begin{bmatrix} \hat{X}_1 \\ \hat{X}_2 \\ \vdots \\ \hat{X}_m \end{bmatrix} \end{equation} be the vector estimate. We define the MSE as \begin{align} \nonumber MSE=\sum_{k=1}^{m} E[(X_k-\hat{X}_k)^2]. \end{align} Therefore, to minimize the MSE, it suffices to minimize each $E[(X_k-\hat{X}_k)^2]$ individually. This means that we only need to discuss estimating a single random variable $X$ given that we have observed the random vector $\textbf{Y}$. Since we would like our estimator to be linear, we can write \begin{align} \hat{X}_L=\sum_{k=1}^{n}a_k Y_k+b. \end{align} The error in our estimate, $\tilde{X}$, is then given by \begin{align} \tilde{X}&=X-\hat{X}_L\\ &=X-\sum_{k=1}^{n}a_k Y_k-b. \end{align} Similar to the proof of Theorem 9.1, we can show that the linear MMSE estimator must satisfy \begin{align} &E[\tilde{X}]=0,\\ &\textrm{Cov}(\tilde{X},Y_j)=E[\tilde{X} Y_j]=0, \quad \textrm{ for all }j=1,2,\cdots, n \end{align} (note that $\textrm{Cov}(\tilde{X},Y_j)=E[\tilde{X} Y_j]$ because $E[\tilde{X}]=0$). The above equations are called the orthogonality principle. The orthogonality principle is often stated as follows: the error ($\tilde{X}$) must be orthogonal to the observations ($Y_1$, $Y_2$, $\cdots$, $Y_n$). Note that there are $n+1$ unknowns ($a_1$, $a_2$, $\cdots$, $a_n$, and $b$) and $n+1$ equations. Let us look at an example to see how we can apply the orthogonality principle.
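Before the example, the following Monte Carlo sketch in Python/NumPy illustrates the orthogonality conditions numerically. The model for $X$, $Y_1$, $Y_2$ and all numerical values are assumptions chosen only for illustration: the coefficients are obtained from the $n+1$ equations, and the resulting error is checked to be (approximately) zero-mean and uncorrelated with each observation.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

# Assumed toy model: X ~ N(1, 4), Y1 = X + W1, Y2 = X + W2 with independent noise.
X = rng.normal(1.0, 2.0, N)
Y = np.stack([X + rng.normal(0.0, 1.0, N),
              X + rng.normal(0.0, 2.0, N)])

# Sample versions of the n+1 equations:
#   Cov(X_hat, Y_j) = Cov(X, Y_j)  for j = 1, ..., n   ->  C_Y a = c_XY
#   E[tilde X] = 0                                     ->  b = EX - a . EY
C_Y = np.cov(Y)
c_XY = np.array([np.cov(X, Y[0])[0, 1], np.cov(X, Y[1])[0, 1]])
a = np.linalg.solve(C_Y, c_XY)
b = X.mean() - a @ Y.mean(axis=1)

# Orthogonality check: the error has (approximately) zero mean and is
# uncorrelated with each observation Y_j.
err = X - (a @ Y + b)
print(err.mean())                       # close to 0
print(err @ Y[0] / N, err @ Y[1] / N)   # both close to 0
```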

Example
Let $X$ be an unobserved random variable with $EX=0$, $\textrm{Var}(X)=4$. Assume that we have observed $Y_1$ and $Y_2$ given by \begin{align} Y_1&=X+W_1,\\ Y_2&=X+W_2, \end{align} where $EW_1=EW_2=0$, $\textrm{Var}(W_1)=1$, and $\textrm{Var}(W_2)=4$. Assume that $W_1$, $W_2$, and $X$ are independent random variables. Find the linear MMSE estimator of $X$, given $Y_1$ and $Y_2$.
  • Solution
    • The linear MMSE estimator of $X$ given $Y_1$ and $Y_2$ has the form \begin{align} \hat{X}_L=aY_1+bY_2+c. \end{align} We use the orthogonality principle. We have \begin{align} E[\tilde{X}]&=EX-aEY_1-bEY_2-c\\ &=0-a \cdot 0- b \cdot 0-c=-c. \end{align} Using $E[\tilde{X}]=0$, we conclude $c=0$. Next, we note \begin{align} \textrm{Cov}(\hat{X}_L,Y_1) &= \textrm{Cov}(aY_1+bY_2,Y_1)\\ &=a \textrm{Cov}(Y_1, Y_1)+ b \textrm{Cov}(Y_1,Y_2)\\ &=a \textrm{Cov} (X+W_1,X+W_1)+b \textrm{Cov} (X+W_1,X+W_2)\\ &=a (\textrm{Var}(X) +\textrm{Var}(W_1))+b \textrm{Var}(X)\\ &=5a +4b. \end{align} Similarly, we find \begin{align} \textrm{Cov}(\hat{X}_L,Y_2) &= \textrm{Cov}(aY_1+bY_2,Y_2)\\ &=a \textrm{Var}(X) +b (\textrm{Var}(X)+\textrm{Var}(W_2))\\ &=4a +8b. \end{align} We need to have \begin{align} &\textrm{Cov}(\tilde{X},Y_j)=0, \quad \textrm{ for }j=1,2, \end{align} which is equivalent to \begin{align} &\textrm{Cov}(\hat{X}_L,Y_j)=\textrm{Cov}(X,Y_j), \quad \textrm{ for }j=1,2. \end{align} Since $\textrm{Cov}(X,Y_1)=\textrm{Cov}(X,Y_2)=\textrm{Var}(X)=4$, we conclude \begin{align} &5a +4b=4,\\ &4a +8b=4. \end{align} Solving for $a$ and $b$, we obtain $a=\frac{2}{3}$ and $b=\frac{1}{6}$. Therefore, the linear MMSE estimator of $X$, given $Y_1$ and $Y_2$, is \begin{align} \hat{X}_L=\frac{2}{3} Y_1+ \frac{1}{6} Y_2. \end{align} A quick numerical check of this result is sketched below.
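As a quick numerical check (a sketch in Python/NumPy, not part of the original solution), the same coefficients can be obtained by solving the two covariance equations directly:

```python
import numpy as np

# Covariance quantities from the example: Var(X) = 4, Var(W1) = 1, Var(W2) = 4.
C_Y  = np.array([[5.0, 4.0],     # [[Var(Y1),     Cov(Y1, Y2)],
                 [4.0, 8.0]])    #  [Cov(Y2, Y1), Var(Y2)    ]]
c_XY = np.array([4.0, 4.0])      # [Cov(X, Y1), Cov(X, Y2)]

a, b = np.linalg.solve(C_Y, c_XY)   # solves 5a + 4b = 4 and 4a + 8b = 4
print(a, b)                         # 0.666... and 0.1666..., i.e. a = 2/3, b = 1/6
```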



