9.1.10 Solved Problems

Problem

Let $X \sim N(0,1)$. Suppose that we know \begin{align} Y \; | \; X=x \quad \sim \quad N(x,1). \end{align} Show that the posterior density of $X$ given $Y=y$, $f_{X|Y}(x|y)$, is given by \begin{align} X \; | \; Y=y \quad \sim \quad N\left(\frac{y}{2},\frac{1}{2}\right). \end{align}

  • Solution
    • Our goal is to show that $f_{X|Y}(x|y)$ is normal with mean $\frac{y}{2}$ and variance $\frac{1}{2}$. Therefore, it suffices to show that \begin{align} f_{X|Y}(x|y)=c(y) \exp \left\{-\left(x-\frac{y}{2}\right)^2\right\}, \end{align} where $c(y)$ is a function of $y$ only. That is, for a given $y$, $c(y)$ is just the normalizing constant ensuring that $f_{X|Y}(x|y)$ integrates to one. By the assumptions, \begin{align} f_{Y|X}(y|x)=\frac{1}{\sqrt{2 \pi} } \exp \left\{-\frac{(y-x)^2}{2} \right\}, \end{align} \begin{align} f_{X}(x)=\frac{1}{\sqrt{2 \pi} } \exp \left\{-\frac{x^2}{2} \right\}. \end{align} Therefore, \begin{align} f_{X|Y}(x|y)&=\frac{f_{Y|X}(y|x)f_{X}(x)}{f_{Y}(y)}\\ &= (\textrm{a function of $y$}) \cdot f_{Y|X}(y|x)f_{X}(x)\\ &=(\textrm{a function of $y$}) \cdot \exp \left\{-\frac{(y-x)^2+x^2}{2}\right\}\\ &=(\textrm{a function of $y$}) \cdot \exp \left\{-\left(x-\frac{y}{2}\right)^2-\frac{y^2}{4}\right\}\\ &=(\textrm{a function of $y$}) \cdot \exp \left\{-\left(x-\frac{y}{2}\right)^2\right\}, \end{align} where in the last step the factor $e^{-y^2/4}$, which does not depend on $x$, is absorbed into the function of $y$.
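
As a quick sanity check on this result, one can sample from the joint model and keep only the draws whose $Y$ lands near a fixed observed value $y$; the retained $X$ values should have mean close to $y/2$ and variance close to $1/2$. The following is a minimal Monte Carlo sketch assuming numpy; the observed value $y=1$, the sample size, and the window width are arbitrary illustrative choices, not part of the problem.

```python
# Monte Carlo sketch (not from the text): check that X | Y = y is roughly N(y/2, 1/2)
# by sampling (X, Y) jointly and keeping draws with Y in a narrow window around y.
import numpy as np

rng = np.random.default_rng(0)
n, y, eps = 2_000_000, 1.0, 0.02       # illustrative choices

x = rng.normal(0.0, 1.0, size=n)       # X ~ N(0, 1)
yy = rng.normal(loc=x, scale=1.0)      # Y | X = x ~ N(x, 1)

cond = x[np.abs(yy - y) < eps]         # draws consistent with Y ≈ y

print("empirical mean:", cond.mean(), " (theory:", y / 2, ")")
print("empirical var :", cond.var(),  " (theory: 0.5 )")
```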


Problem

We can generalize the result of Problem 9.1 using the same method. In particular, assuming \begin{align} X \sim N(\mu,\tau^2) \quad \textrm{and} \quad Y \; | \; X=x \quad \sim \quad N(x,\sigma^2), \end{align} it can be shown that the posterior density of $X$ given $Y=y$ is given by \begin{align} X \; | \; Y=y \quad \sim \quad N \left(\frac{ y / \sigma^2 + \mu / \tau^2 }{1 / \sigma^2 + 1 / \tau^2}, \frac{1}{1 / \sigma^2 + 1 / \tau^2} \right). \end{align} In this problem, you can use the above result. Let $X \sim N(\mu,\tau^2)$ and \begin{align} Y \; | \; X=x \quad \sim \quad N(x,\sigma^2). \end{align} Suppose that we have observed the random sample $Y_1$, $Y_2$, $\cdots$, $Y_n$ such that, given $X=x$, the $Y_i$'s are i.i.d. and have the same distribution as $Y \; | \; X=x$.

  1. Show that the posterior density of $X$ given $\overline{Y}$ (the sample mean) is \begin{align} X \; | \; \overline{Y} \quad \sim \quad N \left(\frac{ n \overline{Y} / \sigma^2 + \mu / \tau^2 }{n / \sigma^2 + 1 / \tau^2}, \frac{1}{n / \sigma^2 + 1 / \tau^2} \right). \end{align}
  2. Find the MAP and the MMSE estimates of $X$ given $\overline{Y}$.

  • Solution
      1. Since, given $X=x$, the $Y_i$'s are i.i.d. with $Y_i \; | \; X=x \quad \sim \quad N(x,\sigma^2)$, we conclude \begin{align} \overline{Y} \; | \; X=x \quad \sim \quad N\left(x,\frac{\sigma^2}{n}\right). \end{align} Therefore, we can use the posterior density given in the problem statement (we need to replace $\sigma^2$ by $\frac{\sigma^2}{n}$ and $y$ by $\overline{Y}$). Thus, the posterior density of $X$ given $\overline{Y}$ is \begin{align} X \; | \; \overline{Y} \quad \sim \quad N \left(\frac{ n \overline{Y} / \sigma^2 + \mu / \tau^2 }{n / \sigma^2 + 1 / \tau^2}, \frac{1}{n / \sigma^2 + 1 / \tau^2} \right). \end{align}
      2. To find the MAP estimate of $X$ given $\overline{Y}$, we need to find the value that maximizes the posterior density. Since the posterior density is normal, the maximum value is obtained at the mean which is \begin{align} \hat{X}_{MAP}=\frac{ n \overline{Y} / \sigma^2 + \mu / \tau^2 }{n / \sigma^2 + 1 / \tau^2}. \end{align} Also, the MMSE estimate of $X$ given $\overline{Y}$ is \begin{align} \hat{X}_{M}=E[X|\overline{Y}]=\frac{ n \overline{Y} / \sigma^2 + \mu / \tau^2 }{n / \sigma^2 + 1 / \tau^2}. \end{align}
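
The closed-form posterior in part 1 makes the MAP/MMSE estimate a one-line computation. The sketch below is a small illustrative helper (the function name is ours, and it assumes nothing beyond the formulas above); it is evaluated with numbers that also appear in the last problem of this section.

```python
# Sketch of the normal-normal posterior update from part 1.
# Prior X ~ N(mu, tau2); observations Y_i | X = x i.i.d. N(x, sigma2); sample mean ybar.
# For a normal posterior, the MAP and MMSE estimates both equal the posterior mean.

def posterior_given_mean(ybar, n, mu, tau2, sigma2):
    """Return (posterior mean, posterior variance) of X given the sample mean ybar."""
    precision = n / sigma2 + 1.0 / tau2              # posterior precision = 1 / variance
    mean = (n * ybar / sigma2 + mu / tau2) / precision
    return mean, 1.0 / precision

# Example: prior N(0, 4), noise variance 1, n = 25 observations with ybar = 0.56.
m, v = posterior_given_mean(ybar=0.56, n=25, mu=0.0, tau2=4.0, sigma2=1.0)
print("MAP = MMSE estimate:", m, "  posterior variance:", v)
```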


Problem

Let $\hat{X}_M$ be the MMSE estimate of $X$ given $Y$. Show that the MSE of this estimator is \begin{align} MSE=E\big[\textrm{Var}(X|Y)\big]. \end{align}

  • Solution
    • We have \begin{align} \textrm{Var}(X|Y)&=E[(X-E[X|Y])^2|Y] &\big(\textrm{by definition of }\textrm{Var}(X|Y)\big)\\ &=E[(X-\hat{X}_M)^2|Y]. \end{align} Therefore, \begin{align} E[\textrm{Var}(X|Y)]&=E\big[E[(X-\hat{X}_M)^2|Y]\big] \\ &=E[(X-\hat{X}_M)^2] & (\textrm{by the law of iterated expectations})\\ &=MSE & (\textrm{by definition of MSE}). \end{align}
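
This identity is easy to illustrate numerically. In the Gaussian model of the first problem, $\hat{X}_M=Y/2$ and $\textrm{Var}(X|Y)=\frac{1}{2}$, so the simulated MSE should come out near $0.5$. A short sketch, assuming numpy:

```python
# Simulation sketch (not from the text): for X ~ N(0,1) and Y | X ~ N(X,1),
# the MMSE estimator is X_hat_M = Y/2 and Var(X|Y) = 1/2, so the simulated
# MSE = E[(X - X_hat_M)^2] should be close to E[Var(X|Y)] = 0.5.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1_000_000)
y = x + rng.normal(size=x.size)

mse = np.mean((x - y / 2) ** 2)        # E[(X - X_hat_M)^2]
print("simulated MSE:", mse, " (E[Var(X|Y)] = 0.5)")
```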


Problem

Consider two random variables $X$ and $Y$ with the joint PMF given in Table 9.1.

Table 9.1: Joint PMF of $X$ and $Y$ for Problem 4
  $Y=0$ $Y=1$
$X=0$ $\frac{1}{5}$ $\frac{2}{5}$
$X=1$ $\frac{2}{5}$ $0$
  1. Find the linear MMSE estimator of $X$ given $Y$, ($\hat{X}_L$).
  2. Find the MMSE estimator of $X$ given $Y$, ($\hat{X}_M$).
  3. Find the MSE of $\hat{X}_M$.
  • Solution
    • Using the table, we find \begin{align} \nonumber &P_X(0)=\frac{1}{5}+\frac{2}{5}=\frac{3}{5}, \\ \nonumber &P_X(1)=\frac{2}{5}+0=\frac{2}{5}, \\ \nonumber &P_Y(0)=\frac{1}{5}+\frac{2}{5}=\frac{3}{5}, \\ \nonumber &P_Y(1)=\frac{2}{5}+0=\frac{2}{5}. \end{align} Thus, the marginal distributions of $X$ and $Y$ are both $Bernoulli\left(\frac{2}{5}\right)$. Therefore, we have \begin{align} &EX=EY=\frac{2}{5},\\ &\textrm{Var}(X)=\textrm{Var}(Y)=\frac{2}{5} \cdot \frac{3}{5}=\frac{6}{25}. \end{align}
      1. To find the linear MMSE estimator of $X$ given $Y$, we also need $\textrm{Cov}(X,Y)$. We have \begin{align} EXY= \sum_{x_i, y_j} x_i y_j P_{XY}(x_i,y_j)=0, \end{align} since the only nonzero product $x_i y_j$ corresponds to $(x_i,y_j)=(1,1)$, which has probability $0$. Therefore, \begin{align} \textrm{Cov}(X,Y)&=EXY-EXEY\\ &=-\frac{4}{25}. \end{align} The linear MMSE estimator of $X$ given $Y$ is \begin{align} \hat{X}_L&=\frac{\textrm{Cov}(X,Y)}{\textrm{Var}(Y)} (Y-EY)+ EX\\ &=\frac{-4 / 25}{6 /25 } \left(Y-\frac{2}{5}\right)+ \frac{2}{5}\\ &=-\frac{2}{3}Y+\frac{2}{3}. \end{align} Since $Y$ can only take two values, we can summarize $\hat{X}_L$ in the following table.

        Table 9.2: The linear MMSE estimator of $X$ given $Y$ for Problem 4
          $Y=0$ $Y=1$
        $\hat{X}_L$ $\frac{2}{3}$ $0$

      2. To find the MMSE estimator of $X$ given $Y$, we need the conditional PMFs. We have \begin{align} \nonumber P_{X|Y}(0|0)&=\frac{P_{XY}(0,0)}{P_{Y}(0)}\\ \nonumber &= \frac{\frac{1}{5}}{\frac{3}{5}}=\frac{1}{3}. \end{align} Thus, \begin{align} \nonumber P_{X|Y}(1|0)=1-\frac{1}{3}=\frac{2}{3}. \end{align} We conclude \begin{align} \nonumber X|Y=0 \hspace{5pt} \sim \hspace{5pt} Bernoulli \left(\frac{2}{3}\right). \end{align} Similarly, we find \begin{align} \nonumber &P_{X|Y}(0|1)=1,\\ \nonumber &P_{X|Y}(1|1)=0. \end{align} Thus, given $Y=1$, we always have $X=0$. The MMSE estimator of $X$ given $Y$ is \begin{align} \hat{X}_M=E[X|Y]. \end{align} We have \begin{align} &E[X|Y=0]=\frac{2}{3},\\ &E[X|Y=1]=0. \end{align} Thus, we can summarize $\hat{X}_M$ in the following table.

        Table 9.3: The MMSE estimator of $X$ given $Y$ for Problem 4
          $Y=0$ $Y=1$
        $\hat{X}_M$ $\frac{2}{3}$ $0$

        We notice that, for this problem, the MMSE and the linear MMSE estimators are the same. This is not surprising: since $Y$ takes only two possible values, the MMSE estimator is determined by two points, and the linear MMSE estimator is simply the line passing through those two points.
      3. The MSE of $\hat{X}_M$ can be obtained as \begin{align} MSE &= E [\tilde{X}^2]\\ &=EX^2-E[\hat{X}^2_M] & \big(\textrm{since } X-\hat{X}_M \textrm{ is uncorrelated with } \hat{X}_M, \textrm{ so } E[X\hat{X}_M]=E[\hat{X}^2_M]\big)\\ &=\frac{2}{5}-E[\hat{X}^2_M]. \end{align} From the table for $\hat{X}_M$, we obtain $E[\hat{X}^2_M]=\left(\frac{2}{3}\right)^2 \cdot \frac{3}{5}=\frac{4}{15}$. Therefore, \begin{align} MSE = \frac{2}{5}-\frac{4}{15}=\frac{2}{15}. \end{align} Note that here the MMSE and the linear MMSE estimators are equal, so they have the same MSE. Thus, we can also use the formula for the MSE of $\hat{X}_L$: \begin{align} MSE &= \big(1-\rho(X,Y)^2\big) \textrm{Var}(X)\\ &=\left(1-\frac{\textrm{Cov}(X,Y)^2}{\textrm{Var}(X) \textrm{Var}(Y)}\right) \textrm{Var}(X)\\ &=\left(1-\frac{(-4/25)^2}{(6 /25) \cdot (6 / 25)}\right) \frac{6}{25}\\ &= \frac{2}{15}. \end{align}
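
All of the answers for this problem follow mechanically from the joint PMF, so they are easy to reproduce programmatically. The sketch below (assuming numpy) recomputes the linear MMSE coefficients, the conditional expectations $E[X|Y=y]$, and the MSE directly from Table 9.1.

```python
# Sketch: recompute the answers of this problem from the joint PMF in Table 9.1.
import numpy as np

p = np.array([[1/5, 2/5],     # rows: X = 0, 1 ; columns: Y = 0, 1
              [2/5, 0.0]])
xv = np.array([0.0, 1.0])
yv = np.array([0.0, 1.0])

px, py = p.sum(axis=1), p.sum(axis=0)    # marginal PMFs of X and Y
ex, ey = xv @ px, yv @ py
var_y = (yv**2) @ py - ey**2
exy = xv @ p @ yv
cov = exy - ex * ey

a = cov / var_y                          # slope of the linear MMSE estimator
b = ex - a * ey                          # intercept, so X_hat_L = a*Y + b
print("X_hat_L = %.4f * Y + %.4f" % (a, b))          # -0.6667 * Y + 0.6667

e_x_given_y = (xv @ p) / py              # E[X | Y = y] for y = 0, 1
print("MMSE estimator E[X|Y=0], E[X|Y=1]:", e_x_given_y)   # [0.6667, 0.0]

mse = ((xv[:, None] - e_x_given_y[None, :])**2 * p).sum()
print("MSE of X_hat_M:", mse)                        # 2/15 ≈ 0.1333
```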


Problem

Consider Example 9.9 in which $X$ is an unobserved random variable with $EX=0$, $\textrm{Var}(X)=4$. Assume that we have observed $Y_1$ and $Y_2$ given by \begin{align} Y_1&=X+W_1,\\ Y_2&=X+W_2, \end{align} where $EW_1=EW_2=0$, $\textrm{Var}(W_1)=1$, and $\textrm{Var}(W_2)=4$. Assume that $W_1$, $W_2$ , and $X$ are independent random variables. Find the linear MMSE estimator of $X$ given $Y_1$ and $Y_2$ using the vector formula \begin{align} \hat{\mathbf{X}}_L=\mathbf{\textbf{C}_\textbf{XY}} \mathbf{\textbf{C}_\textbf{Y}}^{-1} (\mathbf{Y}-E[\textbf{Y}])+ E[\textbf{X}]. \end{align}

  • Solution
    • Note that, here, $X$ is a one-dimensional vector and $\textbf{Y}$ is a two-dimensional vector \begin{equation} \nonumber \textbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}=\begin{bmatrix} X+W_1 \\ X+W_2 \end{bmatrix}. \end{equation} Since $X$, $W_1$, and $W_2$ are independent, we have $\textrm{Var}(Y_1)=\textrm{Var}(X)+\textrm{Var}(W_1)=5$, $\textrm{Var}(Y_2)=\textrm{Var}(X)+\textrm{Var}(W_2)=8$, and $\textrm{Cov}(Y_1,Y_2)=\textrm{Cov}(X,Y_1)=\textrm{Cov}(X,Y_2)=\textrm{Var}(X)=4$. Thus, \begin{equation} \nonumber \textbf{C}_\textbf{Y} = \begin{bmatrix} \textrm{Var}(Y_1) & \textrm{Cov}(Y_1,Y_2) \\ \textrm{Cov}(Y_2,Y_1) & \textrm{Var}(Y_2) \end{bmatrix}=\begin{bmatrix} 5 & 4 \\ 4 & 8 \end{bmatrix}, \end{equation} \begin{equation} \nonumber \textbf{C}_\textbf{XY} = \begin{bmatrix} \textrm{Cov}(X,Y_1) & \textrm{Cov}(X,Y_2) \end{bmatrix}=\begin{bmatrix} 4 & 4 \end{bmatrix}. \end{equation} Therefore, \begin{align} \hat{\mathbf{X}}_L&=\begin{bmatrix} 4 & 4 \end{bmatrix}\begin{bmatrix} 5 & 4 \\ 4 & 8 \end{bmatrix}^{-1} \left(\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}-\begin{bmatrix} 0 \\ 0 \end{bmatrix}\right)+ 0\\ &=\begin{bmatrix} \frac{2}{3} & \frac{1}{6} \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}\\ &=\frac{2}{3} Y_1+ \frac{1}{6} Y_2, \end{align} which is the same as the result that we obtained using the orthogonality principle in Example 9.9.
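
The matrix arithmetic above is also easy to reproduce numerically. A minimal sketch, assuming numpy, that evaluates $\textbf{C}_\textbf{XY} \textbf{C}_\textbf{Y}^{-1}$ for this problem:

```python
# Sketch of the vector formula X_hat_L = C_XY C_Y^{-1} (Y - E[Y]) + E[X]
# for the covariances of this problem (E[X] = E[Y1] = E[Y2] = 0).
import numpy as np

C_Y  = np.array([[5.0, 4.0],
                 [4.0, 8.0]])        # covariance matrix of (Y1, Y2)
C_XY = np.array([[4.0, 4.0]])        # [Cov(X, Y1), Cov(X, Y2)]

coeffs = C_XY @ np.linalg.inv(C_Y)   # row vector multiplying (Y - E[Y])
print(coeffs)                        # [[0.6667, 0.1667]], i.e. X_hat_L = (2/3) Y1 + (1/6) Y2
```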


Problem

Suppose that we need to decide between two opposing hypotheses $H_0$ and $H_1$. Let $C_{ij}$ be the cost of accepting $H_i$ given that $H_j$ is true. That is

$\quad$ $C_{00}$: The cost of choosing $H_0$, given that $H_0$ is true.

$\quad$ $C_{10}$: The cost of choosing $H_1$, given that $H_0$ is true.

$\quad$ $C_{01}$: The cost of choosing $H_0$, given that $H_1$ is true.

$\quad$ $C_{11}$: The cost of choosing $H_1$, given that $H_1$ is true.

It is reasonable to assume that the cost associated with a correct decision is less than the cost of an incorrect decision; that is, $C_{00} \lt C_{10}$ and $C_{11} \lt C_{01}$. The average cost can be written as \begin{align} C =&\sum_{i,j} C_{ij} P( \textrm{choose }H_i | H_j) P(H_j)\\ =& C_{00} P( \textrm{choose }H_0 | H_0) P(H_0)+ C_{01} P( \textrm{choose }H_0 | H_1) P(H_1) \\ & +C_{10} P( \textrm{choose }H_1 | H_0) P(H_0)+ C_{11} P( \textrm{choose }H_1 | H_1) P(H_1). \end{align} Our goal is to find the decision rule such that the average cost is minimized. Show that the decision rule can be stated as follows: Choose $H_0$ if and only if \begin{align} \label{eq:min-cost-test-gen} f_{Y}(y|H_0)P(H_0) (C_{10}-C_{00}) \geq f_{Y}(y|H_1)P(H_1)(C_{01}-C_{11}). \hspace{30pt} (9.8) \end{align}
  • Solution
    • First, note that \begin{align} P( \textrm{choose }H_0 | H_0)&=1- P( \textrm{choose }H_1 | H_0),\\ P( \textrm{choose }H_1 | H_1) &=1- P( \textrm{choose }H_0 | H_1). \end{align} Therefore, \begin{align} C =& C_{00} \big[1-P( \textrm{choose }H_1 | H_0)\big] P(H_0)+ C_{01} P( \textrm{choose }H_0 | H_1) P(H_1) \\ & +C_{10} P( \textrm{choose }H_1 | H_0) P(H_0)+ C_{11} \big[1-P( \textrm{choose }H_0 | H_1)\big] P(H_1)\\ =&(C_{10}-C_{00})P( \textrm{choose }H_1 | H_0) P(H_0)+(C_{01}-C_{11})P( \textrm{choose }H_0 | H_1) P(H_1)\\ &+C_{00}P(H_0)+C_{11}P(H_1). \end{align} The term $C_{00}P(H_0)+C_{11}P(H_1)$ is constant, in the sense that it does not depend on the decision rule. Therefore, to minimize the cost, we need to minimize \begin{align} D =P( \textrm{choose }H_1 | H_0) P(H_0)(C_{10}-C_{00})+P( \textrm{choose }H_0 | H_1) P(H_1)(C_{01}-C_{11}). \end{align} The above expression is very similar to the average error probability of the MAP test (Equation 9.8). The only difference is that we have $P(H_0)(C_{10}-C_{00})$ instead of $P(H_0)$, and we have $P(H_1)(C_{01}-C_{11})$ instead of $P(H_1)$. Therefore, we can use a decision rule similar to the MAP decision rule. More specifically, we choose $H_0$ if and only if \begin{align} f_{Y}(y|H_0)P(H_0) (C_{10}-C_{00}) \geq f_{Y}(y|H_1)P(H_1)(C_{01}-C_{11}). \end{align}
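
The resulting minimum-cost rule is straightforward to implement once the two likelihoods are specified. The sketch below uses Gaussian likelihoods $Y|H_0 \sim N(0,1)$ and $Y|H_1 \sim N(1,1)$ purely as an illustrative assumption (they are not part of the problem); with equal priors and symmetric costs the rule reduces to the MAP test with threshold $y=0.5$.

```python
# Sketch of the minimum-cost decision rule derived above.
# The Gaussian likelihoods and the numbers in the example calls are assumptions
# chosen only to exercise the rule; they are not specified by the problem.
from math import exp, sqrt, pi

def normal_pdf(y, mean, var):
    return exp(-(y - mean) ** 2 / (2 * var)) / sqrt(2 * pi * var)

def choose_H0(y, p0, c00, c10, c01, c11):
    """Accept H0 iff f(y|H0) P(H0) (C10 - C00) >= f(y|H1) P(H1) (C01 - C11)."""
    lhs = normal_pdf(y, 0.0, 1.0) * p0 * (c10 - c00)          # Y | H0 ~ N(0, 1)
    rhs = normal_pdf(y, 1.0, 1.0) * (1 - p0) * (c01 - c11)    # Y | H1 ~ N(1, 1)
    return lhs >= rhs

# Equal priors, symmetric costs: reduces to the MAP test (threshold y = 0.5).
print(choose_H0(0.3, p0=0.5, c00=0, c10=1, c01=1, c11=0))   # True  -> choose H0
print(choose_H0(0.8, p0=0.5, c00=0, c10=1, c01=1, c11=0))   # False -> choose H1
```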


Problem

Let \begin{align} X \sim N(0,4) \quad \textrm{and} \quad Y \; | \; X=x \quad \sim \quad N(x,1). \end{align} Suppose that we have observed the random sample $Y_1$, $Y_2$, $\cdots$, $Y_{25}$ such that, given $X=x$, the $Y_i$'s are i.i.d. and have the same distribution as $Y \; | \; X=x$. Find a $95\%$ credible interval for $X$, given that we have observed \begin{align} \overline{Y}=\frac{Y_1+Y_2+\cdots+Y_{25}}{25}=0.56. \end{align} Hint: Use the result of Problem 9.2.

  • Solution
    • By part 1 of Problem 9.2, we have \begin{align} X \; | \; \overline{Y} \quad &\sim \quad N \left(\frac{ 25 (0.56) / 1 + 0 / 4 }{25 / 1 + 1 / 4}, \frac{1}{25 / 1 + 1 / 4} \right)\\ &=N\big(0.5545, 0.0396\big). \end{align} Therefore, we choose an interval of the form \begin{align} \big[0.5545-c, 0.5545+c\big]. \end{align} We need to have \begin{align} P \left(0.5545-c \leq X \leq 0.5545+c \; \Big| \; \overline{Y}=0.56\right) &=\Phi \left( \frac{c}{\sqrt{0.0396}}\right)-\Phi \left( \frac{-c}{\sqrt{0.0396}}\right)\\ &=2 \Phi \left( \frac{c}{\sqrt{0.0396}}\right)-1=0.95. \end{align} Solving for $c$, we obtain \begin{align} c=\sqrt{0.0396} \; \Phi^{-1}(0.975) \approx 0.39. \end{align} Therefore, the $95\%$ credible interval for $X$ is \begin{align} \big[0.5545-0.39, 0.5545+0.39\big] \approx [0.1645,0.9445]. \end{align}
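
The interval can be reproduced in a couple of lines of code. A sketch assuming scipy, starting from the posterior mean and variance computed above:

```python
# Sketch: 95% credible interval from the posterior N(0.5545, 0.0396) found above.
from scipy.stats import norm

post_mean, post_var = 0.5545, 0.0396
c = norm.ppf(0.975) * post_var ** 0.5        # half-width, about 0.39
print("95%% credible interval: [%.4f, %.4f]" % (post_mean - c, post_mean + c))
```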



