8.6.0 End of Chapter Problems

Problem

Let $X$ be the weight of a randomly chosen individual from a population of adult men. In order to estimate the mean and variance of $X$, we observe a random sample $X_1$,$X_2$,$\cdots$,$X_{10}$. Thus, the $X_i$'s are i.i.d. and have the same distribution as $X$. We obtain the following values (in pounds): \begin{equation}%\label{} 165.5, \; 175.4, \; 144.1, \; 178.5, \; 168.0, \; 157.9, \; 170.1, \; 202.5, \; 145.5, \; 135.7 \end{equation} Find the values of the sample mean, the sample variance, and the sample standard deviation for the observed sample.
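To check your answers numerically, here is a short standard-library Python sketch (an optional aid, not part of the problem) that computes the three sample statistics for the observed weights:

```python
import math
import statistics

# Observed sample of weights (in pounds).
data = [165.5, 175.4, 144.1, 178.5, 168.0, 157.9, 170.1, 202.5, 145.5, 135.7]

xbar = statistics.fmean(data)    # sample mean: 164.32
s2 = statistics.variance(data)   # sample variance (divides by n - 1): ~383.70
s = math.sqrt(s2)                # sample standard deviation: ~19.59
```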




Problem

Let $X_1$, $X_2$, $X_3$, $...$, $X_n$ be a random sample with unknown mean $EX_i=\mu$, and unknown variance $\textrm{Var}(X_i)=\sigma^2$. Suppose that we would like to estimate $\theta=\mu^2$. We define the estimator $\hat{\Theta}$ as \begin{align}%\label{} \hat{\Theta}=\big(\overline{X}\big)^2= \left[\frac{1}{n} \sum_{k=1}^n X_k \right]^2 \end{align} to estimate $\theta$. Is $\hat{\Theta}$ an unbiased estimator of $\theta$? Why?
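A quick Monte Carlo sketch can suggest the answer before you prove it. The distribution and parameter values below are illustrative choices, not part of the problem:

```python
import random
import statistics

random.seed(0)
mu, sigma, n, trials = 2.0, 1.0, 5, 20000  # illustrative values

# Average many realizations of (X̄)^2 and compare with theta = mu^2 = 4.
est = statistics.fmean(
    statistics.fmean(random.gauss(mu, sigma) for _ in range(n)) ** 2
    for _ in range(trials)
)
# Here E[(X̄)^2] = mu^2 + sigma^2/n = 4.2, so the average lands near 4.2, not 4.
```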




Problem

Let $X_1$, $X_2$, $X_3$, $...$, $X_n$ be a random sample from the following distribution \begin{align} \nonumber f_X(x) = \left\{ \begin{array}{l l} \theta \left(x-\frac{1}{2}\right)+1 & \quad \textrm{for }0 \leq x \leq 1 \\ & \quad \\ 0 & \quad \text{otherwise} \end{array} \right. \end{align} where $\theta \in [-2,2]$ is an unknown parameter. We define the estimator $\hat{\Theta}_n$ as \begin{align}%\label{} \hat{\Theta}_n=12 \overline{X}-6 \end{align} to estimate $\theta$.

  1. Is $\hat{\Theta}_n$ an unbiased estimator of $\theta$?
  2. Is $\hat{\Theta}_n$ a consistent estimator of $\theta$?
  3. Find the mean squared error (MSE) of $\hat{\Theta}_n$.




Problem

Let $X_1, \dots, X_4$ be a random sample from a $Geometric(p)$ distribution. Suppose we observed $(x_1, x_2, x_3, x_4)$ = $(2, 3, 3, 5)$. Find the likelihood function using $P_{X_i}(x_i; p) = p(1-p)^{x_i-1}$ as the PMF.
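To tabulate or plot the likelihood numerically, a small helper (an optional sketch, not required by the problem) can evaluate the product of the PMF values at the observed data:

```python
def likelihood(p, xs=(2, 3, 3, 5)):
    """Product of geometric PMF values p * (1 - p)**(x - 1) at the observed sample."""
    out = 1.0
    for x in xs:
        out *= p * (1 - p) ** (x - 1)
    return out
```

Evaluating `likelihood(p)` over a grid of `p` values in $(0,1)$ shows the shape of the likelihood function.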




Problem

Let $X_1, \dots, X_4$ be a random sample from an $Exponential(\theta)$ distribution. Suppose we observed $(x_1, x_2, x_3, x_4)$ = $(2.35, 1.55, 3.25, 2.65)$. Find the likelihood function using \begin{align} f_{X_i}(x_i; \theta) = \theta e^{-\theta x_i}, \quad \textrm{ for }x_i \geq 0 \end{align} as the PDF.
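As with the geometric case, the likelihood can be evaluated numerically as a product of PDF values at the observed data (an optional sketch, not required by the problem):

```python
import math

def likelihood(theta, xs=(2.35, 1.55, 3.25, 2.65)):
    """Product of exponential PDF values theta * exp(-theta * x) at the observed sample."""
    out = 1.0
    for x in xs:
        out *= theta * math.exp(-theta * x)
    return out
```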




Problem

Often, when working with maximum likelihood estimation, we maximize the log-likelihood rather than the likelihood itself because it is easier. Why is maximizing $L(\mathbf{x}; \theta)$ as a function of $\theta$ equivalent to maximizing $\log L(\mathbf{x}; \theta)$?
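A grid search illustrates the phenomenon numerically. The likelihood below is an illustrative choice (a geometric sample of size $4$ with observations summing to $13$), not part of the problem:

```python
import math

def L(p):
    # Illustrative likelihood: 4 geometric observations summing to 13.
    return p ** 4 * (1 - p) ** 9

grid = [i / 1000 for i in range(1, 1000)]
argmax_L = max(grid, key=L)
argmax_logL = max(grid, key=lambda p: math.log(L(p)))
# Both grid searches return the same maximizer, near p = 4/13.
```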




Problem

Let $X$ be one observation from a $N(0, \sigma^2)$ distribution.

  1. Find an unbiased estimator of $\sigma^2$.
  2. Find the log likelihood, $\log L(x; \sigma^2)$, using \begin{align} f_{X}(x; \sigma^2) = \frac{1}{\sqrt{2\pi}\sigma} \exp \left\{-\frac{x^2}{2\sigma^2}\right\} \end{align} as the PDF.
  3. Find the Maximum Likelihood Estimate (MLE) for the standard deviation $\sigma$, $\hat{\sigma}_{ML}$.




Problem

Let $X_1, \dots, X_n$ be a random sample from a $Poisson(\lambda)$ distribution.

  1. Find the likelihood function, $L(x_1, \dots, x_n; \lambda)$, using \begin{align} P_{X_i}(x_i; \lambda) = \frac{e^{-\lambda}\lambda^{x_i}}{x_i!} \end{align} as the PMF.
  2. Find the log likelihood function and use that to obtain the MLE for $\lambda$, $\hat{\lambda}_{ML}$.
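As a numerical sanity check on part 2, you can maximize the Poisson log-likelihood on a grid; the counts below are hypothetical, and the grid maximizer should land on the sample mean:

```python
import math

xs = [3, 1, 4, 2, 5]  # hypothetical Poisson counts, for illustration only

def loglik(lam):
    # log of the product of e^{-lam} * lam^{x_i} / x_i! over the sample
    return sum(-lam + x * math.log(lam) - math.lgamma(x + 1) for x in xs)

grid = [i / 1000 for i in range(1, 10001)]
lam_hat = max(grid, key=loglik)  # lands at the sample mean, x̄ = 3.0
```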




Problem

In this problem, we would like to find the CDFs of the order statistics. Let $X_1, \dots, X_n$ be a random sample from a continuous distribution with CDF $F_X(x)$ and PDF $f_X(x)$. Define $X_{(1)}, \dots, X_{(n)}$ as the order statistics and show that \begin{align*} F_{X_{(i)}}(x)=\sum_{k=i}^{n} {n \choose k} \big[ F_X(x)\big]^{k} \big[1-F_X(x) \big]^{n-k}. \end{align*} Hint: Fix $x \in \mathbb{R}$. Let $Y$ be a random variable that counts the number of $X_j$'s $\leq x$. Define $\{X_j \leq x\}$ as a "success" and $\{X_j > x\}$ as a "failure," and show that $Y \sim Binomial(n,p=F_X(x))$.
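The claimed CDF can be spot-checked by simulation before you prove it. The sketch below uses $Uniform(0,1)$ samples with $n=5$, $i=3$, and $x=0.5$ (illustrative choices):

```python
import random
from math import comb

random.seed(1)
n, i, x = 5, 3, 0.5   # illustrative choices
F = x                 # CDF of Uniform(0,1) evaluated at x

# Binomial-sum formula for P(X_(i) <= x).
formula = sum(comb(n, k) * F ** k * (1 - F) ** (n - k) for k in range(i, n + 1))

# Empirical frequency of {X_(3) <= 0.5} over many simulated samples.
trials = 20000
hits = sum(
    sorted(random.random() for _ in range(n))[i - 1] <= x for _ in range(trials)
)
emp = hits / trials  # close to `formula`, which is exactly 0.5 here
```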




Problem

In this problem, we would like to find the PDFs of order statistics. Let $X_1, \dots, X_n$ be a random sample from a continuous distribution with CDF $F_X(x)$ and PDF $f_X(x)$. Define $X_{(1)}, \dots, X_{(n)}$ as the order statistics. Our goal here is to show that \begin{align}%\label{} f_{X_{(i)}}(x)&=\frac{n!}{(i-1)!(n-i)!}f_X(x) \big[ F_X(x)\big]^{i-1} \big[1-F_X(x) \big]^{n-i}. \end{align} One way to do this is to differentiate the CDF (found in the previous problem). However, here, we would like to derive the PDF directly. Let $f_{X_{(i)}}(x)$ be the PDF of $X_{(i)}$. By definition of the PDF, for small $\delta$, we can write \begin{align}\label{} \nonumber f_{X_{(i)}}(x) \delta \approx P(x \leq X_{(i)} \leq x+\delta). \end{align} Note that the event $\{x \leq X_{(i)} \leq x+\delta \}$ occurs if $i-1$ of the $X_j$'s are less than $x$, one of them is in $[x,x+\delta]$, and $n-i$ of them are larger than $x+\delta$. Using this, find $f_{X_{(i)}}(x)$.

Hint: Remember the multinomial distribution. More specifically, suppose that an experiment has $3$ possible outcomes, so the sample space is given by \begin{align}\label{} \nonumber S=\{s_1,s_2, s_3\}. \end{align} Also, suppose that $P(s_i)=p_i$ for $i=1,2,3$. Then for $n=n_1+n_2+n_3$ independent trials of this experiment, the probability that each $s_i$ appears $n_i$ times is given by \begin{align}\label{} \nonumber {n \choose n_1,n_2,n_3}p_1^{n_1} p_2^{n_2} p_3^{n_3}=\frac{n!}{n_1! n_2! n_3!} p_1^{n_1} p_2^{n_2} p_3^{n_3}. \end{align}




Problem

A random sample $X_1$, $X_2$, $X_3$, $...$, $X_{100}$ is given from a distribution with known variance $\textrm{Var}(X_i)=81$. For the observed sample, the sample mean is $\overline{X}=50.1$. Find an approximate $95 \%$ confidence interval for $\theta=EX_i$.
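A sketch of the standard large-sample calculation, using the table approximation $z_{0.025} \approx 1.96$:

```python
import math

xbar, var, n = 50.1, 81.0, 100
z = 1.96                          # approximate z_{0.025} from a normal table
half = z * math.sqrt(var / n)     # margin of error: 1.96 * 9/10 = 1.764
ci = (xbar - half, xbar + half)   # approximately (48.34, 51.86)
```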




Problem

To estimate the proportion of voters who plan to vote for Candidate A in an election, a random sample of size $n$ from the voters is chosen. The sampling is done with replacement. Let $\theta$ be the proportion of voters who plan to vote for Candidate A among all voters.

  1. How large does $n$ need to be so that we can obtain a $90 \%$ confidence interval with $3 \%$ margin of error?
  2. How large does $n$ need to be so that we can obtain a $99 \%$ confidence interval with $3 \%$ margin of error?
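One standard approach bounds the standard deviation of a Bernoulli proportion by its worst case, attained at $p=1/2$, and solves for $n$; the $z$ values below are the usual table approximations:

```python
import math

def n_required(z, margin):
    # Worst-case margin: z * 0.5 / sqrt(n) <= margin  =>  n >= (z * 0.5 / margin)^2
    return math.ceil((z * 0.5 / margin) ** 2)

n90 = n_required(1.645, 0.03)  # 90% confidence -> 752
n99 = n_required(2.576, 0.03)  # 99% confidence -> 1844
```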




Problem

Let $X_1$, $X_2$, $X_3$, $...$, $X_{100}$ be a random sample from a distribution with unknown variance $\textrm{Var}(X_i)=\sigma^2 \lt \infty$. For the observed sample, the sample mean is $\overline{X}=110.5$, and the sample variance is $S^2=45.6$. Find a $95 \%$ confidence interval for $\theta=EX_i$.




Problem

A random sample $X_1$, $X_2$, $X_3$, $...$, $X_{36}$ is given from a normal distribution with unknown mean $\mu=EX_i$ and unknown variance $\textrm{Var}(X_i)=\sigma^2$. For the observed sample, the sample mean is $\overline{X}=35.8$, and the sample variance is $S^2=12.5$.

  1. Find and compare $90 \%$, $95 \%$, and $99 \%$ confidence intervals for $\mu$.
  2. Find and compare $90 \%$, $95 \%$, and $99 \%$ confidence intervals for $\sigma^2$.




Problem

Let $X_1$, $X_2$, $X_3$, $X_4$, $X_5$ be a random sample from a $N(\mu,1)$ distribution, where $\mu$ is unknown. Suppose that we have observed the following values \begin{equation} \; 5.45, \quad 4.23, \quad 7.22, \quad 6.94, \quad 5.98 \end{equation} We would like to decide between

$\quad$ $H_0$: $\mu=\mu_0=5$,

$\quad$ $H_1$: $\mu \neq 5$.

  1. Define a test statistic to test the hypotheses and draw a conclusion assuming $\alpha=0.05$.
  2. Find a $95\%$ confidence interval around $\overline{X}$. Is $\mu_0$ included in the interval? How does the exclusion of $\mu_0$ from the interval relate to the hypotheses we are testing?
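For part 1, since $\sigma=1$ is known, the usual statistic is $Z=\frac{\overline{X}-\mu_0}{\sigma/\sqrt{n}}$; a quick check of the arithmetic, using the table approximation $z_{0.025}\approx 1.96$:

```python
import math
import statistics

data = [5.45, 4.23, 7.22, 6.94, 5.98]
mu0, sigma = 5.0, 1.0

xbar = statistics.fmean(data)                      # 5.964
z = (xbar - mu0) / (sigma / math.sqrt(len(data)))  # approximately 2.16
reject = abs(z) > 1.96                             # True: reject H0 at alpha = 0.05
```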




Problem

Let $X_1,\dots, X_9$ be a random sample from a $N(\mu, 1)$ distribution, where $\mu$ is unknown. Suppose that we have observed the following values \begin{equation} \; 16.34, \quad 18.57, \quad 18.22, \quad 16.94, \quad 15.98, \quad 15.23, \quad 17.22, \quad 16.54, \quad 17.54 \end{equation}

We would like to decide between

$\quad$ $H_0$: $\mu=\mu_0=16$,

$\quad$ $H_1$: $\mu \neq 16$.

  1. Find a $90\%$ confidence interval around $\overline{X}$. Is $\mu_0$ included in the interval? How does this relate to our hypothesis test?
  2. Define a test statistic to test the hypotheses and draw a conclusion assuming $\alpha=0.1$.




Problem

Let $X_1$, $X_2$ ,..., $X_{150}$ be a random sample from an unknown distribution. After observing this sample, the sample mean and the sample variance are calculated to be \begin{equation} \overline{X}=52.28, \quad S^2=30.9 \end{equation} Design a level $0.05$ test to choose between

$\quad$ $H_0$: $\mu=50$,

$\quad$ $H_1$: $\mu > 50$.

Do you accept or reject $H_0$?




Problem

Let $X_1$, $X_2$, $X_3$, $X_4$, $X_5$ be a random sample from a $N(\mu,\sigma^2)$ distribution, where $\mu$ and $\sigma$ are both unknown. Suppose that we have observed the following values \begin{equation} \; 27.72, \quad 22.24, \quad 32.86, \quad 19.66, \quad 35.34 \end{equation} We would like to decide between

$\quad$ $H_0$: $\mu \geq 30$,

$\quad$ $H_1$: $\mu \lt 30$.

Assuming $\alpha=0.05$, what do you conclude?




Problem

Let $X_1$, $X_2$ ,..., $X_{121}$ be a random sample from an unknown distribution. After observing this sample, the sample mean and the sample variance are calculated to be \begin{equation} \overline{X}= 29.25, \quad S^2=20.7 \end{equation} Design a test to decide between

$\quad$ $H_0$: $\mu=30$,

$\quad$ $H_1$: $\mu \lt 30$,

and calculate the $P$-value for the observed data.
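Since $n=121$ is large, $\overline{X}$ is approximately normal, and the one-sided $P$-value can be sketched with the standard library:

```python
import math
from statistics import NormalDist

xbar, s2, n, mu0 = 29.25, 20.7, 121, 30.0

z = (xbar - mu0) / math.sqrt(s2 / n)   # approximately -1.81
p_value = NormalDist().cdf(z)          # one-sided, H1: mu < 30; about 0.035
```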




Problem

Suppose we would like to test the hypothesis that at least 10% of students suffer from allergies. We collect a random sample of 225 students and 21 of them suffer from allergies.

  1. State the null and alternative hypotheses.
  2. Obtain a test statistic and a $P$-value.
  3. State the conclusion at the $\alpha = 0.05$ level.
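A sketch of the large-sample proportion test for this setup. Pairing $H_0: \theta \geq 0.10$ with $H_1: \theta < 0.10$ is one natural reading of the problem statement, assumed here for illustration:

```python
import math
from statistics import NormalDist

n, successes, p0 = 225, 21, 0.10
p_hat = successes / n                            # about 0.0933

# One-sided test of H0: p >= 0.10 vs H1: p < 0.10 (assumed pairing).
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)  # = -1/3
p_value = NormalDist().cdf(z)                    # about 0.37
reject = p_value < 0.05                          # False: fail to reject at 0.05
```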




Problem

Consider the following observed values of $(x_i,y_i)$: \begin{equation} (-5,-2), \quad (-3,1), \quad (0,4), \quad (2,6), \quad (1,3). \end{equation}

  1. Find the estimated regression line \begin{align} \hat{y} = \hat{\beta_0}+\hat{\beta_1} x \end{align} based on the observed data.
  2. For each $x_i$, compute the fitted value of $y_i$ using \begin{align} \hat{y}_i = \hat{\beta_0}+\hat{\beta_1} x_i. \end{align}
  3. Compute the residuals, $e_i=y_i-\hat{y}_i$.
  4. Calculate $R$-squared.
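All four parts can be checked with a short standard-library computation of the least-squares quantities defined in this chapter:

```python
import statistics

xs = [-5, -3, 0, 2, 1]
ys = [-2, 1, 4, 6, 3]

xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
sxx = sum((x - xbar) ** 2 for x in xs)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
syy = sum((y - ybar) ** 2 for y in ys)

b1 = sxy / sxx                      # estimated slope
b0 = ybar - b1 * xbar               # estimated intercept
fitted = [b0 + b1 * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]
r_squared = sxy ** 2 / (sxx * syy)
```

Note that the residuals sum to zero, as they always do for a least-squares fit with an intercept.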




Problem

Consider the following observed values of $(x_i,y_i)$: \begin{equation} (1,3), \quad (3,7). \end{equation}

  1. Find the estimated regression line \begin{align} \hat{y} = \hat{\beta_0}+\hat{\beta_1} x \end{align} based on the observed data.
  2. For each $x_i$, compute the fitted value of $y_i$ using \begin{align} \hat{y}_i = \hat{\beta_0}+\hat{\beta_1} x_i. \end{align}
  3. Compute the residuals, $e_i=y_i-\hat{y}_i$.
  4. Calculate $R$-squared.
  5. Explain the above results. In particular, can you conclude that the obtained regression line is a good model here?




Problem

Consider the simple linear regression model \begin{align} Y_i = \beta_0+\beta_1 x_i +\epsilon_i, \end{align} where $\epsilon_i$'s are independent $N(0,\sigma^2)$ random variables. Therefore, $Y_i$ is a normal random variable with mean $\beta_0+\beta_1 x_i$ and variance $\sigma^2$. Moreover, $Y_i$'s are independent. As usual, we have the observed data pairs $(x_1,y_1)$, $(x_2,y_2)$, $\cdots$, $(x_n,y_n)$ from which we would like to estimate $\beta_0$ and $\beta_1$. In this chapter, we found the following estimators \begin{align} &\hat{\beta_1}=\frac{s_{xy}}{s_{xx}},\\ &\hat{\beta_0}=\overline{Y}-\hat{\beta_1} \overline{x}, \end{align} where \begin{align} &s_{xx}=\sum_{i=1}^n (x_i-\overline{x})^2,\\ &s_{xy}=\sum_{i=1}^{n} (x_i-\overline{x})(Y_i-\overline{Y}). \end{align}

  1. Show that $\hat{\beta_1}$ is a normal random variable.
  2. Show that $\hat{\beta_1}$ is an unbiased estimator of $\beta_1$, i.e., \begin{align*} E[\hat{\beta_1}]= \beta_1. \end{align*}
  3. Show that \begin{align*} \textrm{Var}(\hat{\beta_1})= \frac{\sigma^2}{s_{xx}}. \end{align*}




Problem

Again consider the simple linear regression model \begin{align} Y_i = \beta_0+\beta_1 x_i +\epsilon_i, \end{align} where $\epsilon_i$'s are independent $N(0,\sigma^2)$ random variables, and \begin{align} &\hat{\beta_1}=\frac{s_{xy}}{s_{xx}},\\ &\hat{\beta_0}=\overline{Y}-\hat{\beta_1} \overline{x}. \end{align}

  1. Show that $\hat{\beta_0}$ is a normal random variable.
  2. Show that $\hat{\beta_0}$ is an unbiased estimator of $\beta_0$, i.e., \begin{align*} E[\hat{\beta_0}]= \beta_0. \end{align*}
  3. For any $i=1,2,3,...,n$, show that \begin{align*} \textrm{Cov}(\hat{\beta_1},Y_i)=\frac{x_i-\overline{x}}{s_{xx}} \sigma^2. \end{align*}
  4. Show that \begin{align*} \textrm{Cov}(\hat{\beta_1},\overline{Y})=0. \end{align*}
  5. Show that \begin{align*} \textrm{Var}(\hat{\beta_0})=\frac{\sum_{i=1}^{n} x_i^2}{n s_{xx}} \sigma^2. \end{align*}





