8.5.2 The First Method for Finding $\beta_0$ and $\beta_1$

Here, we assume that the $x_i$'s are observed values of a random variable $X$. Therefore, we can summarize our model as \begin{align} Y = \beta_0+\beta_1 X +\epsilon, \end{align} where $\epsilon$ is a $N(0,\sigma^2)$ random variable independent of $X$. First, we take the expectation of both sides to obtain \begin{align} EY &= \beta_0+\beta_1 EX +E[\epsilon]\\ &=\beta_0+\beta_1 EX. \end{align} Thus, \begin{align} \beta_0=EY-\beta_1 EX. \end{align} Next, we look at $\textrm{Cov}(X,Y)$: \begin{align} \textrm{Cov}(X,Y) &= \textrm{Cov}(X,\beta_0+\beta_1 X +\epsilon)\\ &=\beta_0 \textrm{Cov}(X,1)+\beta_1\textrm{Cov}(X,X)+\textrm{Cov}(X, \epsilon)\\ &=0+\beta_1 \textrm{Cov}(X,X)+0 \quad (\textrm{since $X$ and $\epsilon$ are independent})\\ &=\beta_1 \textrm{Var}(X). \end{align} Therefore, we obtain \begin{align} \beta_1=\frac{\textrm{Cov}(X,Y)}{\textrm{Var}(X)}, \quad \beta_0=EY-\beta_1 EX. \end{align}

Thus, we can find $\beta_0$ and $\beta_1$ if we know $EX$, $EY$, and $\frac{\textrm{Cov}(X,Y)}{\textrm{Var}(X)}$. Here, we have the observed pairs $(x_1,y_1)$, $(x_2,y_2)$, $\cdots$, $(x_n,y_n)$, so we may estimate these quantities. More specifically, we define \begin{align} &\overline{x}=\frac{x_1+x_2+...+x_n}{n},\\ &\overline{y}=\frac{y_1+y_2+...+y_n}{n},\\ &s_{xx}=\sum_{i=1}^n (x_i-\overline{x})^2,\\ &s_{xy}=\sum_{i=1}^{n} (x_i-\overline{x})(y_i-\overline{y}). \end{align} We can then estimate $\beta_0$ and $\beta_1$ as \begin{align} &\hat{\beta_1}=\frac{s_{xy}}{s_{xx}},\\ &\hat{\beta_0}=\overline{y}-\hat{\beta_1} \overline{x}. \end{align}

The above formulas give us the regression line \begin{align} \hat{y} = \hat{\beta_0}+\hat{\beta_1} x. \end{align} For each $x_i$, the fitted value $\hat{y}_i$ is obtained by \begin{align} \hat{y}_i = \hat{\beta_0}+\hat{\beta_1} x_i. \end{align} Here, $\hat{y}_i$ is the predicted value of $y_i$ using the regression formula. The errors in this prediction are given by \begin{align} e_i=y_i-\hat{y}_i, \end{align} which are called the residuals.
Simple Linear Regression

Given the observations $(x_1,y_1)$, $(x_2,y_2)$, $\cdots$, $(x_n,y_n)$, we can write the estimated regression line as \begin{align} \hat{y} = \hat{\beta_0}+\hat{\beta_1} x. \end{align} We can estimate $\beta_0$ and $\beta_1$ as \begin{align} &\hat{\beta_1}=\frac{s_{xy}}{s_{xx}},\\ &\hat{\beta_0}=\overline{y}-\hat{\beta_1} \overline{x}, \end{align} where \begin{align} &s_{xx}=\sum_{i=1}^n (x_i-\overline{x})^2,\\ &s_{xy}=\sum_{i=1}^{n} (x_i-\overline{x})(y_i-\overline{y}). \end{align} For each $x_i$, the fitted value $\hat{y}_i$ is obtained by \begin{align} \hat{y}_i = \hat{\beta_0}+\hat{\beta_1} x_i. \end{align} The quantities \begin{align} e_i=y_i-\hat{y}_i \end{align} are called the residuals.
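These estimates are straightforward to compute directly from the data. The following MATLAB snippet is a minimal sketch (not from the text), assuming x and y are column vectors containing the observed $x_i$'s and $y_i$'s:

xbar = mean(x);                       % sample mean of the x_i's
ybar = mean(y);                       % sample mean of the y_i's
sxx  = sum((x - xbar).^2);            % s_xx
sxy  = sum((x - xbar).*(y - ybar));   % s_xy
beta1_hat = sxy/sxx;                  % estimated slope
beta0_hat = ybar - beta1_hat*xbar;    % estimated intercept
yhat = beta0_hat + beta1_hat*x;       % fitted values
e = y - yhat;                         % residuals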



Example 8.31
Consider the following observed values of $(x_i,y_i)$: \begin{equation} (1,3) \quad (2,4) \quad (3,8) \quad (4,9) \end{equation}
  1. Find the estimated regression line \begin{align} \hat{y} = \hat{\beta_0}+\hat{\beta_1} x, \end{align} based on the observed data.
  2. For each $x_i$, compute the fitted value of $y_i$ using \begin{align} \hat{y}_i = \hat{\beta_0}+\hat{\beta_1} x_i. \end{align}
  3. Compute the residuals, $e_i=y_i-\hat{y}_i$ and note that \begin{align} \sum_{i=1}^{4} e_i =0. \end{align}
  • Solution
      1. We have \begin{align} &\overline{x}=\frac{1+2+3+4}{4}=2.5,\\ &\overline{y}=\frac{3+4+8+9}{4}=6,\\ &s_{xx}=(1-2.5)^2+(2-2.5)^2+(3-2.5)^2+(4-2.5)^2=5,\\ &s_{xy}=(1-2.5)(3-6)+(2-2.5)(4-6)+(3-2.5)(8-6)+(4-2.5)(9-6)=11. \end{align} Therefore, we obtain \begin{align} &\hat{\beta_1}=\frac{s_{xy}}{s_{xx}}=\frac{11}{5}=2.2 \\ &\hat{\beta_0}=6-(2.2) (2.5)=0.5 \end{align}
      2. The fitted values are given by \begin{align} \hat{y}_i = 0.5+2.2 x_i, \end{align} so we obtain \begin{align} \hat{y}_1 =2.7, \quad \hat{y}_2 =4.9, \quad \hat{y}_3 =7.1, \quad \hat{y}_4 =9.3 \end{align}
      3. We have \begin{align} &e_1=y_1-\hat{y}_1=3-2.7=0.3,\\ &e_2=y_2-\hat{y}_2=4-4.9=-0.9,\\ &e_3=y_3-\hat{y}_3=8-7.1=0.9,\\ &e_4=y_4-\hat{y}_4=9-9.3=-0.3 \end{align} So, $e_1+e_2+e_3+e_4=0$.
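The fact that the residuals sum to zero is not specific to this data set. For any data set, we have \begin{align} \sum_{i=1}^{n} e_i = \sum_{i=1}^{n} \left(y_i-\hat{\beta_0}-\hat{\beta_1} x_i\right) = n\overline{y}-n\hat{\beta_0}-n\hat{\beta_1}\overline{x} = n\left(\overline{y}-\hat{\beta_0}-\hat{\beta_1}\overline{x}\right)=0, \end{align} where the last equality follows from $\hat{\beta_0}=\overline{y}-\hat{\beta_1} \overline{x}$.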


We can use MATLAB or other software packages to do regression analysis. For example, the following MATLAB code can be used to obtain the estimated regression line in Example 8.31.
x = [1; 2; 3; 4];            % observed x values
x0 = ones(size(x));          % column of ones for the intercept term
y = [3; 4; 8; 9];            % observed y values
beta = regress(y, [x0, x]);  % beta(1) = beta0_hat, beta(2) = beta1_hat
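The vector beta returned by regress contains the estimated coefficients, with beta(1) $=\hat{\beta_0}$ and beta(2) $=\hat{\beta_1}$; for this data set it should match the values $\hat{\beta_0}=0.5$ and $\hat{\beta_1}=2.2$ computed by hand above. (The regress function requires MATLAB's Statistics and Machine Learning Toolbox.)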

Coefficient of Determination ($R$-Squared):

Let's look again at the above model for regression. We wrote \begin{align} Y = \beta_0+\beta_1 X +\epsilon, \end{align} where $\epsilon$ is a $N(0,\sigma^2)$ random variable independent of $X$. Note that, here, $X$ is the only variable that we observe, so we estimate $Y$ using $X$. That is, we can write \begin{align} \hat{Y} = \beta_0+\beta_1 X. \end{align} The error in our estimate is \begin{align} Y-\hat{Y}=\epsilon. \end{align} Note that the randomness in $Y$ comes from two sources: $X$ and $\epsilon$. More specifically, if we look at $\textrm{Var}(Y)$, we can write \begin{align} \textrm{Var}(Y) &= \beta_1^2 \textrm{Var}(X) +\textrm{Var}(\epsilon) \quad (\textrm{since $X$ and $\epsilon$ are assumed to be independent}). \end{align}

The above equation can be interpreted as follows. The total variation in $Y$ can be divided into two parts. The first part, $\beta_1^2 \textrm{Var}(X)$, is due to variation in $X$. The second part, $\textrm{Var}(\epsilon)$, is the variance of the error. In other words, $\textrm{Var}(\epsilon)$ is the variance left in $Y$ after we know $X$. If the variance of the error, $\textrm{Var}(\epsilon)$, is small, then $Y$ is close to $\hat{Y}$, so our regression model will be successful in estimating $Y$.

From the above discussion, we can define \begin{align} \rho^2=\frac{\beta_1^2 \textrm{Var}(X)}{\textrm{Var}(Y)} \end{align} as the portion of the variance of $Y$ that is explained by variation in $X$. We can also conclude that $0 \leq \rho^2 \leq 1$. More specifically, if $\rho^2$ is close to $1$, then $Y$ can be estimated very well as a linear function of $X$. On the other hand, if $\rho^2$ is small, then the variance of the error is large and $Y$ cannot be accurately estimated as a linear function of $X$. Since $\beta_1=\frac{\textrm{Cov}(X,Y)}{\textrm{Var}(X)}$, we can write \begin{align} \label{eq:rho-reg} \rho^2=\frac{\beta_1^2 \textrm{Var}(X)}{\textrm{Var}(Y)}=\frac{\left[\textrm{Cov}(X,Y)\right]^2}{\textrm{Var}(X) \textrm{Var}(Y)}. \hspace{30pt} (8.6) \end{align} The above equation should look familiar: $\rho$ is the correlation coefficient between $X$ and $Y$ that we have seen before. We are basically saying that if $X$ and $Y$ are highly correlated (i.e., $|\rho(X,Y)|$ is large), then $Y$ can be well approximated by a linear function of $X$, i.e., $Y \approx \hat{Y}=\beta_0+\beta_1 X$.
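As a quick numerical check (an illustrative simulation, not part of the text, with arbitrary parameter values), the following MATLAB snippet simulates the model and compares the empirical $\textrm{Var}(Y)$ with $\beta_1^2 \textrm{Var}(X)+\textrm{Var}(\epsilon)$, and the empirical $\rho^2$ with $\frac{\beta_1^2 \textrm{Var}(X)}{\textrm{Var}(Y)}$:

rng(1);                              % fix the random seed for reproducibility
n = 1e6;                             % number of simulated points
beta0 = 1; beta1 = 2; sigma = 3;     % arbitrary illustrative parameter values
X = randn(n,1);                      % X ~ N(0,1), so Var(X) = 1
epsilon = sigma*randn(n,1);          % epsilon ~ N(0,sigma^2), independent of X
Y = beta0 + beta1*X + epsilon;       % the regression model
[var(Y), beta1^2*var(X) + sigma^2]   % the two sides of the variance decomposition
C = cov(X,Y);                        % 2x2 sample covariance matrix of X and Y
rho2 = C(1,2)^2/(var(X)*var(Y))      % should be close to beta1^2*Var(X)/Var(Y) = 4/13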

We conclude that $\rho^2$ is an indicator showing the strength of our regression model in estimating (predicting) $Y$ from $X$. In practice, we often do not have $\rho$, but we do have the observed pairs $(x_1,y_1)$, $(x_2,y_2)$, $\cdots$, $(x_n,y_n)$, so we can estimate $\rho^2$ from the observed data. We denote this estimate by $r^2$ and call it $R$-squared or the coefficient of determination.

Coefficient of Determination

For the observed data pairs $(x_1,y_1)$, $(x_2,y_2)$, $\cdots$, $(x_n,y_n)$, we define the coefficient of determination, $r^2$, as \begin{align} r^2=\frac{s_{xy}^2}{s_{xx}s_{yy}}, \end{align} where \begin{align} &s_{xx}=\sum_{i=1}^n (x_i-\overline{x})^2, \quad s_{yy}=\sum_{i=1}^n (y_i-\overline{y})^2, \quad s_{xy}=\sum_{i=1}^{n} (x_i-\overline{x})(y_i-\overline{y}). \end{align} We have $0 \leq r^2 \leq 1$. Larger values of $r^2$ generally suggest that our linear model \begin{align} \hat{y_i}=\hat{\beta_0}+\hat{\beta_1}x_i \end{align} is a good fit for the data.
Two sets of data pairs are shown in Figure 8.12. In both data sets, the values of the $y_i$'s (the heights of the data points) have considerable variation. The data points shown in (a) are very close to the regression line. Therefore, most of the variation in $y$ is explained by the regression formula. That is, here, the $\hat{y_i}$'s are relatively close to the $y_i$'s, so $r^2$ is close to $1$. On the other hand, for the data shown in (b), a lot of variation in $y$ is left unexplained by the regression model. Therefore, $r^2$ for this data set is much smaller than $r^2$ for the data set in (a).
Figure 8.12 - The data in (a) results in a high value of $r^2$, while the data shown in (b) results in a low value of $r^2$.


Example
For the data in Example 8.31, find the coefficient of determination.
  • Solution
    • In Example 8.31, we found \begin{align} &s_{xx}=5, \quad s_{xy}=11. \end{align} We also have \begin{align} &s_{yy}=(3-6)^2+(4-6)^2+(8-6)^2+(9-6)^2=26. \end{align} We conclude that \begin{align} r^2=\frac{11^2}{5 \times 26}=\frac{121}{130} \approx 0.93. \end{align}
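As a sanity check (an illustrative MATLAB snippet, not from the text), $r^2$ can also be computed either directly from the formula or via the sample correlation coefficient; both give approximately $0.93$ for this data set:

x = [1; 2; 3; 4];  y = [3; 4; 8; 9];        % data from Example 8.31
sxx = sum((x - mean(x)).^2);                % s_xx
syy = sum((y - mean(y)).^2);                % s_yy
sxy = sum((x - mean(x)).*(y - mean(y)));    % s_xy
r2 = sxy^2/(sxx*syy)                        % coefficient of determination
R = corrcoef(x,y);                          % 2x2 sample correlation matrix
R(1,2)^2                                    % the same value, via the correlation coefficient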



