8.5.3 The Method of Least Squares

Here, we use a different method to estimate $\beta_0$ and $\beta_1$. This method results in the same estimates as before; however, it is based on a different idea. Suppose that we have the data points $(x_1,y_1)$, $(x_2,y_2)$, $\cdots$, $(x_n,y_n)$. Consider the model
\begin{align}
\hat{y} = \beta_0+\beta_1 x.
\end{align}
The errors (residuals) are given by
\begin{align}
e_i=y_i-\hat{y}_i=y_i-\beta_0-\beta_1 x_i.
\end{align}
The sum of the squared errors is given by
\begin{align}\label{eq:reg-ls}
g(\beta_0, \beta_1)=\sum_{i=1}^{n} e_i^2=\sum_{i=1}^{n} (y_i-\beta_0-\beta_1 x_i)^2. \hspace{30pt} (8.7)
\end{align}
To find the best fit for the data, we find the values $\hat{\beta_0}$ and $\hat{\beta_1}$ that minimize $g(\beta_0, \beta_1)$. This can be done by taking the partial derivatives with respect to $\beta_0$ and $\beta_1$ and setting them to zero. We obtain
\begin{align}
\frac{\partial g}{\partial \beta_0}&=\sum_{i=1}^{n} 2(-1)(y_i-\beta_0-\beta_1 x_i)=0, \hspace{30pt} (8.8)\\
\frac{\partial g}{\partial \beta_1}&=\sum_{i=1}^{n} 2(-x_i)(y_i-\beta_0-\beta_1 x_i)=0. \hspace{30pt} (8.9)
\end{align}
Equation (8.8) gives $\sum_{i=1}^{n} y_i = n\beta_0 + \beta_1 \sum_{i=1}^{n} x_i$, so $\hat{\beta_0}=\overline{y}-\hat{\beta_1}\overline{x}$; substituting this into Equation (8.9) and simplifying yields $\hat{\beta_1}$. Thus, by solving the above equations, we obtain the same values of $\hat{\beta_0}$ and $\hat{\beta_1}$ as before:
\begin{align}
&\hat{\beta_1}=\frac{s_{xy}}{s_{xx}},\\
&\hat{\beta_0}=\overline{y}-\hat{\beta_1} \overline{x},
\end{align}
where
\begin{align}
&s_{xx}=\sum_{i=1}^n (x_i-\overline{x})^2,\\
&s_{xy}=\sum_{i=1}^{n} (x_i-\overline{x})(y_i-\overline{y}).
\end{align}
This method is called the method of least squares, and for this reason, we call the above values of $\hat{\beta_0}$ and $\hat{\beta_1}$ the least squares estimates of $\beta_0$ and $\beta_1$.
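As a quick illustration, the following Python sketch computes the least squares estimates directly from the formulas above; the data values here are made up for the example, and NumPy is assumed to be available. It also checks that the fitted residuals satisfy the normal equations (8.8) and (8.9) up to rounding error.

```python
import numpy as np

# Hypothetical example data (any paired observations would do)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

x_bar = x.mean()
y_bar = y.mean()

# s_xx and s_xy as defined above
s_xx = np.sum((x - x_bar) ** 2)
s_xy = np.sum((x - x_bar) * (y - y_bar))

# Least squares estimates
beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar

print("beta1_hat =", beta1_hat)
print("beta0_hat =", beta0_hat)

# Sanity check: the residuals of the fitted line should satisfy
# the normal equations (8.8) and (8.9), i.e. both sums below are ~ 0.
residuals = y - (beta0_hat + beta1_hat * x)
print("sum of residuals      :", residuals.sum())
print("sum of x_i * residuals:", (x * residuals).sum())
```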

