Chapter 11: Regression Analysis: Simple Linear Regression
Residuals and Total Squared Error
A regression line is the best-fitting straight line through a set of data points. Regression analysis is all about predicting values, and what makes a regression line 'best-fitting' is that it has the lowest possible amount of prediction error.
In the context of regression, the amount of prediction error is expressed in terms of residuals.
#\phantom{0}#
Residual
A residual is the signed vertical distance between a data point and the regression line and is denoted by #r#. It is positive when the data point lies above the line and negative when it lies below.
Calculating Residuals
To calculate a residual, take a point #(X,Y)# from the data and determine the height of the regression line at #X#. This height is the predicted value of #Y# and is denoted by #\hat{Y}#.
Next, subtract the predicted value #\hat{Y}# from the observed value #Y# to determine the value of the residual:
\[r_i = Y_i - \hat{Y}_i\]
Calculation of Residuals
Consider the regression equation #\hat{Y}=2X# and the data points #(1,3)#, #(3,1)#, and #(4,3)#. The residuals of these three data points are calculated as follows:
- For the first point #(1,3)#:
- #\purple{\hat{Y}_1}=2\cdot 1=2#
- #\blue{Y_1} = 3#
- #\orange{r_1}= Y_1-\hat{Y}_1=3-2=1#.
- For the second point #(3,1)#:
- #\purple{\hat{Y}_2}=2\cdot 3=6#
- #\blue{Y_2}=1#
- #\orange{r_2}= Y_2-\hat{Y}_2 = 1-6 =-5#.
- For the last point #(4,3)#:
- #\purple{\hat{Y}_3} =2\cdot 4=8#
- #\blue{Y_3}=3#
- #\orange{r_3}= Y_3-\hat{Y}_3 = 3-8=-5#.
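The three calculations above can also be checked numerically. The following is a minimal Python sketch, assuming the example regression equation #\hat{Y}=2X#. The function names `predict` and `residuals` are chosen here for illustration only.

```python
def predict(x):
    """Predicted value Y-hat from the example regression equation Y-hat = 2X."""
    return 2 * x


def residuals(points):
    """Return the residual r = Y - Y-hat for each (X, Y) data point."""
    return [y - predict(x) for x, y in points]


# The three data points from the example.
data = [(1, 3), (3, 1), (4, 3)]

print(residuals(data))  # [1, -5, -5]
```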
#\phantom{0}#
#\phantom{0}#
One of the most commonly used measures to summarize the total amount of prediction error is the Total Squared Error.
#\phantom{0}#
Total Squared Error
The Total Squared Error is the sum of the squared residuals and is often abbreviated TSE.
\[\text{TSE} = \sum{r^2} = \sum{(Y-\hat{Y})^2}\]
The reason for squaring the residuals before adding them together is to prevent positive and negative residuals from canceling one another. Consequently, the Total Squared Error can never be negative: it is zero only when every data point lies exactly on the regression line, and positive otherwise.
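The effect of squaring can be seen in a small sketch. The two residuals below are hypothetical values, chosen only to show how cancellation arises. They do not come from the example data.

```python
# Hypothetical residuals: one point 2 units above the line, one 2 units below.
example_residuals = [2, -2]

# Adding the raw residuals suggests there is no prediction error at all.
plain_sum = sum(example_residuals)

# Squaring first keeps both errors visible in the total.
squared_sum = sum(r ** 2 for r in example_residuals)

print(plain_sum, squared_sum)  # 0 8
```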
Calculation of Total Squared Error
Consider the regression line and residuals from the previous example. In this case, the Total Squared Error is:
\[\begin{array}{rcl}
\text{TSE} &=& \sum{(Y-\hat{Y})^2}\\
&=& (Y_1-\hat{Y}_1)^2 + (Y_2-\hat{Y}_2)^2 + (Y_3-\hat{Y}_3)^2\\
&=& (3-2)^2+(1-6)^2+(3-8)^2\\
&=& 1^2 + (-5)^2 + (-5)^2\\
&=& 1 + 25 + 25\\
&=& 51
\end{array}\]
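The same total can be reproduced with a short Python sketch. The function name `total_squared_error` and its parameters are illustrative assumptions, and the regression equation #\hat{Y}=2X# is passed in as a small function.

```python
def total_squared_error(points, predict):
    """Sum of squared residuals (Y - Y-hat)^2 over all (X, Y) data points."""
    return sum((y - predict(x)) ** 2 for x, y in points)


# Data points and regression equation Y-hat = 2X from the example.
data = [(1, 3), (3, 1), (4, 3)]
tse = total_squared_error(data, lambda x: 2 * x)

print(tse)  # 51
```

Passing the prediction rule as an argument makes it easy to compare the TSE of different candidate lines on the same data.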