Chapter 11: Regression Analysis: Multiple Linear Regression
Overfitting and Multicollinearity
Including more than one predictor variable in a regression model brings some additional considerations into play. One of these is the risk of overfitting the model.
#\phantom{0}#
Overfitting
Adding more variables to a regression model does not necessarily make the model better. In fact, it can make the model worse. This phenomenon is called overfitting.
The danger of overfitting is that the regression model becomes tailored to the specific sample used to construct it. While adding more variables may increase the predictive power of the model within that sample, this can come at the cost of reduced predictive power with respect to the general population.
Consequently, an overfitted model may produce misleading regression coefficients, #p#-values, and #R^2#-values.
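To make this concrete, the following sketch (a simulation in Python with NumPy; the sample size, data, and function name are illustrative assumptions, not taken from this text) adds pure-noise predictors to a small sample and shows that the in-sample #R^2# keeps increasing, even though the extra variables carry no information about #Y#.
```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a small sample: Y depends on one real predictor plus noise.
n = 30
x1 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 + rng.normal(scale=2.0, size=n)

def r_squared(X, y):
    """In-sample R^2 of an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

# Add pure-noise predictors one at a time: the in-sample R^2 never
# decreases, even though the extra variables are unrelated to Y.
X = x1.reshape(-1, 1)
print(f" 1 predictor : R^2 = {r_squared(X, y):.3f}")
for k in range(2, 11):
    X = np.column_stack([X, rng.normal(size=n)])
    print(f"{k:>2} predictors: R^2 = {r_squared(X, y):.3f}")
```
On new data drawn from the same population, the fits with many noise predictors would generally predict worse, which is exactly the trade-off between in-sample and population predictive power described above.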
#\phantom{0}#
Another thing to watch out for when performing a multiple regression analysis is multicollinearity.
#\phantom{0}#
Multicollinearity
Multicollinearity occurs when two or more of the predictor variables in the regression model are (substantially) correlated with each other.
Although multicollinearity does not reduce the predictive power of the regression model as a whole, it does reduce the precision of the individual partial regression coefficients (#b_1, \ldots, b_n#), meaning their estimated values can vary substantially from sample to sample.
If two predictor variables (e.g. #X_1# and #X_2#) are highly correlated, then the partial regression coefficients associated with them (#b_1# and #b_2#) may not accurately reflect the relationship between #Y# and #X_1# or the relationship between #Y# and #X_2# that exists in the population.
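As an illustrative sketch of this effect (again Python with NumPy; the correlation levels, sample size, and the helper `fit_coefs` are assumptions made for this example), the simulation below draws many samples in which the true coefficients of #X_1# and #X_2# both equal #1#, and compares the sampling spread of #b_1# when the predictors are nearly uncorrelated versus highly correlated.
```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

def fit_coefs(rho):
    """Draw one sample with corr(X1, X2) = rho and return the OLS b1, b2."""
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    y = 1.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1], beta[2]

# Repeat the experiment many times and compare the spread of b1
# at low vs. high correlation between the two predictors.
for rho in (0.0, 0.99):
    b1 = np.array([fit_coefs(rho)[0] for _ in range(1000)])
    print(f"rho = {rho:4.2f}: b1 mean = {b1.mean():.2f}, sd = {b1.std():.2f}")
```
The estimates remain centered on the true value, but at high correlation they scatter far more widely, so an individual #b_1# computed from one sample may poorly reflect the relationship between #Y# and #X_1# in the population.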