Multicollinearity


Many economic variables have the property that they are correlated. This is not surprising, given the natural links between almost all facets of economic activity within any given economy. Within the context of regression, however, this feature of most economic data suggests that not only are the regressors (or independent or explanatory variables) related to the dependent variable (which is what we want, as we're trying to "explain" our dependent variable using our independent variables), but the independent variables are also correlated with one another. When the independent variables are correlated with one another (and this can be checked by simply running a regression of one of the independent variables on another, say, and checking whether the multiple coefficient of determination from this auxiliary regression is high), then we have what is termed "multicollinearity".

If the multicollinearity is severe (i.e. if the R squared value from the auxiliary regression just mentioned is close to unity), then the precision of the estimated slope coefficients in the model is very poor. Poor precision means high variance, so high multicollinearity implies high slope estimator variance. This in turn implies low t-statistics, since the denominator of the standard t-statistic is inflated (for H0: beta = 0, the statistic is t = beta-hat / SE(beta-hat), where SE(beta-hat) is the standard error of the estimated slope coefficient, which is reported in any computer output from a program which does regression). These low t-statistics may in turn lead to a failure to reject the null hypothesis that some particular slope coefficient is zero, which in turn suggests that the particular independent variable is not a useful explanatory variable. However, the variable may actually have a lot of explanatory power, and we may simply be fooled into believing the variable is irrelevant because we observe low t-statistics which are simply an artifact of the multicollinearity in our regression model. This problem is often signaled when our regression has a high R squared value, but very low slope coefficient t-statistics.

One simple remedy is to omit one of the variables that is highly multicollinear, as the informational content of this variable is essentially the same as that of the other variable(s) anyway. Another common solution is to difference or log-difference the data. This often removes much of the multicollinearity among regressors, particularly since the multicollinearity may have arisen because the regressors were all trending upwards over time, say, which is then the same problem as discussed in the previous note on nonstationarity.
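To make the symptom and the diagnostic concrete, here is a minimal sketch in Python using numpy and statsmodels, with purely simulated data and illustrative variable names (none of which come from any particular dataset): two nearly identical regressors give a regression with a high R squared but small individual t-statistics, while the auxiliary regression of one regressor on the other has an R squared close to one.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100

# Two highly collinear regressors: x2 is x1 plus a little noise.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)

# Full regression: expect a high R-squared but low individual t-statistics.
X = sm.add_constant(np.column_stack([x1, x2]))
full = sm.OLS(y, X).fit()
print(full.rsquared, full.tvalues)

# Auxiliary regression of one regressor on the other: an R-squared near
# unity signals severe multicollinearity.
aux = sm.OLS(x2, sm.add_constant(x1)).fit()
r2_aux = aux.rsquared
vif = 1.0 / (1.0 - r2_aux)
print(r2_aux, vif)

Note that 1 / (1 - R squared) from the auxiliary regression is the variance inflation factor, which is simply another way of packaging the same diagnostic.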
It should be noted that if multicollinearity is particularly severe, then LS regression may not even work, and you may get an error message when attempting to run LS. In order to diagnose this problem, simply discard one of your regressors, and attempt again to run the regression.
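As a rough illustration of that failure mode (again only a sketch with simulated data, not a prescription for any particular package), when one regressor is essentially an exact copy of another the design matrix becomes near-singular: its condition number blows up, and some least-squares routines will warn or refuse to proceed, while discarding the redundant column restores a usable regression.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=1e-8, size=n)   # nearly a perfect copy of x1
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)
X = sm.add_constant(np.column_stack([x1, x2]))

# A huge condition number indicates X'X is near-singular, which is what
# causes least-squares routines to fail or complain.
print(np.linalg.cond(X))

# Discarding one of the offending regressors gives a well-behaved design.
reduced = sm.OLS(y, sm.add_constant(x1)).fit()
print(reduced.rsquared, reduced.tvalues)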