Multicollinearity
Many economic variables have the property that they are correlated. This
is not surprising, given the natural links between almost all facets of
economic activity within any given economy. However, this feature of
most economic data suggests that within the context of regression, not only are
the regressors (or independent or explanatory variables) related
to the dependent variable in a regression model (which is what we want, as
we're trying to "explain" our dependent variable using our independent
variables), but the independent variables are also correlated with one another.
When the independent variables are correlated with one another (and this can be checked
by simply running a regression of one of the independent variables on another, say, and checking whether
the coefficient of determination (R squared) from this auxiliary regression is high), then
we have what is termed "multicollinearity". If the multicollinearity is severe
(i.e. if the R squared value from the auxiliary regression just mentioned is close
to unity), then the precision of the estimated slope coefficients in the model
is very poor. As precision is simply the inverse of variance, high multicollinearity
implies high slope estimator variance. This in turn implies low t-statistics, since
the denominator of the standard t-statistic is inflated (for, say, H0: beta = 0, we have t = beta hat / SE(beta hat), where
SE(beta hat) is the standard error of the estimator of the slope coefficient, which is reported
in the output of any program which does regression). These low t-statistics
in turn may result in a failure to reject the null hypothesis that some
particular slope coefficient is zero, which in turn implies that the particular
independent variable is not a useful explanatory variable. However, the variable may
actually have a lot of explanatory power, and we may simply be fooled into
believing the variable is irrelevant because we observe low t-statistics which are simply
an artifact of the multicollinearity in our regression model. This problem is often
signaled when our regression has a high R squared value, but very low slope
coefficient t-statistics. One simple remedy is to omit one of the variables that is
highly multicollinear, as the informational content of this variable is essentially the
same as that of the other variable(s) anyway. Another common solution is to difference or
log-difference the data. This often removes much of the multicollinearity among regressors,
particularly since the multicollinearity may have arisen because the regressors
were all trending upwards over time, say, which is then the same problem as discussed
in the previous note on nonstationarity.
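The diagnosis described above (high overall R squared, low slope t-statistics, and an auxiliary regression R squared close to unity) can be illustrated with a short simulation. The following is a minimal sketch in Python with numpy, an assumed choice of software since the notes do not name a package; the OLS standard errors, t-statistics, and the variance inflation factor VIF = 1/(1 - R squared from the auxiliary regression) are computed by hand.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)          # nearly identical to x1: severe multicollinearity
y = 1.0 + x1 + x2 + rng.normal(size=n)

def ols(y, X):
    """OLS fit: returns coefficients, standard errors, and R squared."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n_obs, k = X.shape
    sigma2 = resid @ resid / (n_obs - k)       # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, se, r2

X = np.column_stack([np.ones(n), x1, x2])
beta, se, r2 = ols(y, X)
t_stats = beta / se
print("overall R squared:", r2)                # high: the pair explains y well
print("slope standard errors:", se[1:])        # inflated by the collinearity
print("slope t-statistics:", t_stats[1:])      # unreliable individually

# Auxiliary regression of x2 on x1: its R squared gauges the multicollinearity
Xaux = np.column_stack([np.ones(n), x1])
_, _, r2_aux = ols(x2, Xaux)
vif = 1.0 / (1.0 - r2_aux)                     # variance inflation factor
print("auxiliary R squared:", r2_aux)          # close to unity here
print("VIF:", vif)
```

With x2 nearly identical to x1, the auxiliary R squared is near unity and the VIF is large, so the slope standard errors are inflated by roughly the square root of the VIF, even though the regression as a whole fits well.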
It should be noted that if multicollinearity is particularly severe (i.e. if one regressor
is an exact linear combination of the others), then LS regression
may not even work, and you may get an error message when attempting to run LS.
In order to diagnose this problem, simply discard one of your regressors, and attempt
again to run the regression.