# Why does heteroskedasticity distort the results of a regression analysis?

May 15, 2018

Because ordinary least-squares regression assumes that the residuals all come from the same normal distribution, with one common variance, and heteroskedasticity means that this assumption does not hold.

#### Explanation:

Heteroskedasticity (meaning "unlike variance") is the circumstance where the variance of the response variable around the regression line changes, depending on the value of the input variable.

If there is evidence of heteroskedasticity in the model, then for areas where residual variance is low, $x$ will do a better job of predicting $y$ than for areas where residual variance is high.
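To make this concrete, here is a minimal sketch on simulated data. The data-generating model (error standard deviation proportional to $x$), the coefficients, and the sample size are all assumptions chosen for illustration; they are not taken from any real data set.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
x = np.linspace(1.0, 10.0, n)

# Assumed error model: the standard deviation of the noise grows with x
y = 2.0 + 3.0 * x + rng.normal(scale=0.5 * x, size=n)

# Ordinary least-squares fit of y on x
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Compare the residual spread in the lower and upper halves of the x range
lower_half = x < np.median(x)
sd_low = residuals[lower_half].std(ddof=1)
sd_high = residuals[~lower_half].std(ddof=1)

print(f"residual SD for small x: {sd_low:.2f}")
print(f"residual SD for large x: {sd_high:.2f}")
```

Under these assumptions the residuals are visibly tighter for small $x$, which is exactly where $x$ predicts $y$ more precisely.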

Heteroskedasticity also affects the validity of confidence intervals (C.I.'s) for such predicted values, since the C.I. formula for any predicted value uses the statistic ${s}^{2}$ (which is an estimate of the model's ${\sigma}^{2}$), and ${s}^{2} = \frac{\sum {\left({y}_{i} - {\hat{y}}_{i}\right)}^{2}}{n - 2}$ (for a simple linear regression, where ${\hat{y}}_{i}$ is the fitted value) is a single number for the whole model. But if the true value of ${\sigma}^{2}$ itself varies with $x$, then the width of the interval should vary along with it. A single pooled ${s}^{2}$ gives intervals that are too wide where the variance is low and too narrow where it is high.
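The effect on interval width can be checked numerically. The sketch below assumes a one-predictor fit, takes $s^2$ as the residual sum of squares divided by $n - 2$, and uses the standard 95% interval for an individual predicted value (a prediction interval). The simulated data and its error model are again assumptions made only for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 400
x = np.linspace(1.0, 10.0, n)

# Assumed error model: noise standard deviation proportional to x
y = 2.0 + 3.0 * x + rng.normal(scale=0.5 * x, size=n)

slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x

# One pooled variance estimate for the whole model
s2 = np.sum((y - fitted) ** 2) / (n - 2)

# Half-width of the 95% prediction interval at each observed x
sxx = np.sum((x - x.mean()) ** 2)
t_crit = stats.t.ppf(0.975, df=n - 2)
half_width = t_crit * np.sqrt(s2 * (1 + 1 / n + (x - x.mean()) ** 2 / sxx))

# Fraction of observations that fall inside their own interval
inside = np.abs(y - fitted) <= half_width
lower_half = x < np.median(x)
coverage_low = inside[lower_half].mean()
coverage_high = inside[~lower_half].mean()

print(f"coverage for small x: {coverage_low:.3f}")
print(f"coverage for large x: {coverage_high:.3f}")
```

With a constant error variance both fractions would sit near 0.95. Here the pooled $s^2$ makes the intervals too generous for small $x$ and too tight for large $x$, so the two coverage figures drift apart.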

Tests like Bartlett's test can be performed to check the assumption of equal variance across the sample. If the test shows a significant departure from equal variance, then the standard errors of the estimated regression coefficients $\left({\beta}_{i} \text{'s}\right)$, and any tests or intervals built on them, are questionable (the least-squares estimates themselves remain unbiased, but they are no longer the most precise ones available), and a linear regression of $y$ on $x$ may not be the best model to predict $y$ from a chosen $x$. In this case, a transformation of some sort (like regressing $y$ on $\log x$) may remove the heteroskedasticity.
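As a rough sketch of that workflow, the example below runs `scipy.stats.bartlett` on the residuals, grouped into four bins along $x$, before and after a transformation. Several things here are assumptions for illustration only: the simulated data, the choice of four bins, and the transformation itself. Because the simulated errors are multiplicative, the log is applied to $y$ rather than to $x$; which transformation helps depends on the data at hand.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 400
x = np.linspace(1.0, 10.0, n)

# Assumed model with multiplicative errors: log(y) is linear in x
y = np.exp(1.0 + 0.3 * x + rng.normal(scale=0.3, size=n))

def bartlett_pvalue(response, predictor, n_groups=4):
    """Fit a straight line, split the residuals into groups ordered by the
    predictor, and return the p-value of Bartlett's test for equal variances."""
    order = np.argsort(predictor)
    slope, intercept = np.polyfit(predictor, response, deg=1)
    residuals = (response - (intercept + slope * predictor))[order]
    groups = np.array_split(residuals, n_groups)
    return stats.bartlett(*groups).pvalue

p_raw = bartlett_pvalue(y, x)          # regression of y on x
p_log = bartlett_pvalue(np.log(y), x)  # regression of log(y) on x

print(f"Bartlett p-value, y on x:      {p_raw:.3g}")
print(f"Bartlett p-value, log(y) on x: {p_log:.3g}")
```

A very small p-value for the untransformed fit signals unequal variances across the bins. Bartlett's test is itself sensitive to non-normal residuals, so it is worth reading the result alongside a residual plot.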