Can a characteristic of a data set make a linear regression model unusable?

Nov 10, 2015

Non-normality and/or heterogeneity of variance in the residuals

Explanation:

When you want to model your data, you generally start with a linear model. Once you've done model selection (choosing the best model for your purpose), you have to validate it.
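Every validation check works on the residuals (observed minus fitted values) of the chosen model. Here is a minimal sketch of how to get them. It assumes a single explanatory variable and plain Python with no statistics library. The data are made up for illustration, and `fit_line` and `residuals` are hypothetical helper names. Model selection itself (e.g. comparing candidate models by AIC) is not shown.

```python
# Sketch (assumption: one explanatory variable, made-up data): fit
# y = a + b*x by ordinary least squares and compute the residuals.

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Return (fitted values, residuals = observed - fitted)."""
    fitted = [intercept + slope * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, resid


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical observations
    a, b = fit_line(x, y)
    fitted, resid = residuals(x, y, a, b)
    print(f"intercept={a:.3f}, slope={b:.3f}")
    print("residuals:", [round(r, 3) for r in resid])
```

With an intercept in the model, the least-squares residuals always sum to (numerically) zero. So that sum tells you nothing about model validity. The plots are what matter.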

For this, first make a quantile-quantile (Q-Q) plot of the residuals. If the points follow a straight line (as below), you can assume the residuals are normally distributed.

[Figure: Q-Q plot of the residuals; the points follow a straight line]
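A Q-Q plot is normally drawn with a plotting library. The sketch below only computes the points such a plot would show. Each standardized residual, sorted, is paired with the standard-normal quantile expected at its rank, using the plotting positions (i - 0.5)/n, which is one common convention among several. It also computes the correlation of those points as a one-number summary of "how straight" the line is. That summary is a heuristic assumption here, not a formal test, and has no universal cutoff. `qq_points` and `qq_correlation` are hypothetical names. The code uses only `statistics.NormalDist` from the standard library (Python 3.8+).

```python
from statistics import NormalDist, mean, stdev


def qq_points(resid):
    """Pair each sorted standardized residual with the standard-normal
    quantile expected at its rank (plotting position (i - 0.5) / n)."""
    n = len(resid)
    m, s = mean(resid), stdev(resid)
    sample = sorted((r - m) / s for r in resid)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sample))


def qq_correlation(points):
    """Pearson correlation of the Q-Q points; close to 1 means the points
    lie near a straight line (heuristic summary only)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in points)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    resid = [-0.12, 0.14, 0.05, -0.31, 0.22, -0.03, 0.09, -0.18]  # made up
    pts = qq_points(resid)
    for t, s in pts:
        print(f"theoretical={t:+.2f}  sample={s:+.2f}")
    print("Q-Q correlation:", round(qq_correlation(pts), 3))
```

Strongly skewed residuals bend away from the line at one end, and their Q-Q correlation drops noticeably below 1.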

Second, plot the residuals against the fitted values. If the residuals show a pattern or a cone shape, you have either a non-linear effect of one of your variables or heterogeneity of variance. In the graph below, we can assume homogeneity.

[Figure: residuals vs fitted values, showing no pattern or cone shape]
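Looking at the plot is the real check. As a rough numeric companion, here is a sketch of two crude summaries for the two problems described above. Both are assumptions chosen for illustration, not standard named tests. `spread_ratio` compares the average absolute residual in the upper half of the fitted values with that in the lower half, and a value well above 1 hints at a cone that opens to the right. `curvature_corr` correlates the residuals with the squared, centered fitted values, and a value far from 0 hints at a U-shaped (non-linear) pattern. Both helper names are hypothetical.

```python
from statistics import mean


def spread_ratio(fitted, resid):
    """Mean |residual| for the upper half of fitted values divided by the
    mean |residual| for the lower half (crude cone-shape check)."""
    order = sorted(range(len(fitted)), key=lambda i: fitted[i])
    half = len(order) // 2  # with odd n, the middle point is ignored
    low = [abs(resid[i]) for i in order[:half]]
    high = [abs(resid[i]) for i in order[-half:]]
    return mean(high) / mean(low)


def curvature_corr(fitted, resid):
    """Correlation between residuals and squared centered fitted values
    (crude check for a curved pattern)."""
    mf = mean(fitted)
    q = [(f - mf) ** 2 for f in fitted]
    mq, mr = mean(q), mean(resid)
    sqr = sum((a - mq) * (b - mr) for a, b in zip(q, resid))
    sqq = sum((a - mq) ** 2 for a in q)
    srr = sum((b - mr) ** 2 for b in resid)
    return sqr / (sqq * srr) ** 0.5


if __name__ == "__main__":
    fitted = [1, 2, 3, 4, 5, 6, 7, 8]
    resid = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]  # made-up cone
    print("spread ratio:", round(spread_ratio(fitted, resid), 2))
    print("curvature corr:", round(curvature_corr(fitted, resid), 2))
```

These numbers can miss patterns that the eye catches easily, such as a cone that narrows in the middle. Treat them as a supplement to the plot, not a replacement.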

Also plot the residuals against each explanatory variable to look for the same pattern or cone shape.
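The same crude spread check can be repeated along each explanatory variable instead of the fitted values. Here is a minimal sketch, assuming the variables are stored in a dict of equal-length lists that line up with the residuals. The variable names `temperature` and `depth`, the data, and the helpers `spread_ratio` and `check_each_variable` are all hypothetical.

```python
from statistics import mean


def spread_ratio(values, resid):
    """Mean |residual| for the upper half of `values` divided by that for
    the lower half (crude cone-shape check along one variable)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    half = len(order) // 2
    low = [abs(resid[i]) for i in order[:half]]
    high = [abs(resid[i]) for i in order[-half:]]
    return mean(high) / mean(low)


def check_each_variable(variables, resid):
    """Return {variable name: spread ratio of the residuals along it}."""
    return {name: spread_ratio(vals, resid) for name, vals in variables.items()}


if __name__ == "__main__":
    resid = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]
    variables = {
        "temperature": [1, 2, 3, 4, 5, 6, 7, 8],  # spread grows along it
        "depth": [8, 1, 7, 2, 6, 3, 5, 4],        # no trend in spread
    }
    for name, ratio in check_each_variable(variables, resid).items():
        print(f"{name}: spread ratio = {ratio:.2f}")
```

A variable with a ratio far from 1 is the one to plot first. The residuals may have been fine along the fitted values while still fanning out along that one variable.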

If you have heterogeneity, you can try a generalized linear model (GLM). If you find a non-linear pattern, you can try a generalized additive model (GAM).
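One common source of heterogeneity is count data whose variance grows with the mean. A GLM handles this through its distribution family. Here is a minimal sketch of one specific case: a Poisson GLM with a log link, fitted by iteratively reweighted least squares (IRLS). Both the Poisson/log-link choice and the single predictor are assumptions for illustration, and `poisson_glm` is a hypothetical name. In practice you would use a library, e.g. R's `glm(y ~ x, family = poisson)`. For a GAM, which replaces the straight-line term with a smooth function, R's mgcv package (`gam()`) is a common choice. No GAM sketch is given here.

```python
import math


def poisson_glm(x, y, max_iter=50, tol=1e-10):
    """Fit log(E[y]) = a + b*x (Poisson family, log link) by IRLS."""
    mu = [yi + 0.5 for yi in y]  # starting means, kept strictly positive
    eta = [math.log(m) for m in mu]
    a = b = 0.0
    for _ in range(max_iter):
        # For Poisson with log link: working weight = mu and
        # working response z = eta + (y - mu) / mu.
        w = mu
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # Weighted least squares of z on [1, x], solved as a 2x2 system.
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx ** 2
        new_a = (swxx * swz - swx * swxz) / det
        new_b = (sw * swxz - swx * swz) / det
        converged = abs(new_a - a) < tol and abs(new_b - b) < tol
        a, b = new_a, new_b
        eta = [a + b * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        if converged:
            break
    return a, b


if __name__ == "__main__":
    x = [0, 1, 2, 3, 4, 5]
    y = [1, 3, 2, 6, 9, 14]  # made-up counts whose spread grows with x
    a, b = poisson_glm(x, y)
    print(f"log(E[y]) = {a:.3f} + {b:.3f} * x")
```

The weights `w = mu` are where the GLM differs from ordinary least squares. Observations with a larger expected count, and therefore larger variance under the Poisson assumption, are effectively down-weighted relative to their size.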