Why is it possible for a model with a higher R-Squared value to be less accurate?

May 20, 2016

R-squared is an approximation, as are all statistics. It is a quantity derived from the data, not the actual data.

Explanation:

Good question! Statistics are very useful and powerful, but also very easy to misuse or misunderstand. A key word is “model”. It is NOT the reality, but a simplification of our observations. A model may “fit” a limited set of actual data without really being the correct model (set of equations) for the physical reality.

A classic example is fitting a polynomial model to a set of data. Mathematically, a polynomial of high enough degree (one less than the number of data points) can pass through every data point exactly, resulting in a perfect fit, that is, an R-squared of 1. HOWEVER, the resulting equation is NOT a “model” of the system, as it usually can neither interpolate nor extrapolate any other data values reliably.
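To make the polynomial example concrete, here is a minimal sketch in Python with NumPy. The data are invented for illustration: the "true" system is assumed to be y = 2x + 1, and each of the eight observations carries a made-up measurement error of plus or minus 0.5. The names `r_squared`, `line`, and `wiggle` are just labels chosen for this sketch.

```python
import numpy as np
from numpy.polynomial import Polynomial


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: the true relationship is y = 2x + 1, and each
# observation carries a made-up measurement error of +/- 0.5.
x_train = np.arange(8, dtype=float)
noise = 0.5 * (-1.0) ** np.arange(8)
y_train = 2.0 * x_train + 1.0 + noise

# New points between the observed ones, with their true (error-free) values.
x_new = x_train[:-1] + 0.5
y_new = 2.0 * x_new + 1.0

# A straight line, and a degree-7 polynomial that passes through all 8 points.
line = Polynomial.fit(x_train, y_train, deg=1)
wiggle = Polynomial.fit(x_train, y_train, deg=7)

r2_line = r_squared(y_train, line(x_train))
r2_wiggle = r_squared(y_train, wiggle(x_train))
mse_line = np.mean((y_new - line(x_new)) ** 2)
mse_wiggle = np.mean((y_new - wiggle(x_new)) ** 2)

print(f"straight line: R^2 = {r2_line:.4f}, error on new points = {mse_line:.4f}")
print(f"degree 7:      R^2 = {r2_wiggle:.4f}, error on new points = {mse_wiggle:.4f}")
```

With these assumed numbers, the degree-7 polynomial has the higher R-squared (essentially 1) on the data it was fitted to, yet its mean squared error on the in-between points is far larger than the straight line's. The higher R-squared belongs to the less accurate model.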

“Regression” statistics such as R-squared are useful to show how close your approximation is to the data it was fitted to. Do NOT use them as a justification for accepting a model, but rather as a warning of the degree of inherent error represented by the fitted (linearized) function.

It is also extremely important to remember that interpolation (estimating within the range of the data) still includes this inherent error, but extrapolation (prediction outside that range) from a linear regression can increase the error dramatically the farther it moves from the actual data set.
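The growth of extrapolation error can be sketched the same way. This is only an illustration under stated assumptions: the "true" relationship is taken to be y = x**2 (a hypothetical choice), it is observed without noise over the narrow range 1 to 2, and a straight line is fitted by least squares with `np.polyfit`.

```python
import numpy as np

# Hypothetical system: the true relationship is y = x**2, but it is only
# observed over the narrow range 1 <= x <= 2, where it looks almost straight.
x_obs = np.linspace(1.0, 2.0, 11)
y_obs = x_obs ** 2

# Ordinary least-squares straight line through the observations.
slope, intercept = np.polyfit(x_obs, y_obs, 1)

# R-squared of the line on the observed data.
fitted = slope * x_obs + intercept
ss_res = np.sum((y_obs - fitted) ** 2)
ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot


def abs_error(x):
    """Absolute gap between the fitted line and the true value at x."""
    return abs((slope * x + intercept) - x ** 2)


worst_inside = max(abs_error(x) for x in x_obs)
print(f"R^2 on observed range: {r2:.4f}")
print(f"largest error inside the range: {worst_inside:.2f}")
for x in (3.0, 5.0, 10.0):
    print(f"error when extrapolating to x = {x:g}: {abs_error(x):.2f}")
```

Under these assumptions the line scores an R-squared of roughly 0.99 and is never off by more than about 0.15 inside the observed range, but the error grows steadily as x moves away from the data and is in the tens by x = 10. The R-squared value says nothing about that.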