Why doesn't an R-Squared value indicate anything about causation?

1 Answer
Oct 18, 2016

An R-squared value indicates how well the observed data fit an expected trend, but it only gives you information about correlation.


An R-squared value indicates how well your observed data, or the data you collected, fit an expected trend. This value tells you the strength of the relationship but, like other statistical measures, it tells you nothing about the cause behind the relationship or its strength.
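One way to see this is that, for a straight-line fit, R-squared comes out the same whether you predict Y from X or X from Y, so the number cannot point to which variable drives the other. The sketch below is only an illustration on made-up data: the `r_squared` helper, the seed, and the simulated values are all assumptions, not part of the original answer.

```python
import random

def r_squared(x, y):
    """R-squared of a least-squares straight line predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares for the fitted line.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot

random.seed(0)  # fixed seed so the made-up data is reproducible
x = [random.gauss(0, 1) for _ in range(200)]
y = [2 * a + random.gauss(0, 1) for a in x]  # here y really is built from x

r2_y_on_x = r_squared(x, y)  # predict y from x
r2_x_on_y = r_squared(y, x)  # predict x from y
print(r2_y_on_x, r2_x_on_y)
```

Both printed values agree up to rounding error, even though only one direction matches how the data was generated.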

In the example below, the graph on the left shows no relationship, as indicated by the low R-squared value. The graph on the right shows a very strong relationship, as indicated by the R-squared value of 1. In none of these graphs can we tell what is ultimately causing the relationship.


Correlation does not mean causation. Your X values may very well affect your Y values, but other factors may be at play or the relationship could be due to chance. You can infer causation, but this is your interpretation and it cannot be proven by statistical testing. A high R-squared value still tells you only the strength of the relationship, not its cause.

Proving causation is a very large task. If you want to understand causation, your best bet is to run experiments.
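As a rough illustration of both points (another factor at play, and why experiments help), the sketch below simulates a hidden factor `z` that drives Y while X has no effect on Y at all. Everything in it is hypothetical: the variable names, the noise levels, and the seed are assumptions chosen for the example. It compares the R-squared you would see in observational data, where X happens to track `z`, with the R-squared you would see if X were assigned at random, as in an experiment.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation, which equals R-squared for a straight-line fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)

random.seed(1)  # fixed seed so the made-up data is reproducible
n = 500

# Hidden factor z drives y. x never appears in the formula for y.
z = [random.gauss(0, 1) for _ in range(n)]
y = [c + random.gauss(0, 0.1) for c in z]

# Observational data: x merely tracks z, so it lines up with y.
x_observed = [c + random.gauss(0, 0.1) for c in z]

# Experiment: x is assigned at random, independent of z.
# Since y does not depend on x, the measured y values are unchanged.
x_assigned = [random.gauss(0, 1) for _ in range(n)]

r2_observed = r_squared(x_observed, y)
r2_experiment = r_squared(x_assigned, y)
print(r2_observed, r2_experiment)
```

In the observational data the R-squared is high even though X does nothing to Y. Once X is assigned at random, the link to the hidden factor is broken and the R-squared drops to near zero, which is the kind of evidence an experiment can give and an R-squared value on its own cannot.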