What is the relationship between R-Squared and the correlation coefficient of a model?

1 Answer
Mar 25, 2018

See this. Credit to Gaurav Bansal.

Explanation:

I was trying to think of the best way to explain this and came across a page that does a really nice job, so I would rather give its author the credit for the explanation. In case the link doesn't work for some readers, I have included the key points below.

Simply stated: for a simple linear model (one #x# and one #y#, fitted by least squares), the #R^2# value is the square of the correlation coefficient #R#.
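As a quick numerical check of this statement, here is a minimal sketch using NumPy. The data values are made up purely for illustration, and the variable names (`r`, `r_squared`, `y_hat`) are my own. It fits a least-squares line to one #x# and one #y#, computes #R^2# from the residuals, and compares it with the squared correlation coefficient.

```python
import numpy as np

# Hypothetical data, invented for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

# Pearson correlation coefficient between x and y.
r = np.corrcoef(x, y)[0, 1]

# Least-squares line y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

# R^2 computed as 1 - (residual sum of squares) / (total sum of squares).
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# The two printed numbers should agree up to floating-point rounding.
print(r ** 2, r_squared)
```

The agreement relies on the line being fitted by ordinary least squares with an intercept; with a different fitting method the two quantities need not match.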

The correlation coefficient (#R#) of a model (say with variables #x# and #y#) takes values between #-1# and #1#. It describes the direction and strength of the linear relationship between #x# and #y#.

  • If #y# increases exactly linearly as #x# increases, then this value will be #+1#
  • If #y# decreases exactly linearly as #x# increases, then this value will be #-1#
  • #0# would be a situation where there is no linear correlation between #x# and #y#
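The three cases above can be sketched with NumPy's `np.corrcoef`. The data here are hypothetical and chosen only so that each case comes out cleanly. For the zero case I picked a #y# that is symmetric about the mean of #x#, which is just one of many ways to get no linear correlation.

```python
import numpy as np

# Hypothetical x values: 1, 2, ..., 10.
x = np.arange(1.0, 11.0)

# y rises exactly linearly with x: correlation of +1.
r_pos = np.corrcoef(x, 2.0 * x + 3.0)[0, 1]

# y falls exactly linearly as x rises: correlation of -1.
r_neg = np.corrcoef(x, -0.5 * x + 4.0)[0, 1]

# y is symmetric about the mean of x, so there is no linear trend:
# the correlation is 0 (up to rounding), even though y depends on x.
y_sym = (x - x.mean()) ** 2
r_zero = np.corrcoef(x, y_sym)[0, 1]

print(r_pos, r_neg, r_zero)
```

Note that the last case shows a correlation of #0# does not mean the variables are unrelated, only that there is no straight-line relationship between them.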

However, this #R# value is only directly useful for a simple linear model (just one #x# and one #y#). Once we consider more than one independent variable (now we have #x_1#, #x_2#, ...), there is no longer a single correlation between "the #x#" and #y#, and it is not clear which variable contributes what to the overall relationship.

This is where the #R^2# value comes into play. In a simple linear model it is the square of the correlation coefficient #R#; in a least-squares model with several independent variables it is the square of the correlation between the observed values of #y# and the values predicted by the model. It takes values between #0# and #1#, where values close to #1# imply a stronger linear relationship (whether positive or negative) and #0# implies none. Another way to think of it is as the fraction of the variation in the dependent variable that is explained by all of the independent variables together. If the independent variables together account for most of the variation in the dependent variable, the value will be close to #1#. So #R^2# is much more useful, as it can be used to describe multivariate models as well.
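To make the multivariate case concrete, here is a minimal sketch with two independent variables, assuming an ordinary least-squares fit with an intercept. The data are simulated, and the coefficients, noise level, and variable names are my own choices for illustration. It computes #R^2# as the explained fraction of variation and compares it with the squared correlation between the observed #y# and the model's predictions.

```python
import numpy as np

# Simulated data (hypothetical): y depends on x1 and x2 plus random noise.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.5 * x1 - 2.0 * x2 + 0.5 + rng.normal(scale=0.5, size=n)

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones(n), x1, x2])

# Ordinary least-squares fit.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ coef

# R^2 as the fraction of variation in y explained by the model.
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Correlation between observed y and predicted y.
multiple_r = np.corrcoef(y, y_hat)[0, 1]

# The two printed numbers should agree up to floating-point rounding.
print(r_squared, multiple_r ** 2)
```

Neither `x1` nor `x2` alone has a correlation with `y` whose square equals this #R^2#, which is why a single #R# is hard to interpret once there are several independent variables.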

If you would like a discussion of some of the mathematical notions involved in relating the two values, see this.