Correlation and Coefficient of Determination

Finding and Interpreting the Coefficient of Determination
3:38 — by Diane R Koenig

Key Questions

  • The correlation is some number between -1 and 1.

    If two variables are perfectly (100%) correlated, the number comes out as 1. An example would be the length of a 2x4 bar and its weight.

    If there is absolutely no correlation between the two variables, the number comes out as 0. You would expect this for, say, the number of stork nests and the birth rate.

    Usually you will find some correlation between two variables. This does not mean there is a cause-and-effect relation.

    Finally, there can also be a negative correlation between variables, where the higher the one, the lower the other. Think of hours spent exercising versus excess weight. If two variables move in perfectly opposite directions, the number drops to -1.
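The three cases above (1, 0 and -1) can be checked with a minimal sketch. The helper `pearson_r` and all of the data are made up for illustration and are not part of the original answer; the bar weights are simply assumed to be exactly proportional to length.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up data: bar lengths, with weight exactly proportional to length.
lengths = [1.0, 2.0, 3.0, 4.0, 5.0]
weights = [2.5 * length for length in lengths]

# Made-up data that falls exactly on a downward-sloping line.
falling = [10.0 - 2.0 * length for length in lengths]

# Made-up data with a symmetric pattern and no linear trend at all.
no_trend = [1.0, -2.0, 2.0, -2.0, 1.0]

r_perfect = pearson_r(lengths, weights)    # perfect positive correlation
r_negative = pearson_r(lengths, falling)   # perfect negative correlation
r_none = pearson_r(lengths, no_trend)      # no linear correlation

print(r_perfect, r_negative, r_none)
```

Real data almost never lands exactly on 1, 0 or -1; these inputs were chosen so that the extremes are easy to see.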

  • Actually there isn't much relation between the two (the correlation coefficient and the slope of the regression line), except for the direction of the slope.

    Let's do a few examples:
    If you correlate Hours of couch-surfing with Weight you may find a regression line that slopes up from left to right. The correlation coefficient can still be anywhere between 0 and 1, meaning couch surfing is more or less strongly related to weight. This would be called positive correlation.

    If you do the same with Hours working out, you may find a line that slopes down. Again the correlation coefficient can be anywhere, this time between -1 and 0, and this is called negative correlation (the higher the one, the lower the other).

    There are rules for which correlation coefficients may be considered significant, depending on the sample size and the desired significance level.

    Warning: NEVER draw conclusions about cause and effect from a correlation alone!
    In some town, yearly records of the number of births and the number of stork nests were kept for over 60 years.
    Guess what?
    The correlation was 0.9, which is extremely significant by any measure!
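To illustrate the point about slope and correlation, here is a small sketch with invented numbers (the hours and weights are hypothetical, as is the helper `slope_and_r`). It assumes an ordinary least-squares line of weight on hours. Changing the units of weight changes the slope but not the correlation coefficient, while the sign of both always agrees.

```python
import math

def slope_and_r(xs, ys):
    """Least-squares slope of y on x, and the Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

# Hypothetical numbers, invented for illustration only.
hours_on_couch = [1, 2, 3, 4, 5, 6]
weight_kg = [70, 74, 71, 78, 76, 81]

slope_kg, r_kg = slope_and_r(hours_on_couch, weight_kg)

# Same weights expressed in grams: the slope scales by 1000, r does not change.
weight_g = [1000 * w for w in weight_kg]
slope_g, r_g = slope_and_r(hours_on_couch, weight_g)

# Weights in reverse order, so the trend goes down: slope and r both turn negative.
slope_down, r_down = slope_and_r(hours_on_couch, weight_kg[::-1])

print(slope_kg, r_kg)
print(slope_g, r_g)
print(slope_down, r_down)
```

The slope says how steep the line is in the chosen units; the correlation coefficient says how tightly the points cluster around that line.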

  • The coefficient of determination is the proportion of the variance in the dependent variable that is explained by the explanatory variables. Suppose that we assess the linear relationship between satisfaction and loyalty, where loyalty is the dependent variable.

    Let's say that the computed r was 0.90. Then the coefficient of determination is simply r^2 = 0.90^2 = 0.81.

    This suggests that 81% of the variability in the loyalty ratings is explained by the variability in the satisfaction ratings, meaning the other 19% must be explained by other variables.
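The arithmetic above can be reproduced in a few lines. The second half of this sketch uses made-up satisfaction and loyalty ratings (not data from the answer) and a hypothetical helper `r_squared_two_ways` to show that, for a straight-line fit with one explanatory variable, r^2 matches the "explained share" computed as 1 - SS_res / SS_tot.

```python
import math

# The arithmetic from the text: r = 0.90 gives r^2 = 0.81.
r_from_text = 0.90
r_squared_from_text = r_from_text ** 2
unexplained_share = 1 - r_squared_from_text

def r_squared_two_ways(xs, ys):
    """Return (r^2, 1 - SS_res / SS_tot) for a least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares: what the fitted line fails to explain.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * ss_tot)
    return r ** 2, 1 - ss_res / ss_tot

# Hypothetical ratings, invented for illustration only.
satisfaction = [3, 5, 6, 7, 8, 9]
loyalty = [2, 4, 7, 6, 9, 8]

from_r, from_sums = r_squared_two_ways(satisfaction, loyalty)
print(round(r_squared_from_text, 2), round(unexplained_share, 2))
print(from_r, from_sums)
```

The two values on the last line agree up to rounding error, which is why r^2 is read as the share of variability explained by the fitted line.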