# What is the total sum of squares?

Dec 5, 2016

The total sum of squares (also written $SS_{\text{Total}}$ or $SS_T$) is simply the sum of all the squared data values:

$SS_T = \sum_i y_i^2$.
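As a quick numeric sketch (with a made-up data set of three observations):

```python
# Uncorrected total sum of squares: add up each squared observation.
y = [3, 4, 5]  # hypothetical data

ss_total = sum(yi ** 2 for yi in y)  # 9 + 16 + 25
print(ss_total)  # 50
```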

#### Explanation:

As written above, if we have $N$ total observations in our data set, then $SS_T$ has $N$ degrees of freedom (d.f.), because all $N$ values used in the sum are free to vary.

Sometimes we account for the average of all the data values (that is, $\overline{y}$) by instead squaring the differences between each data point $y_i$ and the overall average $\overline{y}$. In this case, we write:

$SS_T = \sum_i (y_i - \overline{y})^2$

This version of the total sum of squares is the corrected total sum of squares, and if we need to clarify that it is different from the uncorrected version, we can denote it something like $SS_{\text{Total, Corrected}}$.
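Using the same hypothetical data as above, the corrected version subtracts the mean before squaring:

```python
# Corrected total sum of squares: squared deviations from the overall mean.
y = [3, 4, 5]  # hypothetical data
y_bar = sum(y) / len(y)  # mean is 4.0

ss_corrected = sum((yi - y_bar) ** 2 for yi in y)  # 1 + 0 + 1
print(ss_corrected)  # 2.0
```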

$SS_{\text{Tot. Cor.}}$ has one fewer degree of freedom than $SS_T$. This is because all but one of the $N$ differences (that is, $y_i - \overline{y}$ for $i = 1, \ldots, N$) are free to vary.

For example: suppose I tell you I have three numbers whose average is 4. How many of those numbers do I have to tell you before you know all of them? The answer is "one less than all 3": because you know the average, you can use the first two numbers I give you to figure out the third by using

$\overline{y} = \frac{y_1 + y_2 + y_3}{3} \iff y_3 = 3\overline{y} - (y_1 + y_2)$.

If the first two numbers are 3 and 4, you know the last number is 5. If I give you 2 and 4, you know the last number is 6. In this sense, one of the three data points is not free to vary. And since we are using the (fixed) average $\overline{y}$ in calculating $SS_{\text{Tot. Cor.}}$, it only has $N - 1$ degrees of freedom.
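The little recovery trick above can be written out directly (same numbers as in the example):

```python
# Knowing the mean pins down the last value: y3 = 3 * y_bar - (y1 + y2).
y_bar = 4        # the stated average of the three numbers
y1, y2 = 3, 4    # the two numbers I tell you

y3 = 3 * y_bar - (y1 + y2)
print(y3)  # 5 -- the third number is forced, so it is not free to vary
```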