What is the residual sum of squares?
It's the variation left over after all of the explainable sources of variation in the data have been accounted for.
All data sets have what's known as a "total sum of squares" (or perhaps a "corrected total sum of squares"), which is usually denoted something like $SS_{\text{total}} = \sum_i (y_i - \bar{y})^2$, the sum of the squared deviations of each observation from the overall mean.
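As a quick illustration, here is the total sum of squares computed for a small made-up set of tip amounts (the data values are hypothetical, chosen only to match the restaurant examples below):

```python
import numpy as np

# Hypothetical tip amounts (in dollars) from five meals.
y = np.array([2.0, 3.5, 4.0, 6.5, 7.0])

# Total sum of squares: squared deviations from the overall mean.
ss_total = np.sum((y - y.mean()) ** 2)
print(ss_total)  # about 17.7 for this data
```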
Using some formulas, we can partition this total sum of squares into explainable sources of variation, such as:
- regression (line slopes, like how a server's tips increase with the price of a meal), denoted $SS_{\text{regression}}$;
- main effects (category averages, like how women tip more than men, female servers get more tips than male servers, etc.), denoted $SS_A$, $SS_B$, etc.;
- interaction effects between two explanatory variables (like how men tip more than women if their server is female), denoted $SS_{AB}$;
- lack of fit (repeated observations when all explanatory variables are the same, like if a customer dines at a restaurant twice with the same server), denoted $SS_{\text{lack of fit}}$;
- and many others.
Most of the time, these explainable sources do not account for all of the total variance in the data. We certainly hope they come close, but there is almost always a little bit of variance left over that has no explainable source.
This leftover bit is called the residual sum of squares or the sum of squares due to error and is usually denoted by $SS_{\text{error}}$ (or $SS_E$).
We usually write an equation like this:
$$SS_{\text{total}} = SS_A + SS_B + SS_{AB} + \cdots + SS_{\text{error}}.$$
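The decomposition above can be checked numerically. This sketch fits a simple linear regression (so the only explainable source is $SS_{\text{regression}}$) to hypothetical meal-price and tip data, and confirms that the explained and residual pieces add back up to the total:

```python
import numpy as np

# Hypothetical data: meal prices (x) and tips (y), in dollars.
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y = np.array([2.0, 3.5, 4.0, 6.5, 7.0])

# Least-squares fit of y = b0 + b1*x (polyfit returns highest degree first).
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

ss_total = np.sum((y - y.mean()) ** 2)       # total variation
ss_reg   = np.sum((y_hat - y.mean()) ** 2)   # explained by the regression
ss_error = np.sum((y - y_hat) ** 2)          # residual sum of squares

# The partition holds: total = explained + residual (up to rounding).
print(ss_total, ss_reg + ss_error)
```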
It's that last term, the $SS_{\text{error}}$, that remains once every explainable source has claimed its share; dividing it by its degrees of freedom turns it into the usual estimate of the unexplained variance.
Unfortunately, explaining degrees of freedom would make this answer a lot longer, so I have left it out for the sake of keeping this response (relatively) short.