# What is the residual sum of squares?


#### Answer:

It's the variance left unaccounted for by the explainable sources of variation in the data.

#### Explanation:

All data sets have what's known as a "total sum of squares" (or perhaps a "corrected total sum of squares"), usually denoted something like #SS_T#, which **quantifies the total amount of variance** for any given data set.
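To make #SS_T# concrete, here is a minimal sketch in Python using made-up tip amounts (the data values are purely illustrative): it subtracts the mean from each observation, squares the deviations, and sums them.

```python
# Hypothetical tip data (illustration only): tips in dollars for five meals.
tips = [1.0, 2.0, 3.0, 4.0, 5.0]

def total_sum_of_squares(y):
    """SS_T: the sum of squared deviations of each observation from the mean."""
    y_bar = sum(y) / len(y)
    return sum((y_i - y_bar) ** 2 for y_i in y)

print(total_sum_of_squares(tips))  # 10.0
```

With a mean of 3, the squared deviations are 4, 1, 0, 1, and 4, which sum to 10.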

Using some formulas, #SS_T# can be broken down into explainable sources of variation, such as:

- **regression** (line slopes, like how a server's tips increase with the price of a meal), denoted #SS_R#;
- **main effects** (category averages, like how women tip more than men, or how female servers get more tips than male servers), denoted #SS_A#, #SS_B#, etc.;
- **interaction effects** between two explanatory variables (like how men tip more than women if their server is female), denoted #SS_(AB)#;
- **lack of fit** (repeated observations when all explanatory variables are the same, like if a customer dines at a restaurant twice with the same server), denoted #SS_"LOF"#;
- and many others.

Most of the time, these explainable sources *do not* account for all of the total variance in the data. We certainly hope they come close, but there is almost always a little bit of variance left over that has no explainable source.

This leftover bit is called the **residual sum of squares** or the **sum of squares due to error** and is usually denoted by #SS_E#.

We usually write an equation like this:

#SS_T = SS_R + SS_A + SS_B + SS_(AB) + ... + SS_E#

It's that last term, #SS_E#, that answers the question. **It's the sum of all the squared distances between each observed data point and the point the model predicts at the corresponding explanatory values.** These distances are also called the **residuals**, hence the term "residual sum of squares". In this way, #SS_E# measures the variance in the data that the model cannot explain.
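The ideas above can be sketched numerically. The snippet below (with made-up price/tip pairs, purely for illustration) fits a simple least-squares line, computes the residuals, and checks that the total variance splits into a regression piece plus the residual sum of squares, i.e. #SS_T = SS_R + SS_E# for a one-variable model.

```python
# Hypothetical data (illustration only): meal prices vs. tips.
x = [1.0, 2.0, 3.0, 4.0]   # meal price
y = [2.0, 4.0, 5.0, 7.0]   # tip received

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Closed-form least-squares slope and intercept for a simple regression line.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Residuals: observed value minus the model's prediction at the same x.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

ss_e = sum(r ** 2 for r in residuals)                          # residual SS
ss_t = sum((yi - y_bar) ** 2 for yi in y)                      # total SS
ss_r = sum((intercept + slope * xi - y_bar) ** 2 for xi in x)  # regression SS

print(ss_t, ss_r + ss_e)  # the two quantities agree: SS_T = SS_R + SS_E
```

Here the fitted line leaves residuals of -0.1, 0.3, -0.3, and 0.1, so #SS_E = 0.2#: a small leftover that the regression line cannot explain.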

Note:

Unfortunately, explaining degrees of freedom would make this answer a lot longer, so I have left it out for the sake of keeping this response (relatively) short.