# Least Squares Regression Line (LSRL)

## Key Questions

• Equation for least-squares linear regression:

$y = m x + b$

where
$m = \frac{\sum \left({x}_{i} {y}_{i}\right) - \frac{\sum {x}_{i} \sum {y}_{i}}{n}}{\sum {x}_{i}^{2} - \frac{{\left(\sum {x}_{i}\right)}^{2}}{n}}$

and
$b = \frac{\sum {y}_{i} - m \sum {x}_{i}}{n}$

for a collection of $n$ data pairs $(x_i, y_i)$.

This looks horrible to evaluate (and it is, if you are doing it by hand), but with a computer (for example, a spreadsheet with columns for $y$, $x$, $xy$, and $x^2$) it isn't too bad.
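Here is a minimal Python sketch of the spreadsheet approach. It assumes the data arrive as two equal-length lists `xs` and `ys`; the function name `least_squares_line` and the sample data are illustrative and do not come from the original. It builds the $xy$ and $x^2$ columns, sums every column, and plugs the sums into the formulas for $m$ and $b$.

```python
def least_squares_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b (illustrative helper)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")

    # The extra spreadsheet columns: x*y and x^2 for every data pair
    xy_col = [x * y for x, y in zip(xs, ys)]
    x2_col = [x * x for x in xs]

    # Column totals
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(xy_col)
    sum_x2 = sum(x2_col)

    denominator = sum_x2 - sum_x ** 2 / n
    if denominator == 0:
        raise ValueError("all x values are equal; the slope is undefined")

    # Slope and intercept formulas from the text
    m = (sum_xy - sum_x * sum_y / n) / denominator
    b = (sum_y - m * sum_x) / n
    return m, b


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = least_squares_line(xs, ys)
print(f"y = {m:.2f}x + {b:.2f}")  # y = 0.60x + 2.20
```

The only inputs the formulas need are the four column totals, which is why a spreadsheet handles the calculation easily.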

• The primary use of linear regression is to fit a line to two paired sets of data and to determine how strongly they are related.

Examples are:

2 sets of stock prices

rainfall and crop output

With respect to correlation, a common rule of thumb for the magnitude of the correlation coefficient $r$ is:

Values of $|r|$ of 0.8 or higher denote a strong correlation.

Values of $|r|$ from 0.5 up to 0.8 denote a weak correlation.

Values of $|r|$ below 0.5 denote a very weak correlation.
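This small sketch applies the rule of thumb. It assumes that "correlation" means Pearson's correlation coefficient $r$, and that the thresholds apply to $|r|$, so negative correlations are graded the same way as positive ones. The helper names `pearson_r` and `describe_correlation` are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


def describe_correlation(r):
    """Label r with the rule-of-thumb thresholds, applied to |r| (an assumption)."""
    size = abs(r)
    if size >= 0.8:
        return "strong"
    if size >= 0.5:
        return "weak"
    return "very weak"


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(f"r = {r:.3f} ({describe_correlation(r)})")  # r = 0.775 (weak)
```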

The least-squares regression line is the line that minimizes the sum of the squared differences between the actual $y$ values and the predicted $y$ values:

$\min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
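The following sketch makes the minimized quantity concrete by computing the sum of squared residuals for any candidate line $\hat{y} = m x + b$. The sample data are made up for illustration. For these data, the slope and intercept formulas give $m = 0.6$ and $b = 2.2$, and nudging either value makes the sum larger.

```python
def sum_squared_residuals(xs, ys, m, b):
    """Sum of (y_i - y_hat_i)^2, where y_hat_i = m * x_i + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# m = 0.6 and b = 2.2 are what the slope and intercept formulas give for this data
best = sum_squared_residuals(xs, ys, 0.6, 2.2)
print(f"least-squares line: {best:.2f}")  # least-squares line: 2.40

# Nudging the slope or the intercept in either direction increases the sum
for dm, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    nudged = sum_squared_residuals(xs, ys, 0.6 + dm, 2.2 + db)
    print(f"m = {0.6 + dm:.1f}, b = {2.2 + db:.1f}: {nudged:.2f}")
```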

#### Explanation:

In other words, the line minimizes the sum of the squared residuals $\hat{u}_i = y_i - \hat{y}_i$:

$\min \sum_{i=1}^{n} \hat{u}_i^2$


By minimizing this squared error between the predicted and actual values, you get the best-fitting regression line.
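A standard calculus sketch shows why minimizing this sum produces the formulas for $m$ and $b$ given at the top. First, write the sum of squared residuals as a function $S(m, b)$, a name introduced here for illustration. Then set both partial derivatives to zero:

$S(m, b) = \sum_{i=1}^{n} (y_i - m x_i - b)^2$

$\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} (y_i - m x_i - b) = 0 \;\Rightarrow\; b = \frac{\sum y_i - m \sum x_i}{n}$

$\frac{\partial S}{\partial m} = -2 \sum_{i=1}^{n} x_i (y_i - m x_i - b) = 0 \;\Rightarrow\; \sum x_i y_i - m \sum x_i^2 - b \sum x_i = 0$

Substituting the expression for $b$ into the second condition gives

$\sum x_i y_i - \frac{\sum x_i \sum y_i}{n} = m \left( \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} \right)$

which rearranges to the slope formula. As long as the $x_i$ are not all equal, $S$ is a convex quadratic in $m$ and $b$, so this stationary point is the minimum.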