Least Squares Regression Line (LSRL)
Key Questions

Least squares means you're minimizing
#sum (Y_i - hat(Y)_i)^2#
Note: #hat(Y)_i# is often written as Y with a ^ (a "hat") above it.

You can think of regression as fitting a straight line to a bunch of points on a graph (this is a simple case; it can be more complicated, but why make it complicated?). The question is then: how are we going to fit the line through the points? Put another way, of all the different lines that we could choose that go through the points, which one do we choose?
Although most people only ever learn about least squares regression, there are other types of regression that use different methods for choosing the line. For example, Least Median Regression is another type.
In least squares regression we choose the line for which the total ( #sum# ) of the squared vertical distances from the line to each point, #(Y_i - hat(Y)_i)^2#, is the smallest.

Let's break this down further. First, what is #Y_i - hat(Y)_i#, and why are we using it?

#Y_i - hat(Y)_i# is just the vertical distance from a point (the ith point) to the line. This is also called the residual. Hopefully it makes sense that we are trying to find the best line by choosing the line with the smallest distances between the line itself and the points. But at this point you may wonder: why are we squaring this difference?

Squaring this difference does a number of useful things. First, it makes all the differences positive before we find the minimum of their sum. This is a very good thing, since distance is always positive. But wait, you may ask, couldn't we use absolute value? Isn't that how we normally make distances positive? And you'd be right for asking that.
That is a valid approach, and it is a different type of regression. But by using the squares of the differences we get an additional benefit: big differences become huge in our calculation (since a big difference times a big difference is a really, really big number), which means we will prefer lines that make the really big distances smaller over lines that pay less attention to those big differences.
Finally, we sum these squared differences because we need to consider the distance from the line to every point.
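As a sketch of this idea, here is a small Python function that computes the sum of squared residuals for a candidate line #y = mx + b#; the data points and the two candidate lines below are made-up illustration values:

```python
# Sum of squared residuals for a candidate line y = m*x + b.
# The data points here are made-up illustration values.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]

def sse(m, b, pts):
    """Total of (Y_i - Y_hat_i)^2 over all points."""
    return sum((y - (m * x + b)) ** 2 for x, y in pts)

# A line with a smaller sum of squared residuals fits the points better.
print(sse(2.0, 0.0, points))  # a line close to the data -> small total
print(sse(0.5, 3.0, points))  # a worse line -> much larger total
```

Least squares regression picks, out of every possible (m, b), the pair that makes this total as small as possible.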

Equation for least-squares linear regression:
#y = mx + b# where
#m = (sum(x_iy_i) - (sum x_i sum y_i)/n)/(sum x_i^2 - ((sum x_i)^2)/n)# and
#b = (sum y_i - m sum x_i)/n# for a collection of
#n# pairs #(x_i, y_i)#.

This looks horrible to evaluate (and it is, if you are doing it by hand); but using a computer (with, for example, a spreadsheet with columns:
#x, y, xy, and x^2# ) it isn't too bad.
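For instance, the same summation formulas can be sketched in a few lines of Python (the data below is a made-up, perfectly linear example so the answer is easy to check by hand):

```python
# Least-squares slope m and intercept b via the summation formulas above.
def lsrl(pairs):
    n = len(pairs)
    sx = sum(x for x, _ in pairs)       # sum of x_i
    sy = sum(y for _, y in pairs)       # sum of y_i
    sxy = sum(x * y for x, y in pairs)  # sum of x_i * y_i
    sxx = sum(x * x for x, _ in pairs)  # sum of x_i^2
    m = (sxy - sx * sy / n) / (sxx - sx ** 2 / n)
    b = (sy - m * sx) / n
    return m, b

# Made-up data lying exactly on y = 2x, so we expect m = 2 and b = 0.
m, b = lsrl([(1, 2), (2, 4), (3, 6)])
print(m, b)  # -> 2.0 0.0
```

Each of the four sums plays the role of one spreadsheet column total.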
The primary use of linear regression is to fit a line to 2 sets of data and determine how much they are related.
Examples are:
2 sets of stock prices
rainfall and crop output
study hours and grades
With respect to correlation, the general consensus is:
Correlation values of 0.8 or higher denote a strong correlation
Correlation values from 0.5 up to 0.8 denote a moderate correlation
Correlation values less than 0.5 denote a weak correlation
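As a sketch, the Pearson correlation coefficient behind these thresholds can be computed from the same column sums used for the regression line (the data below is a made-up, perfectly linear example):

```python
from math import sqrt

def pearson_r(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxy = sum(x * y for x, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    syy = sum(y * y for _, y in pairs)
    return (sxy - sx * sy / n) / sqrt((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))

# Made-up data lying exactly on a line gives a perfect correlation of 1.
print(pearson_r([(1, 2), (2, 4), (3, 6)]))  # -> 1.0
```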
Every time the x-variable changes by 1, the y-variable changes by an amount equal to the slope. For example, let's say the slope of the least-squares regression line for air temperature and ice cream sales was 1000. That would mean that every time the air temperature increased by 1 degree, ice cream sales would increase by $1000.
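To make that arithmetic concrete, here is a tiny sketch; the slope matches the ice cream example above, and the intercept is a hypothetical value chosen just for illustration:

```python
# Hypothetical least-squares line: slope 1000 (from the ice cream example),
# intercept 200 (an arbitrary made-up value).
m, b = 1000, 200

def predict(x):
    return m * x + b

# A 1-degree increase in temperature changes the prediction by exactly the slope.
print(predict(26) - predict(25))  # -> 1000
```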