How is the Ordinary Least Squares formula derived?

1 Answer
Mar 19, 2018

Please see below.

Explanation:

Let us say that we expect a linear relation, say #y=ax+b#, between two observable variables #x# and #y#, where #a# and #b# are unknown. The relation need not be linear, but we assume it here only for simplicity.

However, the actual observed data may be accompanied by errors or noise for various reasons, and when we plot the observed data on a graph, the points may not fall exactly on a line; they may look something like the scatter shown below.
[Figure: scatter plot of the observed data points]
Observe that each point denotes an observed pair of #x# and #y# values, giving us #n# data points #(x_i,y_i)#, where #i# ranges from #1# to #n#.

We can draw many lines through these points, with varying slopes #a# and intercepts #b#, but how do we know which one is the best fit? This is decided by the method of least squares: we choose the line that minimizes the sum of squares of the deviations between observed and expected values.

In other words, the best line has the minimum error between the line and the data points. Note that had we not squared the deviations, positive and negative errors would have largely cancelled out. We will talk more about this later.@
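To see why squaring matters, here is a minimal Python sketch (the data and the candidate line are made up for illustration): the signed deviations nearly cancel even though the fit is poor, while the squared deviations do not.

```python
# Made-up data and a deliberately poor candidate line y = a*x + b.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 1.9, 4.2, 3.8]
a, b = 0.0, 3.0  # a horizontal line through the rough middle of the data

# Signed deviations between observed and expected values.
deviations = [yi - (a * xi + b) for xi, yi in zip(x, y)]

print(sum(deviations))                 # about 0: signed errors cancel
print(sum(d * d for d in deviations))  # clearly positive: squared errors do not
```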

For a particular #x_i#, the observed value is #y_i#, the expected value is #ax_i+b#, and the difference between them is #d_i=y_i-ax_i-b#. In the least squares method we seek to minimise the error #E=sum_(i=1)^nd_i^2#, i.e. #sum_(i=1)^n(y_i-ax_i-b)^2#.
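As a sketch, the error #E# can be coded directly as a function of #a# and #b# (the data below is made up); a line close to the points gives a smaller #E# than one far from them.

```python
def E(a, b, x, y):
    """Sum of squared deviations between observed y_i and expected a*x_i + b."""
    return sum((yi - a * xi - b) ** 2 for xi, yi in zip(x, y))

# Made-up data for illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 1.9, 4.2, 3.8]

print(E(0.7, 1.25, x, y))  # a line close to the data: small E
print(E(0.0, 3.0, x, y))   # a poor horizontal line: much larger E
```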

To find this minimum, we use calculus: the first partial derivatives of #E# with respect to #a# and #b# must be zero. Differentiating #sum_(i=1)^n(y_i-ax_i-b)^2# w.r.t. #a# and #b#, we get

#(delE)/(dela)=-2sum_(i=1)^nx_i(y_i-ax_i-b)=0#

and #(delE)/(delb)=-2sum_(i=1)^n(y_i-ax_i-b)=0#
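As a sanity check on these derivatives (not part of the derivation itself), the sketch below differentiates #E# symbolically with SymPy for a small illustrative #n=3#:

```python
import sympy as sp

n = 3                        # small illustrative number of data points
a, b = sp.symbols('a b')
x = sp.symbols('x1:4')       # x1, x2, x3
y = sp.symbols('y1:4')       # y1, y2, y3

# E = sum over i of (y_i - a*x_i - b)^2
E = sum((y[i] - a * x[i] - b) ** 2 for i in range(n))

# Partial derivatives computed by SymPy.
dE_da = sp.diff(E, a)
dE_db = sp.diff(E, b)

# The hand-derived expressions from the text.
expected_da = -2 * sum(x[i] * (y[i] - a * x[i] - b) for i in range(n))
expected_db = -2 * sum(y[i] - a * x[i] - b for i in range(n))

print(sp.simplify(dE_da - expected_da))  # 0
print(sp.simplify(dE_db - expected_db))  # 0
```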

To solve for #a# and #b#, we rewrite them as

#asumx_i^2+bsumx_i=sumx_iy_i# and

#asumx_i+bn=sumy_i#

and solving them for #a# and #b# we get

#a=(nsumx_iy_i-sumx_isumy_i)/(nsumx_i^2-(sumx_i)^2)#

and #b=(sumy_isumx_i^2-sumx_isumx_iy_i)/(nsumx_i^2-(sumx_i)^2)#
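A minimal Python sketch of these closed-form expressions (the helper name ols_fit and the data are made up for illustration), cross-checked against numpy.polyfit:

```python
import numpy as np

def ols_fit(x, y):
    """Return the least-squares slope a and intercept b of y = a*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    sx, sy = x.sum(), y.sum()
    sxx, sxy = (x * x).sum(), (x * y).sum()
    denom = n * sxx - sx ** 2
    a = (n * sxy - sx * sy) / denom
    b = (sy * sxx - sx * sxy) / denom
    return a, b

# Made-up data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 2.8, 4.5, 4.1, 6.0]

a, b = ols_fit(x, y)
print(a, b)                 # 0.89 1.25
print(np.polyfit(x, y, 1))  # numpy returns [slope, intercept]; should agree
```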

@ - Note that #asumx_i+bn=sumy_i# can be expressed as #asumx_i/n+b=sumy_i/n#, which simply says that the line passes through the averages of #x_i# and #y_i#. Hence this equation alone does not give the best fit; that comes from combining it with #asumx_i^2+bsumx_i=sumx_iy_i#.
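To illustrate the footnote numerically: the fitted line always passes through the point of averages #(barx, bary)#. A quick check on made-up data:

```python
import numpy as np

# Made-up data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 2.8, 4.5, 4.1, 6.0])

a, b = np.polyfit(x, y, 1)  # least-squares slope and intercept

# The second normal equation, a*mean(x) + b = mean(y), says the fitted
# line passes through the point of averages.
print(a * x.mean() + b)  # 3.92
print(y.mean())          # 3.92
```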