We can approach this problem in two ways; I will start with maximum likelihood estimation (MLE).
We know the following:
#E(epsilon) = 0#, so we can drop the error term because its expectation is 0 regardless of its variance.
If we assume the errors are normally distributed, then #f(y|xbeta) = 1/(sqrt(2pisigma^2))e^(-(y-xbeta)^2/(2sigma^2))#
Now, using maximum likelihood, we find the best estimate of #beta# given observed values of #y# and #x#. In most cases we do observe these, so the likelihood is
#l(beta|x,y) = 1/(sqrt(2pisigma^2))e^(-(y-xbeta)^2/(2sigma^2))#
Now we take the log to simplify:
#log(l(beta|x,y)) = log(1/(sqrt(2pisigma^2))) + (-(y-xbeta)^2/(2sigma^2))#
Then we take the derivative with respect to #beta#, set it to 0, and solve:
#(x(y-xbeta))/(sigma^2)=0#
#(x^2beta)/(sigma^2)=(xy)/(sigma^2)#
#x^2beta=xy#
#beta=(xy)/x^2#
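As a quick sanity check, here is a minimal SymPy sketch of the same steps (the symbol names and the use of `sympy` are illustrative assumptions, not part of the question): it differentiates the log-likelihood with respect to #beta# and solves the resulting equation.

```python
import sympy as sp

# Symbols for a single observation; sigma > 0 for a valid variance.
y, x, beta = sp.symbols('y x beta')
sigma = sp.symbols('sigma', positive=True)

# Log-likelihood, dropping the constant term log(1/sqrt(2*pi*sigma^2)),
# which does not depend on beta.
loglik = -(y - x*beta)**2 / (2*sigma**2)

score = sp.diff(loglik, beta)            # x*(y - x*beta)/sigma^2
print(sp.solve(sp.Eq(score, 0), beta))   # [y/x], i.e. (xy)/x^2
```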
This agrees algebraically with what we get by minimizing the residual directly, usually with a sum-of-squares loss function, e.g.
#f(beta)=(y-xbeta)^2#
#f'(beta)=-2x(y-xbeta)=0#
#2x^2beta=2xy#
#beta=(xy)/x^2#
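With #n# observations the same algebra gives #hat beta=(sum x_i y_i)/(sum x_i^2)#. A minimal numeric sketch (assuming NumPy, with simulated data rather than data from the question) confirms that this closed form matches the least-squares fit of #y=xbeta# with no intercept:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.5 * x + rng.normal(scale=0.5, size=100)  # true beta = 2.5

# Closed-form estimate derived above, generalized to n observations.
beta_closed = np.sum(x * y) / np.sum(x ** 2)

# Least-squares fit of y = x*beta (no intercept) for comparison.
beta_lstsq, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)

print(beta_closed, beta_lstsq[0])  # both approximately 2.5, and equal
```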