This is the user sandbox of Stephen1729. A user sandbox is a subpage of the user's user page. It serves as a testing spot and page development space for the user and is not an encyclopedia article.

The Method of Least Squares

Problem Statement

Given $n$ 2-dimensional data points $(x_{1},y_{1}),(x_{2},y_{2}),\ldots ,(x_{n},y_{n})$ we will look at how to use the method of least squares to determine the coefficients $a_{0},a_{1},a_{2},\ldots ,a_{k}$ of the polynomial $P_{k}$ that best fits our data.

$P_{k}(x)=a_{k}x^{k}+a_{k-1}x^{k-1}+\cdots +a_{1}x+a_{0}$

But what does it mean for $P_{k}(x)$ to be a best fit? The following applet from the popular website Khan Academy gives a nice illustration of what it means for a line to be a "best fitting line":

Fitting A Line To Data

History

The only known portrait of Adrien-Marie Legendre

It is difficult to determine the exact origins of the method of least squares due to its simplicity, practicality and numerous applications. A common early use for least squares was to fit a function that models the path or orbit of a celestial body. By observing the night sky, measurements can be taken and used to predict a path or orbit. One of the earliest and most thorough treatments of the method of least squares comes from Adrien-Marie Legendre in his work "Nouvelles méthodes pour la détermination des orbites des comètes". ^[1]

Of course, such an important mathematical method is not so easily attributed to one person, and there is much dispute over the original discoverer of the method. In the paper "Gauss and the Invention of Least Squares" Stephen M. Stigler writes "Adrien Marie Legendre published the method in 1805, an American, Robert Adrain, published the method in late 1808 or early 1809, and Carl Friedrich Gauss published the method in 1809". ^[2]

We may never know who discovered the method first. It is well known, however, that Carl Friedrich Gauss extended the idea of least squares with an error term that follows a Gaussian distribution. This is the familiar form of least squares that is used so often today.

Geometry of Least Squares

Two Dimensional

Consider a set of $n$ data points $(x_{i},y_{i})\in \mathbb {R} ^{2}$ for $i=1,2,\ldots ,n$ .

Let us consider the problem of fitting a straight line to this data. Assume that there exists some straight line given by:

$y=a_{1}x+a_{0}$

We call the vertical distance between this hypothetical line and a given data point $(x_{i},y_{i})$ the error or residual. We denote the $i^{th}$ error as

$e_{i}=y_{i}-(a_{1}x_{i}+a_{0})$

Then in order to determine the "line of best fit" we seek to minimize the sum of the squared errors.

$E=\sum _{i=1}^{n}(y_{i}-(a_{1}x_{i}+a_{0}))^{2}$
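As a rough illustration, the sum of squared errors for a candidate line can be evaluated directly. The sketch below assumes each residual is $y_{i}-(a_{1}x_{i}+a_{0})$; the function name `sum_squared_errors` and the data values are made up for this example and do not come from any reference.

```python
def sum_squared_errors(xs, ys, a1, a0):
    """Return E = sum over i of (y_i - (a1 * x_i + a0))**2 for a candidate line."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return sum((y - (a1 * x + a0)) ** 2 for x, y in zip(xs, ys))


# Illustrative data only: these points lie exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

exact = sum_squared_errors(xs, ys, a1=2.0, a0=1.0)  # 0.0, the line passes through every point
worse = sum_squared_errors(xs, ys, a1=1.0, a0=1.0)  # 14.0, residuals are 0, 1, 2, 3
```

A line with a smaller value of $E$ sits closer to the data in this squared-distance sense, which is why the candidate with slope 2 is preferred here.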

But why exactly do we want to do this? Consider the following interactive diagram made by Bill Finzer using the program The Geometer's Sketchpad.

Multi-Dimensional

Consider a set of $n$ data points $(x_{1,i},x_{2,i},\ldots ,x_{p,i})\in \mathbb {R} ^{p}$ for $i=1,2,\ldots ,n$ .

Now since our data is $p$-dimensional, instead of fitting a $1$-dimensional line as we did when we had $2$-dimensional data, we will fit a $(p-1)$-dimensional surface (a hyperplane) to this data. In the picture on the left, this surface is represented by a green plane.

We can see two fitted vectors in this span.

We can also see the observed value of $y$ .

The deviation from the fitted value, or the residual, is shown as a dotted line.

Linear Least Squares

For linear least squares the problem statement reduces to finding a line of best fit for $n$ 2-dimensional data points. The line is given by

$P_{1}(x)=a_{1}x+a_{0}$

We have $n$ errors $e_{i}=y_{i}-(a_{1}x_{i}+a_{0})$

We let $E$ be the total sum of squared errors.

$E=\sum _{i=1}^{n}(y_{i}-(a_{1}x_{i}+a_{0}))^{2}$

Then in order to determine the "line of best fit" we seek to minimize the sum of squared errors. We note that $E$ is a function of the two parameters $a_{1},a_{0}$ , thus in order to minimize this function we take the partial derivative with respect to each parameter and solve for the values at which both derivatives equal 0. Thus we get two equations:

${\begin{aligned}{\frac {\partial E}{\partial a_{1}}}&=-2\sum _{i=1}^{n}x_{i}(y_{i}-(a_{1}x_{i}+a_{0}))=0\\[6pt]{\frac {\partial E}{\partial a_{0}}}&=-2\sum _{i=1}^{n}(y_{i}-(a_{1}x_{i}+a_{0}))=0\end{aligned}}$
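As a worked step, assuming each error is $e_{i}=y_{i}-(a_{1}x_{i}+a_{0})$ , the two conditions ${\frac {\partial E}{\partial a_{1}}}=0$ and ${\frac {\partial E}{\partial a_{0}}}=0$ can be rearranged into a pair of linear equations in $a_{1}$ and $a_{0}$ :

${\begin{aligned}a_{1}\sum _{i=1}^{n}x_{i}^{2}+a_{0}\sum _{i=1}^{n}x_{i}&=\sum _{i=1}^{n}x_{i}y_{i}\\[6pt]a_{1}\sum _{i=1}^{n}x_{i}+a_{0}\,n&=\sum _{i=1}^{n}y_{i}\end{aligned}}$

Dividing the second equation by $n$ gives ${\bar {y}}=a_{1}{\bar {x}}+a_{0}$ , so the fitted line passes through the point of means $({\bar {x}},{\bar {y}})$ .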

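Solving these equations simultaneously for the coefficients gives the following, where ${\bar {x}}$ and ${\bar {y}}$ denote the sample means of the $x_{i}$ and $y_{i}$ :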

${\hat {a}}_{1}={\frac {\sum _{i=1}^{n}{x_{i}y_{i}}-{\frac {1}{n}}\sum _{i=1}^{n}{x_{i}}\sum _{i=1}^{n}{y_{i}}}{\sum _{i=1}^{n}{x_{i}^{2}}-{\frac {1}{n}}(\sum _{i=1}^{n}{x_{i}})^{2}}}$

${\hat {a}}_{0}={\bar {y}}-{\hat {a}}_{1}{\bar {x}}$
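The closed-form expressions for ${\hat {a}}_{1}$ and ${\hat {a}}_{0}$ translate almost directly into code. The following is a minimal sketch in plain Python; the function name `fit_line` and the sample data are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Return (a1_hat, a0_hat), the least squares slope and intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)

    # Denominator of the slope formula: sum(x_i^2) - (1/n) * (sum(x_i))^2
    denom = sum_xx - sum_x * sum_x / n
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")

    a1_hat = (sum_xy - sum_x * sum_y / n) / denom
    # Intercept from a0_hat = y_bar - a1_hat * x_bar
    a0_hat = sum_y / n - a1_hat * sum_x / n
    return a1_hat, a0_hat


# Made-up data that is roughly, but not exactly, linear.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]
slope, intercept = fit_line(xs, ys)
```

For this small data set the sums are $\sum x_{i}=10$ , $\sum y_{i}=19$ , $\sum x_{i}y_{i}=57$ and $\sum x_{i}^{2}=30$ , so the slope works out to $9.5/5=1.9$ and the intercept to $4.75-1.9\cdot 2.5=0$ , up to floating-point rounding.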

Polynomial Least Squares

The concept of fitting a $k^{th}$ order polynomial is exactly the same as in the linear case. The course text ^[5] presents all of the long equations, but here we will just give an overview.

For the general polynomial $P_{k}(x)$ given by:

$P_{k}(x)=a_{k}x^{k}+a_{k-1}x^{k-1}+\cdots +a_{1}x+a_{0}$

We may estimate the coefficients in a similar way as for the linear case. We want to minimize the least squares error:

$E=\sum _{i=1}^{n}(y_{i}-P_{k}(x_{i}))^{2}$

$E=\sum _{i=1}^{n}(y_{i}-(a_{k}x_{i}^{k}+a_{k-1}x_{i}^{k-1}+\cdots +a_{1}x_{i}+a_{0}))^{2}$

We do this by taking the partial derivative of $E$ with respect to each of the parameters $a_{0},a_{1},\ldots ,a_{k}$ and setting each one equal to 0. This gives a system of $k+1$ linear equations, known as the normal equations, which we solve for the parameters. The solutions are those that minimize the sum of squared errors, and we denote them ${\hat {a}}_{0},{\hat {a}}_{1},\ldots ,{\hat {a}}_{k}$ .
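One possible way to carry this out numerically is sketched below. Setting ${\frac {\partial E}{\partial a_{j}}}=0$ for $j=0,\ldots ,k$ gives the equations $\sum _{m=0}^{k}a_{m}\sum _{i=1}^{n}x_{i}^{j+m}=\sum _{i=1}^{n}y_{i}x_{i}^{j}$ , which the code assembles and solves with Gaussian elimination. The function names, the pivot tolerance and the data are assumptions made for this example; in practice a library routine based on a more numerically stable factorization would usually be preferred for large $k$ .

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left unchanged.
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:  # tolerance is an arbitrary choice
            raise ValueError("singular system; need at least k+1 distinct x values")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_polynomial(xs, ys, k):
    """Return [a_0, ..., a_k] minimizing sum_i (y_i - P_k(x_i))**2."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    # power_sums[m] = sum_i x_i**m for m = 0, ..., 2k
    power_sums = [sum(x ** m for x in xs) for m in range(2 * k + 1)]
    # Row j of the normal equations comes from dE/da_j = 0.
    A = [[power_sums[j + m] for m in range(k + 1)] for j in range(k + 1)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(k + 1)]
    return solve_linear_system(A, b)


# Made-up data sampled exactly from y = x^2 - 2x + 3.
xs = [-1, 0, 1, 2, 3]
ys = [6, 3, 2, 3, 6]
coeffs = fit_polynomial(xs, ys, k=2)  # approximately [3, -2, 1], lowest degree first
```

With $k=1$ the same routine reduces to the straight-line fit of the linear case.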

Applications

Least squares is used extensively in the social and physical sciences. Physicists, engineers, psychologists, and managers use least squares to fit functions to data, to obtain summaries and estimates of the data, and to make predictions. Consider a problem related to social science taken from the book Applied Multivariate Statistical Analysis. ^[6]

A social scientist has collected data on $n=50$ salespeople on two variables $x=$ "Sales Growth" and $y=$ "Mathematics Test". We perform the method of least squares to estimate the parameters of the line which fits the data best. When we plot this line through the data we see that it is increasing. However, it would be incorrect to assume that learning more mathematics will improve selling ability. This graph only shows that there is a correlation between mathematics test scores and sales growth; it does not express the cause of the relationship.

Conclusion

The Method of Least Squares is one of my favorite bits of maths. It is super useful, and so simple that even a child can understand it. The Method of Least Squares was invented in order to approximate the "best" result given a series of results. So, if we are making some kind of measurements with an imprecise instrument, instead of buying a better instrument, we can take several measurements, and use the method of least squares to estimate the best measurement, at absolutely no additional cost! Nowadays, everywhere we find data and information, the method of least squares is also usually present.

References

[1] Legendre, Adrien-Marie. "Nouvelles méthodes pour la détermination des orbites des comètes" (in French). Paris: F. Didot: 80.
[2] Stigler, Stephen M. (1981). "Gauss and the Invention of Least Squares". The Annals of Statistics. 9 (3): 465–474.
[3] "Least Squares". The Geometer's Sketchpad. Bill Finzer.
[4] "Ordinary least squares". Wikipedia. Retrieved 12 December 2014.
[5] Burden, Richard L.; Faires, J. Douglas (2011). Numerical Analysis (9th ed.). Boston, MA: Brooks/Cole, Cengage Learning. ISBN 0538733519.
[6] Johnson, Richard A.; Wichern, Dean W. (2007). Applied Multivariate Statistical Analysis (6th ed.). Upper Saddle River, N.J.: Prentice Hall. ISBN 0131877151.
