Wikipedia:Reference desk/Archives/Mathematics/2011 January 11

Welcome to the Wikipedia Mathematics Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


January 11

Squared differences

When fitting a line to a sample of data in linear regression, why are squared differences minimized instead of the absolute value of the differences? --41.213.125.249 (talk) 12:37, 11 January 2011 (UTC)[reply]

I've sometimes asked myself the same question, and I don't have a complete answer. Mathematically, what you're suggesting is to measure distances between certain vectors using the L1 norm instead of the L2 norm. From the point of view of interpreting the data in the real world, it's not clear to me why one would be better than the other. For example, it's not clear why it's better to be off by 2 in two cases than off by 3 in one and off by 1 in the other. It depends on whether you want to penalize big discrepancies more heavily. 82.120.58.206 (talk) 13:18, 11 January 2011 (UTC)[reply]
But mathematically, the L2 norm leads to a simpler theory, since there is a geometric interpretation in terms of the Euclidean scalar product. For example, unless I'm mistaken, with the L1 norm, you could never get simple formulas for the best fit the way you do using the squares of the distances. On the other hand, with computers, I imagine it wouldn't be too much of a problem to find the best L1 fit if you wanted to, or at least an approximation of it. 82.120.58.206 (talk) 13:18, 11 January 2011 (UTC)[reply]
See Ordinary least squares#Geometric approach. It's a bit sketchy, but it will give you an idea. 82.120.58.206 (talk) 13:31, 11 January 2011 (UTC)[reply]
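As a concrete illustration of the computational remark above, here is a minimal Python sketch (an editorial addition, not from the thread; the data points are made up): it computes the closed-form least-squares line with numpy, then finds an approximate least-absolute-differences line numerically with scipy.

    import numpy as np
    from scipy.optimize import minimize

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([0.1, 1.2, 1.9, 3.2, 3.8])  # made-up sample data

    # L2: the least-squares line has a closed form.
    slope2, intercept2 = np.polyfit(x, y, 1)

    # L1: no simple closed form in general, so minimize the sum of absolute
    # residuals numerically; Nelder-Mead copes with the non-smooth objective.
    def l1_loss(params):
        a, b = params
        return np.abs(y - (a * x + b)).sum()

    res = minimize(l1_loss, x0=[slope2, intercept2], method="Nelder-Mead")
    slope1, intercept1 = res.x

    print("L2 fit: y =", slope2, "* x +", intercept2)
    print("L1 fit: y =", slope1, "* x +", intercept1)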
Personally I'm satisfied with the pragmatic point that using least squares leads to a more tractable problem without obviously giving less reasonable results. Minimizing absolute differences leads to strange effects where the best fit is not well defined (for example, fitting (0,0), (1,1), (2,1), (3,0) gives the same sum of absolute differences for every proposed fit that passes between the samples), and lack of smoothness: when there are many samples, a small change to one of the samples will most likely have no effect at all on the least-absolute-difference fit.
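A quick check (added here, not part of the original reply) of that degenerate example: any line passing between the four samples gives the same sum of absolute differences, namely 2.

    def l1_error(slope, intercept):
        pts = [(0, 0), (1, 1), (2, 1), (3, 0)]
        return sum(abs(y - (slope * x + intercept)) for x, y in pts)

    # Three quite different lines, all passing between the samples:
    for slope, intercept in [(0.0, 0.5), (0.2, 0.2), (1 / 3, 0.0)]:
        print(slope, intercept, l1_error(slope, intercept))  # 2.0 every time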
As a more pedantic point, without least squares, it wouldn't be a "linear" regression -- that term refers not to the fact that a line is produced, but that it is found by solving a linear set of equations. It may equally well be applied to find the least-squares best parabolic fit, and would then still be a linear regression. –Henning Makholm (talk) 17:27, 11 January 2011 (UTC)[reply]
Hmm, our Linear regression article disagrees fundamentally with my pedantry, so caveat lector! –Henning Makholm (talk) 02:10, 12 January 2011 (UTC)[reply]

The mean value between 7 and 9 minimizes f(x)=(x-7)^2+(x-9)^2. The minimum is f(8)=2. But any number 7≤x≤9 minimizes g(x)=|x-7|+|x-9|. Bo Jacoby (talk) 01:03, 12 January 2011 (UTC).[reply]
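To make the contrast easy to check numerically, a two-line Python verification (an editorial addition) of the point above:

    f = lambda x: (x - 7) ** 2 + (x - 9) ** 2  # minimized only at the mean, x = 8
    g = lambda x: abs(x - 7) + abs(x - 9)      # flat on the whole interval [7, 9]

    print(f(7.5), f(8), f(8.5))      # 2.5 2 2.5 -> unique minimum at 8
    print(g(7), g(7.5), g(8), g(9))  # all equal 2 -> no unique minimizer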

Good point. In some sense, the choice between least-squares and least-absolute-differences appears to be a higher-dimensional equivalent of the choice between arithmetic mean and median. (Though it's not clear to me that the efficient algorithms for finding a median have efficient counterparts for fitting a linear relationship). –Henning Makholm (talk) 02:10, 12 January 2011 (UTC)[reply]
The non-uniqueness point you both make is a good one, which I hadn't thought of. I haven't checked the details, but I would guess that this is directly related to the fact that the space with the L1 norm is not strictly convex. 82.120.58.206 (talk) 06:27, 12 January 2011 (UTC)[reply]
According to ru:Строго нормированное пространство#Примеры строго нормированных пространств (the Russian Wikipedia article on strictly normed spaces, examples section), Lp spaces are strictly convex for 1 < p < +∞ as a consequence of Young's inequality, so I think the non-uniqueness problem wouldn't arise for such p. We're just unlucky with p = 1. 82.120.58.206 (talk) 07:00, 12 January 2011 (UTC)[reply]
Because using the square gives less weight to small deviations from the regression line and more weight to big ones. I recall that it was also easier to work out in pre-computer days. 92.15.24.16 (talk) 13:33, 15 January 2011 (UTC)[reply]

The Gauss-Markov theorem justifies least-squares regression when you can assume that the measurement errors are independent and have equal variances. 84.239.160.59 (talk) 11:38, 16 January 2011 (UTC)[reply]

Series and Integration

Can anyone explain why is equivalent to ? Visit me at Ftbhrygvn (Talk|Contribs|Log|Userboxes) 13:00, 11 January 2011 (UTC)[reply]

What you have written is a Riemann sum for the integral. Look at the part of the Riemann sum article about right sums, and try to figure out what f, Q, a, b and n need to be in the formula. 82.120.58.206 (talk) 13:24, 11 January 2011 (UTC)[reply]
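Since the formulas in the question did not survive, here is only a generic sketch of the idea described in the reply: a right Riemann sum over [a, b] with n equal subintervals, converging to the corresponding integral as n grows. The integrand sin and the interval [0, 1] are placeholders, not the OP's formula.

    import math

    def right_riemann_sum(f, a, b, n):
        """Evaluate f at the right endpoint of each of n equal subintervals."""
        h = (b - a) / n
        return h * sum(f(a + k * h) for k in range(1, n + 1))

    exact = 1 - math.cos(1)  # integral of sin from 0 to 1
    for n in (10, 100, 1000):
        print(n, right_riemann_sum(math.sin, 0.0, 1.0, n), exact)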
Thanks for your quick answer! Visit me at Ftbhrygvn (Talk|Contribs|Log|Userboxes) 14:47, 11 January 2011 (UTC)[reply]

How to visualize integer sequences?

Hello,

is there a good way to visualize integer sequences up to a given upper bound? I know that if, for example, I wanted to visualize the sequence of prime numbers up to, let's say, 100, I could simply draw an x-axis and mark the position of each prime. Is there another way of visualizing sequences of integers? Toshio Yamaguchi (talk) 13:27, 11 January 2011 (UTC)[reply]

Is your sequence increasing? 82.120.58.206 (talk) 13:32, 11 January 2011 (UTC)[reply]
Yes, most of the sequences I have in mind are subsequences of the prime numbers. Toshio Yamaguchi (talk) 13:41, 11 January 2011 (UTC)[reply]
At the moment, I can't think of a better way than the one you've suggested.
What information do you want to make apparent in the figure? For example, do you want to show how often prime numbers appear in various intervals (say, how many primes there are between 100,000 and 1,000,000)? If so, you might want to use a graph like the one at Prime number theorem, or even something similar, but replacing x and y with log x and log y. 82.120.58.206 (talk) 14:56, 11 January 2011 (UTC)[reply]
Here are more examples along the same lines: [1]. 82.120.58.206 (talk) 15:03, 11 January 2011 (UTC)[reply]
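A rough matplotlib sketch (an editorial addition, not from the thread) of both ideas mentioned above: tick marks for the individual primes, as the OP describes, and a step plot of the counting function pi(x), which scales better to large bounds.

    import matplotlib.pyplot as plt

    def primes_up_to(n):
        """Sieve of Eratosthenes."""
        sieve = [True] * (n + 1)
        sieve[0:2] = [False, False]
        for p in range(2, int(n ** 0.5) + 1):
            if sieve[p]:
                sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
        return [i for i, flag in enumerate(sieve) if flag]

    primes = primes_up_to(100)

    fig, (ax1, ax2) = plt.subplots(2, 1)
    ax1.eventplot(primes)  # one vertical tick per prime on an x-axis
    ax2.step(primes, range(1, len(primes) + 1), where="post")  # pi(x)
    ax2.set_xlabel("x")
    plt.show()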
First, thanks for the quick replies. The purpose of my question is the following: I have designed an infobox template for articles about integer sequences (see Template:Infobox integer sequence) and wanted to add an image to the infoboxes. For example, I have placed one such infobox in the article about Wagstaff primes and am now looking for a way to create an image that helps to visualize this sequence somehow. Toshio Yamaguchi (talk) 18:53, 11 January 2011 (UTC)[reply]
OEIS does it this way FWIW.—msh210 19:16, 11 January 2011 (UTC)[reply]
Thanks. I will see what I can do with that. I hope I will manage to create similar images to place in the infoboxes (or hopefully even better ones). I think it could be nice eye candy if done well. Toshio Yamaguchi (talk) 19:41, 11 January 2011 (UTC)[reply]

Adding paraboloids

Am I right in saying that paraboloids are precisely those surfaces with constant $z_{xx}$ and $z_{yy}$, even when not axis-aligned or with the vertex at the origin, and that they may be categorized into flat (both 0), cylindrical (one 0), elliptical (same sign), and hyperbolic (opposite signs)? If so, it follows immediately that paraboloids, elliptic paraboloids, and (trivially) flat paraboloids ($z_{xx} = z_{yy} = 0$) are separately closed under addition, right? --Tardis (talk) 15:55, 11 January 2011 (UTC)[reply]

How about (1+x²+y²) + (1-x²-y²)? –Henning Makholm (talk) 17:34, 11 January 2011 (UTC)[reply]
Sorry -- I should have said each category of elliptic paraboloid (both + and both -) separately. Of course the additive inverse of an elliptic is elliptic and the sum is flat (and you can make cylindrical/hyperbolic ones too). --Tardis (talk) 18:18, 11 January 2011 (UTC)[reply]
Your classification doesn't work for x²+y²-3xy. Your method would classify it as elliptical, but it is really hyperbolic (just consider the line x=y). –Henning Makholm (talk) 19:17, 11 January 2011 (UTC)[reply]
(Not to speak of simply z=xy, which your method would consider to be flat...) –Henning Makholm (talk) 19:21, 11 January 2011 (UTC)[reply]
Yeah, I realized that at about the same time. I still think it's true that all and only paraboloids have constant second derivatives, but what I really want for classification is the signs of the eigenvalues of the Hessian matrix. Then we know that the sum of positive definite matrices is positive definite, which (if the previous sentence holds) proves the intuitively obvious fact that up-opening paraboloids cannot sum to hyperbolic or down-opening paraboloids. So the remaining questions:
  1. Do constant $z_{xx}$ and $z_{yy}$ imply constant $z_{xy}$, and thus a constant Hessian?
  2. Does any other surface have a constant Hessian?
--Tardis (talk) 19:46, 11 January 2011 (UTC)[reply]
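For concreteness, a small numerical check (an editorial addition) of the counterexamples above, using the eigenvalues of the Hessian rather than the raw x² and y² coefficients:

    import numpy as np

    # z = x^2 + y^2 - 3xy: Hessian [[z_xx, z_xy], [z_xy, z_yy]] = [[2, -3], [-3, 2]]
    print(np.linalg.eigvalsh(np.array([[2.0, -3.0], [-3.0, 2.0]])))  # [-1.  5.]

    # z = xy, "flat" by the sign-of-z_xx/z_yy test, but really hyperbolic:
    print(np.linalg.eigvalsh(np.array([[0.0, 1.0], [1.0, 0.0]])))    # [-1.  1.]

    # Mixed signs in both cases, so both surfaces are hyperbolic paraboloids.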
1. Yes, because $\partial_y z_{xy} = \partial_x z_{yy} = 0$ by equality of mixed partials, and mutatis mutandis in the x direction.
2. No, because any such surface must coincide with the quadric that has the same value and gradient at the origin.
Henning Makholm (talk) 20:51, 11 January 2011 (UTC)[reply]
Great! Thanks for walking me through my confused bits. So the overall proof is this:
  1. Constant Hessians correspond precisely to paraboloids (up to a choice of vertex location: three parameters, like your "value and gradient at the origin"), and positive- or negative-definite Hessians to elliptic ones (up or down respectively).
  2. The Hessian of any sum of paraboloids is a sum of constants and thus constant, so paraboloids are closed under addition.
  3. The sum of any two positive- or negative-definite matrices is another, so the elliptic-up and elliptic-down paraboloids are each separately closed under addition.
  4. (Of course, the Hessian of any flat paraboloid is the zero matrix, and that's closed under addition too, but we already knew that.)
Thanks again; in the end, I think the Hessians are all I need anyway, but it's great to have a complete understanding. --Tardis (talk) 00:37, 12 January 2011 (UTC)[reply]
On further thought, my argument above using third-order derivatives requires some smoothness assumptions that can be dispensed with. For a stronger statement, assume nothing except that $z_{xx} = 2a$ and $z_{yy} = 2b$ everywhere. Then:
  1. The four values z(±1,±1) determine z everywhere.
    • Fix $y = \pm 1$. The function $x \mapsto z(x, y)$ must be a quadratic polynomial with leading coefficient $a$, and its known values for x=±1 determine the other two coefficients uniquely. Thus, z(x,±1) is determined for all x.
    • Fix $x$. Like before, the function $y \mapsto z(x, y)$ must be a quadratic polynomial with leading coefficient $b$, and its known values for y=±1 determine the other two coefficients uniquely. Thus, z(x,y) is determined for all x and y.
  2. For any given values at these four points, there are coefficients c,d,e,f such that the polynomial p(x,y) = ax²+by²+cxy+dx+ey+f assumes those values.
    • This is just a matter of solving a linear equation system in the unknown coefficients.
  3. Since p(x,y) satisfies the assumption about second-order derivatives, this proves that z(x,y)=p(x,y).
Henning Makholm (talk) 01:40, 12 January 2011 (UTC)[reply]
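Step 2 can be checked mechanically; here is a sympy sketch (an editorial addition, with sample leading coefficients a = b = 1 and arbitrary made-up corner values) solving the linear system for c, d, e, f:

    import sympy as sp

    x, y, c, d, e, f = sp.symbols("x y c d e f")
    a_, b_ = 1, 1  # sample leading coefficients
    p = a_ * x**2 + b_ * y**2 + c * x * y + d * x + e * y + f

    # Arbitrary prescribed values at the four corners (placeholders):
    corners = {(1, 1): 5, (1, -1): 0, (-1, 1): 1, (-1, -1): 2}
    eqs = [sp.Eq(p.subs({x: u, y: v}), w) for (u, v), w in corners.items()]

    print(sp.solve(eqs, [c, d, e, f]))  # unique solution: {c: 3/2, d: 1/2, e: 1, f: 0}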