Talk:Maximum subarray problem
This article has not yet been rated on Wikipedia's content assessment scale.
Possibly Erroneous Initialization of s
In the O(n) algorithm, s (the largest calculated sum so far) is initialized as follows:
int s=1<<31;
First, this assumes that each integer is exactly 4 bytes wide, which is not the case for all architectures.
Second, it seems sufficient to initialize it to the first element of the array, A[0].
Debajit (talk) 23:29, 25 November 2007 (UTC)
- I've struck out my second proposition, which would fail for sequences having exactly one element. For my first proposition, I think s is best initialized to -Infinity as (for C++)
int s = std::numeric_limits<int>::min();  // from <limits>
- or to INT_MIN (from <limits.h>) in C
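For illustration only, here is a minimal Python sketch (not the article's C code, and the array name A is just an assumption for the example) of the two portable initializations discussed above: starting from the first element, or from negative infinity.
def max_subarray_nonempty(A):
    # Sketch: both initializations avoid hard-coding a 32-bit integer width.
    best = s = A[0]              # or: best = s = float('-inf') and loop over all of A
    for x in A[1:]:
        s = max(x, s + x)        # best sum of a subarray ending at x
        best = max(best, s)
    return best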
Notability
I have seen a few publications in IEEE and JSTOR referring to this algorithm. I have no understanding of the algorithm itself, therefore I cannot verify the accuracy of the information here, but I would conclude that this article is based on a notable topic. --MattWatt 22:07, 10 April 2007 (UTC)
Incorrect solution
Kadane's algorithm is not a solution for the general maximum subarray problem: it only finds the maximum subarray starting from the beginning of the array, not any subarray.
The correct algorithm is:
Kadane_s_Algorithm(arr[1..n])
begin
    (max, a, b) := (-INFINITY, 0, 0)
    curr := 0
    aa := 1
    for bb := 1 to n do
        curr := curr + arr[bb]
        if curr > max then
            (max, a, b) := (curr, aa, bb)
        endif
        if curr < 0 then
            curr := 0
            aa := bb + 1
        endif
    endfor
    return (max, a, b)
end
217.132.204.221 (talk) 05:47, 15 June 2010 (UTC)
No, it finds any subarray of the whole array. That's what the line "max_ending_here = max(0, max_ending_here + x)" is for: the points where the max is 0 are the ones where a new subarray starts. —David Eppstein (talk) 06:28, 15 June 2010 (UTC)
I confirm that the shown solution is incorrect. Given the following array:
{5,-2,4,-6,1,3, -7 , 3, 10, -15, -4 ,3, -2, 16, -9, 10};
Applying the algorithm as shown on the page, I get a maximum subarray sum of 11, whereas it should be 13 (3 + 10) — Preceding unsigned comment added by 187.194.146.244 (talk) 17:31, 16 November 2013 (UTC)
(Scratch previous comment, the algorithm is correct)
The solution given in Python is incorrect. Please remove the code.
It fails the following case: [-2,1,-3,4,-1,2,1,-5,4]. It should produce 6 instead of 10.
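For what it's worth, a minimal Python sketch of the form David Eppstein quotes above (the function name is only for this example) does give 6 on that input:
def max_subarray(A):
    # Empty subarrays allowed: reset the running sum whenever it drops below zero;
    # each reset point is where a new candidate subarray starts.
    max_so_far = max_ending_here = 0
    for x in A:
        max_ending_here = max(0, max_ending_here + x)
        max_so_far = max(max_so_far, max_ending_here)
    return max_so_far

print(max_subarray([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # 6, from the subarray [4, -1, 2, 1]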
Clarification of maximum sum when maximum sum is < 0.
If I pass in an array of entirely negative numbers, e.g. (-1,-1,-1,-1), why wouldn't the maximum sum be -1 instead of 0? —Preceding unsigned comment added by 76.247.17.199 (talk) 22:47, 31 July 2010 (UTC)
- Because the empty sum has a larger total than sums of one or more elements. —David Eppstein (talk) 23:44, 31 July 2010 (UTC)
- That's not necessarily true. If you want to include negative sums, first increase max_ending_here, then if max_ending_here > max_so_far, replace max_so_far, THEN if max_ending_here < 0, set it to 0. In that way the max will contain the closest negative member to 0 for an all negative array. (Alternatively, you could just find the max member if the resulting max == 0 and it will still be O(n)) -April 9, 2012 — Preceding unsigned comment added by 128.138.45.66 (talk) 04:21, 10 April 2012 (UTC)
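A minimal Python sketch of the reordering suggested above; note that it also assumes max_so_far starts at negative infinity rather than 0, otherwise an all-negative input would still yield the empty sum 0:
def max_subarray_negatives_allowed(A):
    max_so_far = float('-inf')
    max_ending_here = 0
    for x in A:
        max_ending_here += x
        if max_ending_here > max_so_far:   # compare before clamping at zero
            max_so_far = max_ending_here
        if max_ending_here < 0:
            max_ending_here = 0
    return max_so_far

print(max_subarray_negatives_allowed([-1, -1, -1, -1]))  # -1: the negative member closest to 0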
Kadane's algorithm code
I am finding it difficult to believe that we should just scrap the C++ code in this section. I think that before we do that we should implement the ability to track indices in the Python version as well. The C++ code did this, but the Python code does not. Mrcsmcln (talk) 11:45, 2 October 2014 (UTC)
- Wikipedia is not a code repository. We should not have 20 different implementations of the same algorithm in 20 different programming languages, or even 2. Multiple implementations are ok only when they show significant variations in the algorithms, and tracking indices is not a significant variation. Pseudocode might be a better choice than any specific language, but Python is closer to pseudocode than C++ is. —David Eppstein (talk) 15:04, 2 October 2014 (UTC)
I added an external link to the corresponding Rosetta Code page (where it is called the greatest subsequential sum algorithm); there are implementations in many other languages as well. Andre.holzner (talk) 12:28, 27 August 2016 (UTC)
Erroneous (or confusing) divide and conquer version
The divide-and-conquer version of the maximum subarray problem is notoriously O(n log n) (see CLRS, for instance, as a reference). I have trouble understanding the solution: for instance, what are sum[f - 1] and sum[f]? The only array defined clearly in the problem is a; I don't know what sum is supposed to contain. An additional explanation would be great. — Preceding unsigned comment added by Poitevinpm (talk • contribs) 21:25, 7 January 2018 (UTC)
- For my own part, I don't understand why other editors insist on devoting so much of the space in the article (and in textbooks like CLRS) to bad algorithms like the brute force and divide and conquer ones, when Kadane's algorithm is so simple (simpler than the divide and conquer) and in all respects better. —David Eppstein (talk) 21:47, 7 January 2018 (UTC)
- The pseudocode for Divide and Conquer has pretty bad notation and, more importantly, the wrong complexity: it is surely O(n log n). The equation at the end is wrong; the time complexity can be described as T(n) = 2T(n/2) + O(n), T(1) = O(1) (see the CLRS book, Introduction to Algorithms; the algorithm is described from p. 68 in the 3rd ed.). The O(n) term comes from searching for the overlapping sums in each iteration. Apart from that, I agree that Kadane's algorithm is really simple and the most efficient, so I would rather have it mentioned first. The value of the other two solutions is only in showing different approaches and comparing complexities. Protivinsky (talk) 19:18, 25 February 2018 (UTC)
- The pseudocode given in this article is clearly 2T(n/2)+O(1) (where do you think it takes more than constant time to combine solutions)? If CLRS gives an algorithm with different complexity then it is surely a different algorithm. —David Eppstein (talk) 19:46, 25 February 2018 (UTC)
- You are right indeed; I got confused by the pseudocode and thought it was the same implementation as in CLRS (where the computation of the center subproblem has linear cost). I didn't realize that the code uses an array of cumulative sums without bothering to define or explain it. I put together Python code for the divide and conquer algorithm instead of that confusing pseudocode and I will replace it. Protivinsky (talk) 23:23, 27 February 2018 (UTC)
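For comparison, here is a sketch of the CLRS-style divide-and-conquer version discussed above, with an explicit linear-time step for the subarrays crossing the midpoint, so the recurrence is T(n) = 2T(n/2) + O(n) = O(n log n). It is written in Python rather than pseudocode, the function names are only for this example, and it is not the article's prefix-sum variant:
def max_crossing(A, lo, mid, hi):
    # Best sum of a subarray crossing the midpoint: best suffix of A[lo..mid]
    # plus best prefix of A[mid+1..hi]; this is the O(n) combine step.
    best_left = float('-inf')
    total = 0
    for i in range(mid, lo - 1, -1):
        total += A[i]
        best_left = max(best_left, total)
    best_right = float('-inf')
    total = 0
    for j in range(mid + 1, hi + 1):
        total += A[j]
        best_right = max(best_right, total)
    return best_left + best_right

def max_subarray_dc(A, lo, hi):
    if lo == hi:
        return A[lo]                               # base case: single element
    mid = (lo + hi) // 2
    return max(max_subarray_dc(A, lo, mid),        # entirely in the left half
               max_subarray_dc(A, mid + 1, hi),    # entirely in the right half
               max_crossing(A, lo, mid, hi))       # crossing the midpoint

print(max_subarray_dc([-2, 1, -3, 4, -1, 2, 1, -5, 4], 0, 8))  # 6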
Discussion: Pharlan35244
@Pharlan35244: I moved this discussion to the talk page, the proper page for things like this. Hdjensofjfnen (If you want to trout me, go ahead!) 03:47, 20 September 2018 (UTC)
Pharlan35244 said:
"(actually theta(n) is a better answer because both the lower bound and the upper bound has the same runtime. By saying the runtime is , you do not clarify that Kadane's algorithm has the same lower and upper bound, thus it is not completely correct for the run time to be )"
the solution for keeping track of the indices for the subarray is wrong
Here is a working script:
max_ending_here = max_so_far = A[0]    # best sum ending here / best sum seen so far
start = best_start = end = 0           # start of the current run, and the best (start, end) pair
for i, x in enumerate(A[1:], start=1):
    if max_ending_here + x > x:
        max_ending_here = max_ending_here + x
    else:
        # the running sum would only hurt; begin a new candidate subarray at i
        max_ending_here = x
        start = i
    if max_so_far < max_ending_here:
        # record the new best sum together with the indices that produce it
        max_so_far = max_ending_here
        best_start = start
        end = i
print(max_so_far, best_start, end)
Issues with the "History" section, regarding brute-force
It seems incorrect that "the brute force running time" of the one-dimensional problem is "O(n³)".
Rather, the brute-force runtime is O(n²). What, then, I wonder, did Ulf Grenander's improvement do? This is confusing.
I suspect that there are similar issues with the description of the two-dimensional problem's history.
I'm not clear on the history, or else I would edit this myself...!
- The brute force algorithm tries O(n²) different subarrays and takes O(n) time to compute the sum for each of them, giving O(n³) total. Maybe you are already thinking of optimizing it to be faster? But then it's not the most obvious and basic version of the algorithm any more. And optimizing a bad algorithm is wasted effort. —David Eppstein (talk) 04:06, 7 August 2019 (UTC)
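To illustrate the point above with a rough Python sketch (the function names are only for this example): the most basic brute force recomputes each sum from scratch, giving O(n³), while keeping a running sum as the right endpoint grows is already the O(n²) optimization.
def max_subarray_cubic(A):
    # O(n^2) subarrays, O(n) to sum each one: O(n^3) in total.
    n = len(A)
    best = A[0]
    for i in range(n):
        for j in range(i, n):
            best = max(best, sum(A[i:j + 1]))
    return best

def max_subarray_quadratic(A):
    # Extending the right endpoint while keeping a running sum
    # removes the inner summation: O(n^2) in total.
    n = len(A)
    best = A[0]
    for i in range(n):
        running = 0
        for j in range(i, n):
            running += A[j]
            best = max(best, running)
    return best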
History is weird
Soon after, Shamos described the one-dimensional problem and its history at a Carnegie Mellon University seminar attended by Jay Kadane, who designed within a minute an O(n)-time algorithm, which is clearly as fast as possible.
The previous comment wonders out loud about the history section, and I have to wonder, too.
Knuth began the project, originally conceived as a single book with twelve chapters, in 1962.
The first three volumes of what was then expected to be a seven-volume set were published in 1968, 1969, and 1973.
Work began in earnest on Volume 4 in 1973, but was suspended in 1977 for work on typesetting.
Supposedly, this anecdote takes place in 1977. By 1977, if you couldn't derive the O(n) algorithm for the one-dimensional case within 30 minutes, you were seriously in the wrong profession already. The only viable excuse would be a three martini lunch.
What really was going on here? Because I doubt any of these men were quite that dense upstairs.
For example, I suspect the guy agonizing over the divide and conquer thought he was solving a somewhat generalized version of the problem statement here, for which some of that heavy machinery would have actually been appropriate.
As written, this does not pass my historical smell test. — MaxEnt 02:09, 12 May 2020 (UTC)
Empty subarrays admitted vs. forbidden
In order to reduce the number of good-faith edits like the most recent one, what about presenting the "No empty subarrays admitted" variant first, and in full detail, and then just the changes needed to turn it into the "Empty subarrays admitted" variant? If we are lucky, most over-hasty ad-hoc page editors are familiar with the "No empty subarrays admitted" variant, and won't change the text, since it meets their expectations. However, if we are unlucky, the favorite variants are distributed 50:50, and we'd experience lots of thoughtless edits in the opposite direction. I think implementing my suggestion is worth a try. - Jochen Burghardt (talk) 14:55, 10 September 2023 (UTC)
- I'm skeptical that this will help but it seems harmless enough and worth a try. —David Eppstein (talk) 18:45, 10 September 2023 (UTC)
- Done. Some complications were caused by the fact that both Bentley and Gries refer to the "empty" variant, while we start with the "nonempty" variant. The worst of them is the new [note6], which now has to explain a modification that Bentley should have made in the previous (rather than the current) variant. - I gave the essential loop invariants for both algorithm variants; they could (after some days of trial and error) be verified by Frama-C, so they shouldn't be flawed. - Jochen Burghardt (talk) 15:28, 13 September 2023 (UTC)