Jump to content

Talk:Euclidean minimum spanning tree/GA1

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia

GA Review

[edit]

Article (edit | visual edit | history) · Article talk (edit | history) · Watch

Reviewer: Ovinus (talk · contribs) 06:50, 24 August 2022 (UTC)[reply]

I'll grab this one. Looks excellent on first glance. A quick note: I'm moving soon and may not respond in a timely manner; sorry in advance. Ovinus (talk) 06:50, 24 August 2022 (UTC)[reply]

I'm traveling the next few days and so may also not respond in a timely manner. —David Eppstein (talk) 07:30, 24 August 2022 (UTC)[reply]
  • Publications on this problem commonly abbreviate it as "EMST" What problem, though? Ovinus (talk) 07:04, 24 August 2022 (UTC)[reply]
  • must again start and end at the given points Probably "still must", since this is in contrast to the Steiner tree
  • No mention of the triangle inequality here. What about the generalized problem of finding the MST of a weighted complete graph satisfying the triangle inequality? The metric TSP is materially different in difficulty (at least to approximate) from the non-metric TSP, right? Or is this too broad a generalization to warrant a mention Ovinus (talk) 07:04, 24 August 2022 (UTC)[reply]
    • Triangle inequality doesn't make complete-graph MST any easier than the general problem, because you can obtain the triangle inequality without changing the tree by adding your favorite big number to all of the weights. Anyway, that's a topic for Minimum spanning tree § MST on complete graphs rather than this article. I think other geometries are interesting (there's less published about hyperbolic-geometry MST than there should be) but again that's somewhat off-topic. —David Eppstein (talk) 07:27, 24 August 2022 (UTC)[reply]
      • Aha, that's clever. Well, is there enough to warrant a sentence about finding MSTs in other metrics? You're already mentioning related problems/variations; this isn't that far out and probably can't be covered in a standalone article. Ovinus (talk) 07:34, 24 August 2022 (UTC)[reply]
        • The only thing I think that's relevant enough and sourceable is that the approach of computing a Delaunay triangulation to get time also works in the hyperbolic plane (actually, using the Euclidean Delaunay triangulation of a Poincaré model of the hyperbolic plane). Which is sort of interesting, but kind of folklore. The earliest reference I can find is some talk slides from a survey talk I gave in 2003 [1], I'm uncomfortable using that as a reference for multiple reasons (it's not really a publication, it's by me, and I'm not certain I deserve credit for discovering it) but I'm even less comfortable with later sources (the clearest I can find is https://arxiv.org/abs/2112.02553 but it's also not a reliably published source and definitely doesn't deserve credit for this observation). —David Eppstein (talk) 03:06, 27 August 2022 (UTC)[reply]
  • I'd briefly mention, in Angles and vertex degrees, that the kissing number in dimension n is unknown for most n. An uninformed reader might think that this characteristic of EMSTs will be known for all dimensions. Ovinus (talk) 18:15, 24 August 2022 (UTC)[reply]
  • "their precise values are not known" Have there been any Monte Carlo–type estimates? Ovinus (talk) 18:15, 24 August 2022 (UTC)[reply]
    • The original reference has some empirical data, but without any error or convergence rate analysis that could be used to justify calling it an estimate. They are pretty cautious in how they interpret it. —David Eppstein (talk) 03:13, 27 August 2022 (UTC)[reply]
  • "This contradiction shows that w cannot exist. It follows that any other region of the plane within this empty lens must also be empty" This seems fairly evident from what was just explained, because you say "cannot" and eventually "contradicted". Ovinus (talk) 18:15, 24 August 2022 (UTC)[reply]
  • "defined from a system of points using empty regions" wdym by "using empty regions" Ovinus (talk) 18:15, 24 August 2022 (UTC)[reply]
    • I mean "using empty regions in ways that are detailed individually in each of the bullet points below". —David Eppstein (talk) 03:17, 27 August 2022 (UTC)[reply]
      • Yes but it is confusing to the uninitiated. I'd like something like "using empty regions in their construction" Ovinus (talk) 07:20, 27 August 2022 (UTC)[reply]
        • My feeling is that "certain X defined from Y using Z must contain all A of all B" is already grammatically overcomplicated. Expanding it to "certain X defined from Y using Z in their definition must contain all A of all B" is both more complicated and redundant. On the other hand, I don't want the introductory sentence to go into detail about what it means for a geometric graph to be defined by empty regions because the definitions are different. For some (RNG or Gabriel) there is a single region for each pair of points that, when empty, causes an edge to exist between them. For Delaunay there are multiple regions any one of which, when empty, causes an edge to exist. For Urquhart, we have a combination of the Delaunay condition (on all pairs of points) and the RNG condition (limited to the third points of Delaunay triangles). —David Eppstein (talk) 19:49, 27 August 2022 (UTC)[reply]
          • Your rewording is a bit better, I think. Now the issue is that a set of points' "empty regions" is rather handwavy because it's not clear that they are rigorously constructed in each case. How about: "Certain geometric graphs defined from points, and empty regions constructed from those points, must contain all Euclidean minimum spanning trees as subgraphs." (emphasis mine) That avoids an extra quantifier. Ovinus (talk) 20:49, 27 August 2022 (UTC)[reply]
            • I don't understand your issue. We have an introductory sentence saying that certain geometric graphs related to EMSTs by having definitions based on empty regions. Why would that sentence cause one to worry that maybe these graphs' definitions are unrigorous? Would your objection still apply to a sentence that replaced "defined" by "rigorously defined" or "fully rigorously defined" or maybe "defined in full mathematical rigor"? What extra information would these adverbs convey to the reader? —David Eppstein (talk) 21:12, 27 August 2022 (UTC)[reply]
              • As fully rigorous, precise, exact, pedantic as possible! I think I see the problem now; when parsing the sentence it says “graphs defined from points and their empty regions” and it sounds like each POINT has some number of empty regions, which is absurd. It should be “graphs defined from a set of points and its empty regions”.
              • What you have now is a bit better, but I still think saying "all EMSTs are subgraphs" makes more sense than "all edges of all ...". Ovinus (talk) 03:46, 28 August 2022 (UTC)[reply]
                • We do say they are subgraphs after the bullet points. But these inclusions are proved one edge at a time, so I think it's important to start by focusing on the edges. Also, this makes it easier to make it clear that they do not merely contain one EMST as a subgraph; they contain all EMSTS, for point sets that have more than one. —David Eppstein (talk) 17:51, 29 August 2022 (UTC)[reply]
  • Reorder the graphs in Supergraphs to enjoy the inclusion chain order Ovinus (talk) 18:15, 24 August 2022 (UTC)[reply]
  • Is "EMST" too informal to be used throughout the article? Ovinus (talk) 18:15, 24 August 2022 (UTC)[reply]
    • Informal is the wrong word. It's not too informal to be used in many published papers, where the standard of writing is also very formal. What it is, that is more problematic for Wikipedia than elsewhere, is jargony. But the alternative is repeating "Euclidean minimum spanning tree Euclidean minimum spanning tree Euclidean minimum spanning tree" all over the article, and that's a long enough phrase to get kind of tedious. —David Eppstein (talk) 03:20, 27 August 2022 (UTC)[reply]
      • Indeed, which I why I thought you could use it more liberally in the article. It's never used besides its mention as an acronym. Ovinus (talk) 07:20, 27 August 2022 (UTC)[reply]
        • I'm not a big fan of the overuse of jargony acronyms in academic publication. I tried compromising be removing many instances of the word "Euclidean", and adding a note in the definitions section about removing that word when the context is obvious (sourced to several of the existing sources that do that). —David Eppstein (talk) 03:18, 28 August 2022 (UTC)[reply]
  • "or any other shape of constant diameter" Hey, throwback. But a square isn't such a shape, is it? Ovinus (talk) 18:15, 24 August 2022 (UTC)[reply]
  • "Another way of interpreting these results is that the average edge length for points in a unit square is O(1/sqrt(n)), proportional to the spacing of points in a regular grid, and that for random points in a unit square the average length is proportional to 1/sqrt(n)." Isn't this saying the same thing twice Ovinus (talk) 18:15, 24 August 2022 (UTC)[reply]
  • "That is, with high probability, some edges of length proportional ..." Is this restatement necessary? I find it more difficult to parse than the original, mainly bc of the "required" part, making it sound like you're talking about graphs in general (that is, any graph spanning the points will have (with high probability) an edge of length O(sqrt(log n / n))). Or—if that is indeed the case—that should be made clear in the original statement. Ovinus (talk) 18:15, 24 August 2022 (UTC)[reply]
  • "The constant of proportionality for this longest edge length, and its distribution around its expected value, are also known" Perhaps the distribution is too much info, but the constant seems like succulent info. Is it algebraic? No closed form? (If the latter I'd say "although there is no closed form") Ovinus (talk) 18:15, 24 August 2022 (UTC)[reply]
  • Link "which can be done" to Delaunay triangulation#Algorithms? Ovinus (talk) 18:15, 24 August 2022 (UTC)[reply]
    • I'm not happy with the state of that section. The only part where it clearly states that deterministic n log n is possible is the divide & conquer section, but then it immediately and confusingly talks about getting expected time n log n without hurting worst case performance, which makes no sense for an algorithm that is already deterministic n log n. It doesn't mention using Fortune's algorithm and then dualizing. It doesn't state that the incremental method can be derandomized. It doesn't mention Shewchuk's fast practical implementation of many of these methods. I'm not convinced that it's a useful enough resource for this topic to link in this way. —David Eppstein (talk) 20:18, 27 August 2022 (UTC)[reply]
  • "the version of the traveling salesman problem on a set of points in the plane with edges labelled by their length" Necessary clarification? If someone knows what TSP is they'll know what the Euclidean TSP is Ovinus (talk) 18:15, 24 August 2022 (UTC)[reply]
  • "some trees may require a bounding box of exponential size" This is cool. So also the edges may be exponential in size? I would note this Ovinus (talk) 18:15, 24 August 2022 (UTC)[reply]

Good stuff. Since this is a long (and important) one I'll probably do another read through when my mind is fresh. Ovinus (talk) 18:15, 24 August 2022 (UTC)[reply]

Second pass

[edit]

Continuing in a couple days. Ovinus (talk) 03:41, 31 August 2022 (UTC)[reply]

Ok, no rush. —David Eppstein (talk) 07:05, 31 August 2022 (UTC)[reply]

To clarify what I meant in my last comment, I think the lead should say that O(n log n) is optimal for 2D, if it later says that the optimal algorithm for d >= 3 is unknown. Also noting that I haven't forgotten the review. Ovinus (talk) 04:05, 8 September 2022 (UTC)[reply]

Quite happy with the article. Illustrations are pleasing (of course a 3D example would be cool—might try my hand at one). Will pass once we discuss the remaining points. Ovinus (talk) 07:37, 11 September 2022 (UTC)[reply]

Cool. Passing; nice work! Ovinus (talk) 03:11, 12 September 2022 (UTC)[reply]