Wikipedia:Bots/Requests for approval/PrimeBOT 19
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Withdrawn by operator.
Operator: Primefac (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 05:38, Sunday, July 16, 2017 (UTC)
Automatic, Supervised, or Manual: Automatic
Programming language(s): AWB
Source code available: WP:AWB
Function overview: Remove duplicate categories on articles.
Links to relevant discussions (where appropriate): WP:AN discussion with no major opposition
Edit period(s): one time run (can be extended to "once a month" if necessary, i.e. if there are a large number of pages added each month)
Estimated number of pages affected: 801 (ish)
Namespace(s): Article
Exclusion compliant (Yes/No): Yes
Function details: AWB automatically removes any duplicate categories if:
- Both category texts are identical (e.g. [[Category:Foo]] and [[Category:Foo]])
- Only one category has a sort key (e.g. [[Category:Foo|Bar]] and [[Category:Foo]])
This is, from a rough count, all but 20-30 of the pages listed at the CheckWiki dump for this task, the rest of which can be dealt with manually. Essentially, this is running AWB with only genfixes enabled. Edit summary will read "Removing duplicate categories (CheckWiki #17) - BRFA".
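For illustration only, the two removal rules can be sketched in Python roughly as follows. This is not AWB's actual code (AWB is a separate tool with its own implementation); the function name `remove_duplicate_categories`, the regular expression, and the name normalisation are all assumptions made for this sketch. Categories whose duplicates carry different sort keys are treated as a sort conflict and left untouched.

```python
import re

# Assumed pattern for links like [[Category:Foo]] or [[Category:Foo|Bar]].
CATEGORY_RE = re.compile(
    r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|([^\]]*))?\]\]",
    re.IGNORECASE,
)


def _normalise(name):
    """Assumed normalisation: underscores as spaces, first letter upper-cased."""
    name = name.replace("_", " ").strip()
    return name[:1].upper() + name[1:]


def remove_duplicate_categories(wikitext):
    """Return (new_text, removed_count) after dropping duplicate categories.

    Hypothetical sketch of the rules in the function details:
    - identical duplicates: keep the first occurrence;
    - only one sort key among the duplicates: keep a copy that has it;
    - differing sort keys (a sort conflict): change nothing.
    """
    groups = {}
    for match in CATEGORY_RE.finditer(wikitext):
        groups.setdefault(_normalise(match.group(1)), []).append(match)

    to_remove = []
    for matches in groups.values():
        if len(matches) < 2:
            continue
        sort_keys = {m.group(2) for m in matches if m.group(2) is not None}
        if len(sort_keys) > 1:
            continue  # sort conflict, left for a human editor
        keeper = next((m for m in matches if m.group(2) is not None), matches[0])
        to_remove.extend(m for m in matches if m is not keeper)

    # Delete from the end so earlier offsets stay valid.
    for match in sorted(to_remove, key=lambda m: m.start(), reverse=True):
        start, end = match.span()
        if wikitext[end:end + 1] == "\n":
            end += 1  # also drop the line break after a removed category
        wikitext = wikitext[:start] + wikitext[end:]
    return wikitext, len(to_remove)
```

A caller could skip any page where the returned count is 0, so that a page is only saved when a duplicate category was actually removed.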
Discussion
Same as Wikipedia:Bots/Requests for approval/Yobot 56, which was filed some days ago. -- Magioladitis (talk) 08:01, 16 July 2017 (UTC)[reply]
- Despite not actually listing it properly, this is a duplicate task to 56. I'm fine withdrawing if it gets approved first. Primefac (talk) 12:40, 16 July 2017 (UTC)[reply]
- What is your strategy for resolving sort conflicts? — xaosflux Talk 13:40, 16 July 2017 (UTC)[reply]
- Sort conflicts are ignored by AWB, so there would be no change. I'll be sure to enable the skip options for when the category isn't changed at all (i.e. "skip if only whitespace" etc). Primefac (talk) 14:11, 16 July 2017 (UTC)[reply]
- Assuming no sort conflicts, this is expected to be 100% cosmetic for readers, correct? — xaosflux Talk 13:40, 16 July 2017 (UTC)[reply]
- Correct. I know this generally runs afoul of the cosmetic rules, but the AN discussion (sort of) determined that having duplicate categories could potentially mess up hotcat, as well as users changing/modifying/removing/etc categories from a page. Primefac (talk) 14:11, 16 July 2017 (UTC)[reply]
- I picked a random page from the checkwiki list, Berger Blanc Suisse. In looking at the page, the obvious cleanup needed was fairly easy to spot and resolve - but simply removing the duplicated category alone would not make this page any better for the readers. With this being such a small list might it be better to be curated manually? Same question to @Magioladitis: so I don't have to ask twice. It's possible I ended up in some edge case. — xaosflux Talk 14:44, 16 July 2017 (UTC)[reply]
- Xaosflux duplicated reflists are tracked as a separate task. Duplicated content is not that common though. I tend to find them occasionally.
- On the other issue: As with the ISBN thing, I am OK with Primefac's bot also getting approval. Recall that we used to have 2-3 bots doing this before Bgwhite's disappearance. -- Magioladitis (talk) 14:48, 16 July 2017 (UTC)[reply]
- I'm not really worried about that for this task - when we have multiple bots approved for the same task only real concern is collisions or inconsistency - don't think that is an issue here. — xaosflux Talk 14:54, 16 July 2017 (UTC)[reply]
- I'm starting to wonder if this task is even necessary. I checked the first dozen pages on the dump page and Fram had already been through and fixed all the duplicate cats. That, combined with your example above, makes me think this would be better (from a "fixing everything at once" perspective) as a manual task. The tools "live" version shows only 700 pages. If the number is going down, there are clearly people aware of and actively fixing this issue. However, the spread of duplicate cats doesn't seem to affect any one "type" of article, so clearly bundling it with other similar AWB-worthy tasks doesn't make much sense. Primefac (talk) 15:01, 16 July 2017 (UTC)[reply]
- This fix is another that AWB just kind of "cleans up" without actually fixing the issue that there was an entire article stuffed into the {{multiple issues}} template. I'm starting to think the issues with this task are less about COSMETIC and more about CONTEXT. Primefac (talk) 15:19, 16 July 2017 (UTC)[reply]
- Xaosflux, I am considering withdrawing this request. I spent the last hour or so cutting the list in half (i.e. 350ish edits, plus 90 already done by someone else), and probably 40 of those I had to also fix either duplicate Reference sections, bad code, poor formatting, etc. While AWB can do this task automatically, there are too many cases where a set of eyes making sure there isn't anything else wrong with the page would be better (especially since this is a rather cosmetic change). Primefac (talk) 17:06, 16 July 2017 (UTC)[reply]
- That was my initial thought, that these are caused by inexperienced editors who likely left additional errors not suited for automation. — xaosflux Talk 17:21, 16 July 2017 (UTC)[reply]
- Xaosflux Most of them are caused by Cydebot. If we fix Cydebot we are good. -- Magioladitis (talk) 19:17, 16 July 2017 (UTC)[reply]
- I find that hard to believe, given that I've just checked 50 pages where both Cydebot and I edited the page, and in not a single one of them did Cydebot add a category. Most were removing cats per a CFD result. Primefac (talk) 21:10, 16 July 2017 (UTC)[reply]
- @Magioladitis: interesting - but I'm not seeing the data to back up your statement - from Wikipedia:CHECKWIKI/WPC_017_dump I picked 4 random pages: Read with Me, Ronald de Boer, Sophie Totzauer, and Thiago Cunha. I didn't see any errors introduced by Cydebot, and there was only a Cydebot edit to one of them. Do you have any additional information that Cydebot causes most of these errors? — xaosflux Talk 22:45, 16 July 2017 (UTC)[reply]
- @Xaosflux: on 4/19/14 Cydebot created thousands of entries, same on 8/20/14. Example. I used to keep a record of that. On 12/12/14 we had 1000 pages in one day. -- Magioladitis (talk) 03:13, 17 July 2017 (UTC)[reply]
- OK, so that was over 3 years ago - is this a current issue? — xaosflux Talk 04:03, 17 July 2017 (UTC)[reply]
- Xaosflux There are explosions that happen depending on the XfDs. Anyway, if the task should be done manually I am still OK. If the task should not be done I am still OK. Using AWB for the task sometimes gave the impression I was removing a valid category. If anyone does it they should be very cautious with the edit summary. -- Magioladitis (talk) 12:40, 17 July 2017 (UTC)[reply]
- I have no objection to removing straight duplicates - they are easily detected and removed, so very suited for a bot. Given the numbers that Primefac gave at the discussion (here) I would say that 1/2-3/4 of pages in the list is enough for a bot to be worthwhile. However, with the edge cases I am wary of WP:CONTEXTBOT and would say to leave the rest to manual editors - they seem to be doing an ok job thus far. TheMagikCow (T) (C) 18:20, 17 July 2017 (UTC)[reply]
- I've just finished with all of the AWB-fixes-it-automatically pages. Magioladitis, how often does the dump page get updated, and how much (or less) reliable is it than the labs page? Primefac (talk) 20:19, 18 July 2017 (UTC)[reply]
- Primefac The dump page is 100% reliable (at the point of its creation) because it is based on the full database dump, while the labs page may miss pages since it checks pages up to a certain number every 15 minutes. Due to the recent community disputes the dump page is not regularly updated anymore, since the people working with it became inactive or semi-active. -- Magioladitis (talk) 05:46, 20 July 2017 (UTC)[reply]
- Okay, so it's basically useless after a few weeks, assuming that the pages have actually been edited. Live/labs page has 14 pages on it currently, though I'm sure it'll find more as time passes (I didn't clear out all of the dump page when I went through). Primefac (talk) 15:56, 20 July 2017 (UTC)[reply]
- "Live" page lists 36 pages, though undoubtedly there will be more as time goes on. If this task goes through, the bot op will still have to mark those pages as "done" on the tools page to make the list accurate.
- When I went through and cleared out the live list the other day, there were a ton of pages where there were genfixes other than cat changes. I cannot, however, figure out the combination of "skip" conditions that would make it so that if a cat change isn't implemented the page is skipped. Magioladitis may know. If it isn't possible to implement this skip condition, then I think it doesn't make sense to have this as an automated task (since it would literally be a pure "genfixes" bot). Primefac (talk) 14:46, 21 July 2017 (UTC)[reply]
Primefac Live page has 470 pages right now. -- Magioladitis (talk) 16:07, 12 August 2017 (UTC)[reply]
Primefac On the skip condition: One needs to ensure that MetaDataSort/RemoveCats is actually triggered using Custom Module. -- Magioladitis (talk) 16:42, 12 August 2017 (UTC)[reply]
Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. I see no reason to not move to trial with this. Please enable the 'skip if only minor genfixes/whitespace' skip conditions, unless this is considered a 'minor genfix'. Headbomb {t · c · p · b} 13:32, 20 August 2017 (UTC) [reply]
Headbomb please have a look at the Yobot equivalent. -- Magioladitis (talk) 18:16, 20 August 2017 (UTC)[reply]
Withdrawn by operator. per my initial opening statements. That, and WPC works better than AWB for this anyway. Primefac (talk) 17:34, 22 August 2017 (UTC)[reply]
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.