Wikipedia:Bots/Requests for approval/WikiCleanerBot 2
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: NicoV (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 17:25, Monday, February 25, 2019 (UTC)
Function overview: To fix ISSN with an incorrect syntax. As described in ISSN#Code format, the correct syntax for an ISSN is "an eight digit code, divided by a hyphen into two four-digit numbers"
Automatic, Supervised, or Manual: Automatic
Programming language(s): Java (Wikipedia:WPCleaner)
Source code available: On Github
Links to relevant discussions (where appropriate): Maintenance task for CW Error #106
Edit period(s): At most, twice a month, following the dump analysis that I already perform, see Wikipedia:Bots/Requests for approval/WikiCleanerBot.
Estimated number of pages affected: At most a few hundred pages for the first complete run (pages with such problems are listed in Wikipedia:CHECKWIKI/WPC 106 dump, which currently contains a list of 420 pages, down from 1315), and probably no more than a few dozen on each run after that, given the evolution of the number of pages in the list.
Namespace(s): Main namespace
Exclusion compliant (Yes/No): No, because there's no reason to use an incorrect syntax for an ISSN instead of the correct one.
Function details: Based on the list generated on Wikipedia:CHECKWIKI/WPC 106 dump, the bot will only fix trivial problems (like a missing hyphen in the ISSN number, extra whitespace characters...) and will leave the more complex ones to be fixed by a human. It will greatly reduce the list, so human editors can concentrate on the remaining problems.
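The trivial fixes described above could be sketched as follows. This is only an illustration under stated assumptions, not WPCleaner's actual code: the function names `is_well_formed` and `fix_trivial` and the list of dash characters are hypothetical, and the sketch checks the syntax only, not the ISSN check digit (which may be the letter X in the last position).

```python
import re
from typing import Optional

# Dash-like characters sometimes typed instead of the ASCII hyphen.
# This list is an assumption made for the sketch, not taken from WPCleaner.
DASHES = "\u2010\u2011\u2012\u2013\u2014\u2212"

# Two groups of four characters separated by a hyphen; the last one may be X.
ISSN_PATTERN = re.compile(r"[0-9]{4}-[0-9]{3}[0-9X]")


def is_well_formed(issn: str) -> bool:
    """Return True if the string already uses the NNNN-NNNN syntax."""
    return ISSN_PATTERN.fullmatch(issn) is not None


def fix_trivial(issn: str) -> Optional[str]:
    """Return the corrected ISSN for trivial problems, else None.

    Trivial problems here are: a dash used instead of a hyphen, extra
    whitespace, and a missing hyphen. Anything else is left to a human.
    """
    candidate = issn.strip().upper()
    for dash in DASHES:
        candidate = candidate.replace(dash, "-")
    # Drop whitespace inside the value, e.g. "0006 - 2510".
    candidate = re.sub(r"\s+", "", candidate)
    # Insert the hyphen when only the eight characters are present.
    if re.fullmatch(r"[0-9]{7}[0-9X]", candidate):
        candidate = candidate[:4] + "-" + candidate[4:]
    return candidate if is_well_formed(candidate) else None
```

With this sketch, `fix_trivial("00062510")` yields `"0006-2510"`, while a value such as `"1234-567"` returns `None` and would stay on the list for a human editor.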
For the bot flag, I currently don't have it, and I would like to keep it that way (or if need be, only added temporarily for the first run).
Discussion
If you will be operating from the dump, could you not do a dry run outputting to Wikipedia:CHECKWIKI/WPC 106 dump so its handling of the pathological cases there can be inspected? --Xover (talk) 17:48, 25 February 2019 (UTC)[reply]
- Hi Xover. The dump analysis is performed independently and produces several analyses (Wikipedia:CHECKWIKI/WPC all); I would prefer to keep it separate from automatic fixing. --NicoV (Talk on frwiki) 18:05, 25 February 2019 (UTC)[reply]
- But if you want to know which pages won't be fixed by the bot, I can do a dry run on my computer and give the list of fixed pages. --NicoV (Talk on frwiki) 18:06, 25 February 2019 (UTC)[reply]
- @NicoV: I was more interested in seeing the before→after list. Several of the instances listed in the WPC 106 dump looked like they would be hard to fix automatically, so if the output of a dry run could be inspected it might provide a priori confidence that the task won't mess anything up. A dry run might be more efficient / reduce the need for a trial period with live edits (but I speak only for myself: the BAG may see it differently). --Xover (talk) 18:24, 25 February 2019 (UTC)[reply]
- @Xover: Ok, I understand. I will see if I can do something. The idea is to fix only trivial cases automatically, the hard ones will be left to human editors, and I will check what the results are before doing an actual run. --NicoV (Talk on frwiki) 09:28, 26 February 2019 (UTC)[reply]
Comment: The dump list appears to have some false positives on it. I picked one page at random, Pocket Dwellers, and there is an ISSN of 00062510 listed within a citation template. This ISSN is valid within a CS1 template; articles with invalid ISSNs are placed in Category:CS1 errors: ISSN. The template handles this unhyphenated ISSN format with no trouble, displaying properly with a hyphen. It should not be "corrected"; the bot would be making a cosmetic edit, leaving the rendered page unchanged. Perhaps the dump analysis should be corrected before this bot attempts to modify articles based on the list. – Jonesey95 (talk) 17:56, 25 February 2019 (UTC)[reply]
- Hi Jonesey95. On other wikis like frwiki, the templates don't add the hyphen by themselves. If ISSNs without the hyphen have to be considered correct on enwiki for some templates, then I will first need to add an option in WPCleaner for this (and then regenerate the page Wikipedia:CHECKWIKI/WPC 106 dump to check that false positives are removed) before implementing the automatic replacement. I will post here when this part is done. --NicoV (Talk on frwiki) 18:05, 25 February 2019 (UTC)[reply]
- Thanks. It looks like {{ISSN}} does not add the hyphen, but the CS1 citation templates do so. Just to see if I had gotten unlucky, I picked four more articles at semi-random from the list, limiting my "random" choices to articles that were displaying eight digits as the erroneous string. All four articles: Acritogramma metaleuca, Capri (cigarette), David Mba, and Ensoniq VFX contain no ISSN errors. I believe that the dump analysis needs to be debugged before this task can be run. It is possibly telling that there are only 65 pages in the three ISSN error categories combined. – Jonesey95 (talk) 18:16, 25 February 2019 (UTC)[reply]
- Jonesey95. I've modified my code so that WPCleaner can be told that some templates automatically add the hyphen if it's missing, so the articles you mentioned won't be reported anymore. I'm currently running an update of Wikipedia:CHECKWIKI/WPC 106 dump to see what will be left. --NicoV (Talk on frwiki) 09:24, 26 February 2019 (UTC)[reply]
Page Wikipedia:CHECKWIKI/WPC 106 dump has been updated to avoid reporting a missing hyphen when the template automatically adds it to the displayed result; only 420 pages remain, compared to the 1315 initially. I could probably also remove reports for internal links to pages like ISSN 1175-5326 which exist, but even if they are reported, the bot won't fix anything there. With the current algorithm, a dry run modifies 115 of the 420 pages.
--NicoV (Talk on frwiki) 12:36, 26 February 2019 (UTC)[reply]
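The reporting rule described in this comment could look roughly like the sketch below. It is an assumption-laden illustration, not the WPCleaner option itself: the function `should_report` and the set `AUTO_HYPHEN_TEMPLATES` are hypothetical names, and the template names in the set are only examples of CS1 citation templates, which per the discussion above add the hyphen themselves while {{ISSN}} does not.

```python
import re
from typing import Optional

# Hypothetical configuration: templates assumed to display a missing hyphen
# on their own. The entries are illustrative examples only.
AUTO_HYPHEN_TEMPLATES = {"cite journal", "cite news", "citation"}


def should_report(template_name: Optional[str], issn: str) -> bool:
    """Decide whether an ISSN value belongs on the error list.

    template_name is the enclosing template, or None for plain text.
    """
    value = issn.strip()
    if re.fullmatch(r"[0-9]{4}-[0-9]{3}[0-9Xx]", value):
        return False  # already in the correct syntax
    only_hyphen_missing = re.fullmatch(r"[0-9]{7}[0-9Xx]", value) is not None
    name = template_name.strip().lower() if template_name else None
    if only_hyphen_missing and name in AUTO_HYPHEN_TEMPLATES:
        return False  # the template renders the hyphen itself
    return True
```

Under these assumptions an unhyphenated value inside a CS1 citation is skipped, while the same value in {{ISSN}} or in plain text is still reported.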
- That list looks much more reasonable. There are still some weird ones in there, like You Are Happy, where |issn= was being used in a {{WorldCat}} template, which doesn't support that parameter. Also, it looks like dashes, as in Iran–Iraq War and The Mauritius Command and Resonant inductive coupling, are also silently converted to hyphens by CS1 templates, so those don't need to be fixed and should be removed from the WPCleaner report.
- I can also add an option to ignore such cases where the dash is automatically replaced, like I did for the missing hyphen. But is it a good idea to keep incorrect syntax just because the template itself will fix it?
- For the non-existing parameter in a {{Worldcat}}, I think I will leave it like that and a hyphen will be added, there are only a few pages like that. --NicoV (Talk on frwiki) 14:02, 26 February 2019 (UTC)[reply]
- In a case like Tytthaspis sedecimpunctata, will the bot/script apply the ISSN template, making the ISSN actually useful, or will it just replace the dash with a hyphen? – Jonesey95 (talk) 13:23, 26 February 2019 (UTC)[reply]
- Currently, it will simply replace the dash with a hyphen, but I can add a feature to use a template instead. --NicoV (Talk on frwiki) 14:02, 26 February 2019 (UTC)[reply]
- I think replacing a plain-text ISSN with a template is a good idea in nearly every case.
- I don't want to rain on your parade, but at this point, it looks like a periodic supervised AWB run, combined with a bit more tweaking of the WPCleaner report, might be the best option. The risk of cosmetic edits by the bot (and AWB, unless it is watched carefully) is high. With considerably fewer than 100 pages fixable by the proposed bot, a script may be better. If you still want to get this task bot-flagged in order to avoid cluttering people's watchlists, of course, I would support that. – Jonesey95 (talk) 14:04, 26 February 2019 (UTC)[reply]
- I will try several modifications to limit the number of false positives in the generated list (which is good in itself), and we'll see then what is the best course of action. --NicoV (Talk on frwiki) 16:38, 26 February 2019 (UTC)[reply]
- Even if it means fixing fewer than 100 pages in the end, I'm still interested in doing at least a test run. For the test run, if it's accepted, I will proceed one page at a time (after each modification, WPCleaner will ask me if it should proceed, so I will be able to check thoroughly before going to the next article). Running a script would be a good idea, but as no one is proposing to create it and run it (the list has been available for years), I think it's worth running WPCleaner on this. After the test run, we can still decide whether it's worth running it periodically or not. --NicoV (Talk on frwiki) 11:05, 23 March 2019 (UTC)[reply]
- {{BAG assistance needed}} : can I do a test run? As explained, after each modification, WPCleaner will ask me if it should proceed, so I will check each edit before letting it do the next one. If it's ok, tell me how many modifications I can make. --NicoV (Talk on frwiki) 13:21, 1 April 2019 (UTC)[reply]
Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Primefac (talk) 20:32, 4 April 2019 (UTC)[reply]
Trial complete. Primefac I've done the 50 edits, they can be checked in this list. I've seen no problem in the edits. --NicoV (Talk on frwiki) 14:03, 8 May 2019 (UTC)[reply]
- @NicoV: So I'm a bit confused here on one part. This account currently does have the bot flag as a result of another task approval - why would you NOT want to flag repeatable minor edits as bot (thus flooding recent changes and watchlists unnecessarily)? Keep in mind that the bot flag gives you access to the bot attribute on edits, but you don't have to assert it (if you are depending on someone else's framework you may not have the choice). If you don't want the bot flag for some tasks, but do for others - but you don't have the capability of controlling this in your requests - you will need to create a separate account. How do you want to deal with this? — xaosflux Talk 13:10, 15 May 2019 (UTC)[reply]
- @Xaosflux: When I submitted this request, I hadn't considered whether I would submit others, and this one took a long time to do. I removed the message about the bot flag; it's ok if I run this task with the bot flag and tag my edits as such. I will check my other tasks, especially Wikipedia:Bots/Requests for approval/WikiCleanerBot, to see if they would be better run without tagging the edits as bot: if so, I will manage it on my side. --NicoV (Talk on frwiki) 13:17, 15 May 2019 (UTC)[reply]
{{BAG assistance needed}} Any decision ? --NicoV (Talk on frwiki) 17:15, 12 June 2019 (UTC)[reply]
- Approved. Primefac (talk) 12:23, 15 June 2019 (UTC)[reply]
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.