Wikipedia:Bots/Requests for approval/Monkbot 18
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at Wikipedia:Bots/Noticeboard. The result of the discussion was Approved.
Operator: Trappist the monk (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 19:41, Saturday, November 14, 2020 (UTC)
Function overview: A cosmetic bot to clean up cs1|2 citation templates
Automatic, Supervised, or Manual: automatic
Programming language(s): awb/c#
Source code available: User:Monkbot/task 18: cosmetic cs1 template cleanup
Links to relevant discussions (where appropriate): The idea for this bot task originates at Wikipedia:Village pump (proposals)#Cosmetic Bot Day (CBD) which may or may not be relevant
Edit period(s): continuous until no-longer needed
Estimated number of pages affected: something between 2.5 million and 4.6 million
Namespace(s): mainspace
Exclusion compliant (Yes/No): yes
Function details: Fully described at User:Monkbot/task 18: cosmetic cs1 template cleanup
Discussion
Here are a pair of test edits made in my sandbox. For these tests, the bot's normal namespace restriction was lifted:
- this edit is to a copy of this version of Autism
- task 18 edit diff – nobots restriction disabled for this test
- this edit is to a copy of this version of Japan
—Trappist the monk (talk) 19:41, 14 November 2020 (UTC)[reply]
- I'd be in support of this regardless of how the CBD discussion is closed. Some of the sub-tasks such as convert_language_names_to_codes are more than cosmetic as they help other wikis to translate our articles more easily.
- Trappist the monk, have the tools been updated (Reftoolbar, Citoid, reFill, citation bot, etc) so that they use the non-deprecated parameters? – SD0001 (talk) 06:26, 15 November 2020 (UTC)[reply]
- MediaWiki talk:RefToolbarConfig.js § normalize parameter names. Citoid? Do you mean VE? VE, I think, uses citoid to get parameter values but parameter names come from TemplateData. I won't touch TemplateData. reFill does not have anyone who maintains it; were such tools subject to the same strictures as any simple bot, it would have, should have, been blocked long ago. Don't know about Citation bot but it is actively maintained so getting it updated isn't likely to be a problem.
- —Trappist the monk (talk) 11:13, 15 November 2020 (UTC)[reply]
- I think this task should wait until all the tools (or at least the maintained ones) have been updated to use the hyphenated parameter names. Otherwise there would continue to be a constant influx of the parameters the bot was "cleaning up". – SD0001 (talk) 14:28, 16 November 2020 (UTC)[reply]
- User talk:Citation bot/Archive 23 § parameter naming. I have conducted a brief experiment with VE. In that brief experiment, ve chose the hyphenated form – creating cs1|2 citations with ve is amazingly painful for more than the simplest of citation needs.
"all the tools": Do you have a list of all the tools that you think should be updated?
- —Trappist the monk (talk) 15:13, 16 November 2020 (UTC); 14:41, 20 November 2020 (UTC) (update link)[reply]
- The four mentioned above + IABot + ProveIt. From what I've seen, IABot is only using the hyphenated forms, and you've had citation bot and RefToolbar taken care of. The templatedata-based tools (VE, 2017WE and ProveIt) are also adding the hyphenated forms for common parameters like access-date and archive-* but not for exotic ones like |laysource= – should be easy enough to fix – I'll try to take care of them. And since reFill is unmaintained, I think we're largely good to go. – SD0001 (talk) 16:28, 17 November 2020 (UTC)[reply]
A few things
- I'm not sure it's a good idea to cover everything that's... 'canonized' for lack of a better word (e.g. |laysource= → |lay-source=). For example, if converting |language=French → |language=fr is worth an edit because it helps translation efforts, then sure, you could also do |laysource= → |lay-source= and all sorts of general cosmetic maintenance edits. Or do runs against parameters that are about to be deprecated (i.e. in a shortly upcoming next CS1/2 update). I may or may not be against this in the long term, but prioritizing the bot's work matters, and I feel this is one task where the bot will annoy people for very little return in terms of code clarity (e.g. |laysource= vs |lay-source= is not particularly obscure/confusing here).
- I feel that, if approved, a task like this should have AWB genfixes bundled into it, given the amounts of edits considered.
- Concerning the deletion of comments, a fairly common case is {{cite journal <!-- Comment --> |last= |first=...}} used to stop citation bot from touching that citation. Those comments should be retained, regardless of what happens to the other comments.
Headbomb {t · c · p · b} 23:51, 15 November 2020 (UTC)[reply]
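The kind of renaming under discussion (for example |laysource= → |lay-source=) can be sketched in a few lines. This is a minimal Python illustration only, not the bot's AWB/C# code: the RENAMES table is a hypothetical subset of the parameter list, and rename_params is a name invented here. It rewrites parameter names only, so an HTML comment placed after the template name, as in {{cite journal <!-- Comment --> ...}}, is left where it is.

```python
import re

# Hypothetical subset of renames, for illustration only; the bot's actual
# list is documented on its task page.
RENAMES = {
    "accessdate": "access-date",
    "archivedate": "archive-date",
    "archiveurl": "archive-url",
    "laysource": "lay-source",
}

# Matches "|name=" with optional whitespace around the name. Names that
# already contain a hyphen do not match and are therefore left alone.
PARAM_RE = re.compile(r"(\|\s*)([A-Za-z0-9]+)(\s*=)")


def rename_params(template):
    """Rename known run-together parameter names in one template call."""
    def swap(match):
        name = match.group(2)
        # Unknown names fall through unchanged.
        return match.group(1) + RENAMES.get(name, name) + match.group(3)

    return PARAM_RE.sub(swap, template)


if __name__ == "__main__":
    src = "{{cite journal <!-- Comment --> |last=Doe |accessdate=2020-11-14 |laysource=X}}"
    print(rename_params(src))
```

A real implementation would also need to restrict itself to cs1|2 templates and cope with pipes inside parameter values; the sketch assumes it is handed a single, simple template call.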
- Concerning #1, maybe what needs to be done here is to tackle the table in order of reverse count. I.e. do the small counts first, then move up to the large ones, and save the four big ones for last (authorlink/archiveurl/archivedate/accessdate). This would in effect be a good 'trial by fire' of ~85,000 edits to see what the community reaction is before unleashing the bot on a run of ~2.5-4.7 million edits. Headbomb {t · c · p · b} 00:09, 16 November 2020 (UTC)[reply]
- Could do that though I'm skeptical that avoiding |accessdate=, |archivedate=, |archiveurl=, and |authorlinkn= / |authornlink= until all of the other all-run-together parameter names are done will somehow make the community more accepting; a cosmetic edit is still a cosmetic edit.
- —Trappist the monk (talk) 00:38, 16 November 2020 (UTC)[reply]
- I think what Headbomb means is that the list of pages to edit would start with the smallest-use parameters and work "up" through the table; all relevant parameters would be changed on those pages, but it would mean the bot isn't starting with "every page using |accessdate=". Primefac (talk) 01:23, 16 November 2020 (UTC)[reply]
- That's what I mean yes. Same for tackling the other table. Headbomb {t · c · p · b} 04:46, 16 November 2020 (UTC)[reply]
- Isn't that really a distinction without a difference? There is a high degree of overlap. For example, these searches exclude |accessdate=:
- It is not my plan to use the searches in the tables as sources for articles. I could, but I had rather thought to start with Category:CS1 errors: empty unknown parameters and then do the articles in Category:Featured articles and Category:Featured lists; after that not yet decided. No matter where one 'starts', |accessdate= will be included in the edits unless task 18 is specifically configured to exclude it. From the outside, editors will see that the predominant parameter renaming will be |accessdate=.
- —Trappist the monk (talk) 12:26, 16 November 2020 (UTC)[reply]
- 2. No Monkbot task has ever done genfixes because I do not want to be responsible for edits made by code that I did not write. Task 18 shall not do general fixes.
- 3. Task 18 does not remove the kind of html comments you describe. Here is a diff from my sandbox where task 18 removed the empty |coauthors= but left the comment (it was your example with additional stuff to show that the bot did something).
- —Trappist the monk (talk) 00:38, 16 November 2020 (UTC)[reply]
- Re #1 it's more a matter of having a sort of pause/breathing room with an 'alright, after ~85,000 or so edits, let's take a pause and evaluate if we really want to unleash the bot on another 2-4 million articles'. For #2, I still feel genfixes should be bundled in, but that's still operator discretion right now. The community may feel differently if they're giving a mandate to do 2-4 million edits. It may want to maximize the value of those edits if edits are to be done in the first place. Headbomb {t · c · p · b} 01:03, 16 November 2020 (UTC)[reply]
- For what it's worth, given how many citations are on any given page, I think it would be prudent to not have genfixes, since it will muddy the waters about what's actually being changed. I see where you're coming from, but I think simpler is better for this one. Primefac (talk) 01:44, 16 November 2020 (UTC)[reply]
- Seems like a solid proposal, especially if the general CBD proposal is approved. I also agree that having genfixes with this, while nice, is too much considering the number of citations and probable changes on a page. I would suggest another edit be done, and that is to use non-redirect template names where possible, so {{Cite website}}, {{Cite url}} (and even {{یادکرد وب}}) get converted to {{Cite web}}. --Gonnym (talk) 12:39, 17 November 2020 (UTC)[reply]
- I'm afraid that I have to oppose any bot whose sole task is changing the names of parameters in templates that are not broken. The "laysource" to "lay-source" example given is actually the perfect example of a task that will create watchlist and page history clutter without giving any significant benefit to either readers or editors. Tasks like this are one of the reasons the CBD proposal is controversial. Thryduulf (talk) 13:51, 20 November 2020 (UTC)[reply]
- Yes, edits made by this bot task will be recorded on watchlists (where, because they are bot edits, they can be hidden until they get pushed off the bottom) and article histories (where, alas, they cannot be hidden). Still, non-hyphenated cs1|2 parameter names are going away (see the list of the most recent deprecations). The recently deprecated parameters were individually marked as deprecated so that, after 11 October 2020, any cs1|2 template that uses a deprecated parameter emits the Cite uses deprecated parameter |<param>= error message. Shortly after that, quite a few gnomes driving awb wrote scripts to fix those parameter names and I wrote Monkbot 17 which also worked at fixing the deprecated parameters. Each and every one of those edits is recorded on watch lists and article histories. We can continue with repetitions of this sort of round-about method (because all of the remaining all-run-together parameter names will not be deprecated at the same time), or, we can give the task to a bot that will replace all of those multiple individual edits per article with a single edit that replaces all of the to-be-deprecated parameter names in one go.
- —Trappist the monk (talk) 16:30, 20 November 2020 (UTC)[reply]
- Have to agree with Thryduulf on this one, also removing the parameters will break many old versions of articles. Keith D (talk) 14:48, 20 November 2020 (UTC)[reply]
- We don't maintain old versions of articles, so while it may be annoying that an old version doesn't render as it did in the past, templates are routinely deleted and updated, and that's got nothing to do with this bot's approval or non-approval. Headbomb {t · c · p · b} 14:54, 20 November 2020 (UTC)[reply]
- See Phab:T36244 for a request that will improve display of old article versions. Thryduulf (talk) 15:51, 20 November 2020 (UTC)[reply]
- Approved for trial (25 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. This is essentially a "this template will be broken in the future, and we're trying to fix it now" fix, as well as standardization and prevention of future pages using the same (incorrect) formatting. I'm sending this to trial as proof-of-concept, but I think we need to also discuss rate limits (i.e. how many pages per day this will edit). That might ease some of the "spammed watchlist" concerns. Primefac (talk) 02:16, 25 November 2020 (UTC)[reply]
- Interesting on rate limits. Consider, for example, a 'conservative' number of 1 edit per minute (ie, 1440 per day). Assuming 3,000,000 articles need updating ("between 2.5 and 4.6 million") this would take 2083 days, or 6 years to complete. If we go with the meta/wikitech advice of 12 edits per minute, this takes 6 months to complete which is far more reasonable, but at almost 20,000 edits per day (or 1 year at 6 edits per minute). Give or take maxlag, and assuming this runs 24/7 (which, since AWB, I assume not). Frankly, I don't see this much worse than the 1440 edits per day figure, as far as "watchlist spam" goes, because I imagine those above would object equally to both 1440 and 10,000/20,000 per day. CBD does not seem, to me, like the answer either: say the bot does, heck, one edit per second. That's 86400 per day, or 34 CBDs to complete, or 3 years.
- Since deprecation runs are common & acceptable I think that's an argument here, but those finish their runs in days or a week or two, tops, I think; this would be pretty constant for 6 months, and its scope is such that it would affect pretty much every article rather than those in a particular category - which is, actually, an advantage: the odds of it bombing one particular category on any particular day seem low. Hence, I'd predict it's quite unlikely a single editor's watchlist, even if it isn't set to hide bot edits, shows more than a few of these bot edits per day. If we proceed here, the only way to get an answer to this, I think, is a ramp-up trial over, say, a month, for comments. ProcrastinatingReader (talk) 06:37, 25 November 2020 (UTC)[reply]
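The rate-limit arithmetic in the comment above is easy to reproduce. The following is a small Python sketch under the same assumptions the comment states: 3,000,000 pages to edit and a bot running around the clock at a constant rate, with maxlag ignored. The function name days_to_finish is made up for this illustration.

```python
MINUTES_PER_DAY = 24 * 60


def days_to_finish(pages, edits_per_minute):
    """Days needed to edit `pages` pages at a constant rate, running 24/7."""
    return pages / (edits_per_minute * MINUTES_PER_DAY)


PAGES = 3_000_000  # working assumption taken from the comment

# 1, 6 and 12 edits per minute, plus one edit per second (60 per minute).
for rate in (1, 6, 12, 60):
    days = days_to_finish(PAGES, rate)
    per_day = rate * MINUTES_PER_DAY
    print(f"{rate:>2} edits/min = {per_day:>6} edits/day: "
          f"{days:7.1f} days ({days / 365:.1f} years)")
```

At 1 edit per minute this gives about 2,083 days, at 12 per minute about 174 days, and at one edit per second about 35 days of continuous running, which matches the figures quoted in the discussion.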
- I have wondered about watchlist spam so I wrote Module:Sandbox/trappist the monk/random sort which pseudo-randomly scrambles a nice orderly list so that similarly named articles that might all be on an editor's watchlist aren't all processed in sequence. For example, articles that use scientific names for flora and fauna ofttimes begin with the same word; ship names begin with the same prefix; sporting events begin with years. The module's talk page has an example – the raw data are in html comments and taken straight from awb's list. Show preview or save to get a new pseudo random scrambling.
- —Trappist the monk (talk) 16:27, 25 November 2020 (UTC)[reply]
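The idea of scrambling an alphabetical work list can be sketched as follows. This is a Python illustration of the general technique only; it is not the code of the module mentioned above, and the function name scramble, the seed parameter, and the sample titles are assumptions made for this example.

```python
import random


def scramble(titles, seed=None):
    """Return a shuffled copy of `titles`; the input list is not modified."""
    rng = random.Random(seed)  # passing a fixed seed makes the order repeatable
    shuffled = list(titles)
    rng.shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    # Made-up sample: neighbours in an alphabetical list tend to share a
    # prefix and so are likely to sit on the same editor's watchlist.
    titles = [
        "1990 FIFA World Cup",
        "1994 FIFA World Cup",
        "HMS Ajax",
        "HMS Beagle",
        "Quercus alba",
        "Quercus robur",
    ]
    for title in scramble(titles):
        print(title)
```

Shuffling does not reduce the total number of edits; it only makes it less likely that a run of similarly named articles is edited back to back.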
- Trial complete.
- See list of trial edits. I did not find anything untoward in these edits. The first three, Emma Louisa Turner, Fanno Creek, and Missing My Baby were the only FA articles that I found in Category:CS1 errors: empty unknown parameters. The rest of the articles edited were also in that category.
- —Trappist the monk (talk) 16:27, 25 November 2020 (UTC)[reply]
- Approved. For the record, this bot task is being done so that future improvements to the cs1|2 template family will not have to deal with improper parameters, old copies of deprecated parameters, and otherwise hacking together support for multiple outdated or soon-to-be-invalid parameters. The argument has been made in the past that by removing these templates/pages from tracking categories, there is a reasonable lightening of the load of human editors, and better allows for actual tracking of invalid or mistaken parameters (i.e. it's easier to clear a cat with 100 pages with errors than a cat with 1000 pages).
- The operator has a solid track record with these sorts of tasks, and has indicated a willingness to mitigate as much as possible the "watchlist overload" that is commonly cited as a reason for concern for this sort of task. That being said, they have the freedom to make minor changes to the task as issues or concerns are brought up. Primefac (talk) 18:25, 25 November 2020 (UTC)[reply]
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at Wikipedia:Bots/Noticeboard.