Wikipedia:Bots/Requests for approval/Monkbot 1
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: Trappist the monk
Time filed: 14:49, Saturday January 4, 2014 (UTC)
Automatic, Supervised, or Manual: Automatic
Programming language(s): AutoWikiBrowser
Source code available: User:Monkbot/CS1 deprecated parameters (AWB)
Function overview: Concatenate values from individual and adjacent Citation Style 1 template parameters: |date= or |day= with |month= and |year= into a new |date=. Replace the source parameters with the single |date= parameter:
{{cite web |... |year=2013 |day=14 |month=June |...}}
→ {{cite web |... |date=14 June 2013 |...}}
Links to relevant discussions (where appropriate): Help talk:Citation Style 1/Archive 4#Deprecated month parameter AWB script
Edit period(s): In bursts
Estimated number of pages affected: The bot will be run through the pages listed at Category:Pages containing cite templates with deprecated parameters which at the time of this request contained 163,762 pages.
Exclusion compliant (Yes/No): Yes
Already has a bot flag (Yes/No): No
Function details: Citation Style 1 templates utilize either the wiki-markup {{citation/core}} or the newer Lua Module:Citation/CS1 engines to render individual citations in a consistent manner. This script does not modify templates that use {{citation/core}} because {{citation/core}} does not support |date= parameters with a CITEREF disambiguator.
As I understand it, the parameters |day=, |month= and |year= were created to overcome limitations in the MediaWiki #time function. The specific reasons are somewhat hazy. Whatever the problem with #time, it has been resolved, rendering the parameters |day= and |month= unnecessary. Parameter |day= has been deprecated for quite some time and |month= was recently deprecated, both because they are no longer required to serve their original intended purpose. The parameter |year= is still required for those CS1 {{citation/core}}-based templates that are used with short form citations that use {{sfn}} and the {{harv}} family of templates.
This script mimics the actions taken by the various CS1 templates that use {{citation/core}} and by Module:Citation/CS1. In all of these cases, the values from |day=, |month=, and |year= are concatenated into a WP:DATESNO-compliant dmy format date which is then used for display. Often, CS1 citations contain |date=, |month=, and |year= where |date= is a 1- or 2-digit day number. I suspect that this is caused by the {{cite journal}} template as produced by the enhanced editing toolbar – editors fill in the month, year and date fields assuming that date means day. When |date= is present and has a value, {{citation/core}} and Module:Citation/CS1 use that value for the citation's rendered date and ignore |month= and |year=. When |date= contains a 1- or 2-digit number, that is the displayed date.
Monkbot task 1 looks for Module:Citation/CS1-based templates that have adjacent (in any order):
- |date= and |month= and |year=
- |day= and |month= and |year=
- |month= and |year=
The individual parameters are further constrained:
- |date= and |day= must be a 1- or 2-digit number;
- |month= may be a single month, season, or gibberish text – the content is not evaluated except to determine if:
  - |month= represents a range of months or seasons where the two members of the range are separated by spaced or unspaced hyphen, solidus, endash, or the HTML entity &ndash;, or,
  - |month= contains a leading or trailing 1- or 2-digit day number – where this occurs the day number is extracted and, with the month text, concatenated with the content of |year=;
- |year= must be a 3- or 4-digit number with or without a single lowercase alpha character for use as a CITEREF disambiguator to be used with short form referencing templates {{sfn}} and the {{harv}} family.
The script does not check for spelling, capitalization, or for rational dates: |date=99 |month=Nosuchmonth |year=2525 produces |date=99 Nosuchmonth 2525. It is anticipated that the script will create |date= values that have improper format, spelling, punctuation, capitalization, etc. These malformed dates are most likely the result of malformed original data and not flaws in the script. Such errors are detectable by Module:Citation/CS1 and will be added to Category:CS1 errors: dates. There are other bots that operate on the pages listed there and which are designed to make appropriate repairs (see BattyBot task 25).
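The matching rules described above can be sketched in Python. This is only an illustration under stated assumptions, not Monkbot's AWB find-and-replace rules: the names PARAM, merge_date_params and _merged are hypothetical, month ranges and day numbers embedded in |month= are not modeled, and no check is made that the template is Module:Citation/CS1-based.

```python
import re

# Hypothetical sketch, not the bot's actual AWB rules.
# One parameter of interest: |name=value, where value runs to the next pipe or brace.
PARAM = re.compile(r"\|\s*(date|day|month|year)\s*=\s*([^|{}]*?)(?=\s*[|}])")
DAY = re.compile(r"\d{1,2}")          # |day= or |date=: 1- or 2-digit number
YEAR = re.compile(r"\d{3,4}[a-z]?")   # |year=: 3 or 4 digits, optional CITEREF disambiguator


def merge_date_params(text: str) -> str:
    """Replace adjacent day/date, month and year parameters with a single |date=."""
    matches = list(PARAM.finditer(text))
    out, pos, i = [], 0, 0
    while i < len(matches):
        # Collect a run of parameters separated only by white space (i.e. adjacent).
        run = [matches[i]]
        while (i + len(run) < len(matches)
               and text[run[-1].end():matches[i + len(run)].start()].strip() == ""):
            run.append(matches[i + len(run)])
        new = _merged(run)
        if new is not None:
            out.append(text[pos:run[0].start()])
            out.append(new)
            pos = run[-1].end()
        i += len(run)
    out.append(text[pos:])
    return "".join(out)


def _merged(run):
    """Return the new |date= parameter for a run, or None to leave the run untouched."""
    values = {m.group(1): m.group(2) for m in run}
    if len(values) != len(run):  # a parameter is repeated: leave alone
        return None
    if sorted(values) not in (["month", "year"],
                              ["day", "month", "year"],
                              ["date", "month", "year"]):
        return None
    day = values.get("day", values.get("date"))
    if day is not None and not DAY.fullmatch(day):
        return None
    if not values["month"] or not YEAR.fullmatch(values["year"]):
        return None
    # The month text is deliberately not validated, as in the description above.
    return "|date=" + " ".join(p for p in (day, values["month"], values["year"]) if p)
```

As in the description, the sketch performs no spelling or sanity checks, so merge_date_params("{{cite journal |date=99 |month=Nosuchmonth |year=2525}}") yields "{{cite journal |date=99 Nosuchmonth 2525}}".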
It is not anticipated that this bot will do general fixes.
Discussion
"The script does not check for spelling, capitalization, or for rational dates." It seems pretty straightforward to check for those (unless you are using just AWB search-replace, but even then some clever regex). So the bot can exclude things like |month=December author=John or |day=2002 or even |month=December <!--Do not place into date, see talk page-->. In many cases, it becomes harder to look for these once you merge them. I expect (i.e. have encountered with bot work) a lot of these, especially from 160k pages. — HELLKNOWZ ▎TALK 15:04, 4 January 2014 (UTC)
- The script is an AWB regex find and replace.
- Re: |month=December author=John: The script produces this (presuming that |year=YYYY precedes |date=): |date=December YYYYauthor=John – the new |date= parameter is no more broken than it was before; the citation no longer causes the page to be part of Category:Pages containing cite templates with deprecated parameters. Script now ignores citations like this.
- Re: |day=2002: If the parameter order is |year= |day= |month= or |month= |day= |year= nothing changes because |month= and |year= are not adjacent to each other and the 4-digit |day= value causes the match to fail.
In the other four cases, dmy, myd, ymd, dym, |month= and |year= are adjacent so other regex patterns intended for templates with only |month= and |year= match those parameters and ignore |day=. The script produces this (assuming |month=Month and |year=YYYY):
|day=2002 |month=Month |year=YYYY → |day=2002 |date=Month YYYY – same when source |month= and |year= are transposed
|month=Month |year=YYYY |day=2002 → |date=Month YYYY |day=2002 – same when source |month= and |year= are transposed
- The script ignores citations that contain |year=, |month=, and |date= or |day= but fail to match because the |day=/|date= value is not 1 or 2 digits.
- Re: |month=December <!--Do not place into date, see talk page-->: Ignored when |month= precedes |year= because the extraneous text is not expected. When |year= precedes |month= the script produces this (assumes |year=YYYY): |date=December YYYY<!--Do not place into date, see talk page--> – the intent of the extraneous text is lost. Script now ignores citations like this.
- I have had no success in concocting a regex pattern that would prevent a match when |month= contains extraneous text. If there is a way and someone out there knows what it is, please share.
- Is this from a real citation? I can think of no reason why |month= should not be part of |date=. Module:Citation/CS1 and all of the remaining CS1 templates that use {{citation/core}} concatenate the content of |month= and |year= to create the displayed date.
- So you are not doing any kind of field checking? What if there is a |date= already, or what if there are several |year= fields, or fields just aren't next to each other? Personally, I don't think AWB+Regex is the right tool for this. — HELLKNOWZ ▎TALK 20:57, 4 January 2014 (UTC)
- @Trappist the monk: Try changing the end of your find statement from \s*(\|?[^}]*) to (\s*[\|}<]) - I believe this will skip citations with extraneous text as in the example above. I also suggest you use an edit summary that provides a link where editors who don't know what "CS1 deprecated date parameter errors" are could get more information, such as "Fix CS1 deprecated date parameter errors".
- @Hellknowz: Looking at the code, if the fields aren't next to each other, it appears the bot wouldn't change it. GoingBatty (talk) 23:04, 4 January 2014 (UTC)
- Changed the edit summary. Your suggested fix doesn't solve the problem. I think that what wants to happen is for everything between the equal sign that follows the parameter label and the next pipe symbol (less leading and trailing white space) to be captured. There is an exception. When something enclosed in html remark tags follows the "month/season" text, the entire match should fail and the script should ignore the citation.
... |month = MonthText some other stuff |... → the capture is: MonthText some other stuff
... |month = MonthText <!-- hidden comment --> |... → should fail to match so that the script does nothing with this citation
- The purpose of capturing everything between the = and | (less leading and trailing white space) is to keep parts of a month together if they should have gotten separated somehow: |month=Dec ember.
- I have not noodled this out. Surely there is a way to do it.
- —Trappist the monk (talk) 19:39, 5 January 2014 (UTC)
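One way to express the behaviour asked for above is to keep the value's character class from ever crossing a "<" or a second "=", so that a hidden comment or a run-on parameter makes the whole match fail. The following is a minimal sketch in Python's re module, not the rule the bot ended up using; AWB runs .NET regular expressions, and MONTH_VALUE and month_value are hypothetical names chosen for illustration.

```python
import re

# Hypothetical sketch: capture everything between "=" and the next pipe (or
# closing brace), less leading and trailing white space. Because "<", ">" and
# "=" are excluded from the value, a hidden comment or a stray "author=John"
# prevents the match instead of being captured.
MONTH_VALUE = re.compile(r"\|\s*month\s*=\s*([^|<>{}=]*?)\s*(?=[|}])")


def month_value(citation: str):
    """Return the captured |month= text, or None when the citation should be skipped."""
    m = MONTH_VALUE.search(citation)
    return m.group(1) if m else None


print(month_value("{{cite web |month = MonthText some other stuff |year=2000}}"))
print(month_value("{{cite web |month = MonthText <!-- hidden comment --> |year=2000}}"))
print(month_value("{{cite web |month=Dec ember |year=2000}}"))
```

The first and third calls return the full month text ("MonthText some other stuff" and "Dec ember"); the second returns None, so a caller would leave that citation untouched.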
- @Trappist the monk: - OK, load User:GoingBatty/Monkbot settings and try the rule marked "GB ydm cite xxx" on User:GoingBatty/Monkbot tests. GoingBatty (talk) 23:16, 5 January 2014 (UTC)
- Ding! Ding! Ding! I was just beginning to wonder about what word boundaries (\b) meant and if they could be used to solve this problem and here you are with the answer. I changed the capture ([A-Za-z\s]+\.?)\b to ([A-Za-z\s]+\b\.?) so that full stops in the |month= value would be copied into |date=. It could probably be left as you did it so that BattyBot 25 wouldn't need to repair that citation.
- I have since made 200+ supervised edits with the new script.
- Tweaked to replace hyphen, solidus, HTML &ndash; entity in month ranges with endash. Also, when abbreviated months are followed by a terminal period, the period is removed.
- —Trappist the monk (talk) 16:27, 8 January 2014 (UTC)
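The difference between the two captures, and the effect of the later tweaks, can be illustrated with Python's re module. This is an approximation only: AWB uses .NET regular expressions, the captures are applied here to a bare |month= value rather than inside the full find rule, and normalize_month is a hypothetical helper that handles only a single trailing period.

```python
import re

# The two captures quoted above, applied to a bare |month= value.
CAPTURE_BOUNDARY_LAST = re.compile(r"([A-Za-z\s]+\.?)\b")   # full stop is dropped
CAPTURE_BOUNDARY_FIRST = re.compile(r"([A-Za-z\s]+\b\.?)")  # full stop is kept

# Spaced or unspaced hyphen, solidus, en dash, or the &ndash; entity.
RANGE_SEPARATOR = re.compile(r"\s*(?:-|/|–|&ndash;)\s*")


def normalize_month(value: str) -> str:
    """Hypothetical helper: en dash in ranges, no terminal period."""
    value = RANGE_SEPARATOR.sub("–", value.strip())
    return re.sub(r"\.$", "", value)


# With "Sep. " there is no word boundary between "." and the space, so the
# first pattern backtracks to "Sep"; the second keeps the period.
print(CAPTURE_BOUNDARY_LAST.match("Sep. ").group(1))
print(CAPTURE_BOUNDARY_FIRST.match("Sep. ").group(1))
print(normalize_month("July/August"))
print(normalize_month("Sep."))
```

Note that a separator class this broad would also rewrite a legitimately hyphenated value, so the sketch is only suitable for values already known to be month or season ranges.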
- I have checked 50 or so of these supervised edits. I found no errors and no cause for concern. It appears to do what it says on the tin. If it merges parameters that result in an invalid date, BattyBot task 25 or a human editor will clean it up. – Jonesey95 (talk) 14:18, 10 January 2014 (UTC)
- Leaving things for other editors/bots to fix is something we don't approve unless there are special circumstances. — HELLKNOWZ ▎TALK 14:27, 10 January 2014 (UTC)
- I will rephrase in an attempt at being more clear: This bot does not appear to create new errors. If there is already an invalid date, this bot will not fix that error. It fixes only the deprecated parameter error, which allows it to be a focused bot with limited complexity (i.e. it has a lower chance of unexpected and undesired output). Fixing invalid dates is the purview of a bot that is already approved and active. – Jonesey95 (talk) 18:22, 10 January 2014 (UTC)
Approved for trial (200 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. — HELLKNOWZ ▎TALK 14:27, 10 January 2014 (UTC)
Comment: I believe that this bot should operate only in the Article namespace, at least at first. I am new to BRFA and don't see a standard header for the BRFA request form that asks about namespaces. Is it assumed that all new bots will operate only in the Article namespace? What is the right venue for this question (I assume it's not this page)? Thanks. – Jonesey95 (talk) 21:28, 10 January 2014 (UTC)
- We usually assume it is article space. There is no syntax guide for any other space and they might have examples, tests, etc. that have nothing to do with article usage. Maybe the "number of pages affected" should really be just "pages affected" for namespaces and estimates. — HELLKNOWZ ▎TALK 21:39, 10 January 2014 (UTC)
- Module:Citation/CS1 excludes several different namespaces from Category:Pages containing cite templates with deprecated parameters which is the list of pages that Monkbot task 1 will work on. The list of excluded namespaces is at the top of Module:Citation/CS1/Configuration in the table citation_config.uncategorized_namespaces.
Bot trial results
The bot has completed 200 edits. I checked the diffs for all of them. Here is what I observed:
- I saw zero cases in which the bot made an erroneous edit.
- The bot is able to detect (and combine with |year= to make a valid |date=) month names, season names, and month ranges like "March–April".
- The bot preserves the original editor's version of valid month names and ranges. If the original month value is a valid abbreviated month like "Sep", that is preserved and combined with |year= to result in a |date= parameter with the same format as the original citation. The bot fixes minor problems that caused the original month values to result in CS1 date errors, thereby fixing two errors with one edit.
- The bot edited at a rate of exactly 100 edits per hour for the first 100 edits, then at about 200 edits per hour for the second hundred edits.
I see no problems. Other editors may see something that I missed. – Jonesey95 (talk) 23:30, 11 January 2014 (UTC)
Trial complete. Special:Contributions/Monkbot which see.
Editor Jonesey95 is quick, ne? Those extra reliable eyes are much appreciated. Thanks for giving it a look.
I did not find any improper edits. I did, however, find a weakness in the script that allowed fixable citations to go unfixed. Cite note 8 should have been fixed with this edit. That weakness has been fixed and the citation repaired by the script with this edit.
Another weakness that I've observed is that the script doesn't recognize redirect CS1 names: {{cite manual}} is a redirect to {{cite book}} but it wasn't repaired. I'll research and add those names to the script.
—Trappist the monk (talk) 01:52, 12 January 2014 (UTC)
- Ok, I'm not going to be adding CS1 redirects; {{cite web}}, for example, has 23 redirects, {{cite book}} has 21 redirects, etc. Better to leave Monkbot task 1 as it is.
- —Trappist the monk (talk) 12:48, 12 January 2014 (UTC)
- @Trappist the monk: If you turn on AWB's general fixes, that will also enable AWB's Template redirects functionality, which will convert those redirects for you. You could then set up your find & replace rules to run after general fixes (see Wikipedia:AutoWikiBrowser/Order of procedures). For example, try Lycoming ALF 502 with and without general fixes on. GoingBatty (talk) 15:50, 12 January 2014 (UTC)
- Thanks for that. But, because I am responsible for every change that Monkbot makes, I choose to not take responsibility for code someone else has developed. And, while this trial is ongoing, verification of Monkbot is much easier when the only changes in a page are those made by Monkbot and not hidden amongst those made by AWB general fixes.
For reference:
- |month=Sep |year=2000 becomes |date=Sep 2000
- |month=July/August |year=2000 becomes |date=July–August 2000
- No whitespace around fields is preserved
Not saying these are issues, just pointing out. — HELLKNOWZ ▎TALK 15:12, 12 January 2014 (UTC)
- Correct. The regex does not capture the pattern \s*=\s* between the parameter identifier and the parameter value – there are two or three of those that could be captured; which one should it be?
- Ideally, all of them. But we have not required this (mostly). — HELLKNOWZ ▎TALK 19:34, 12 January 2014 (UTC)
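For illustration, one possible answer to "which one should it be?" is to reuse the spacing of the first source parameter for the new |date=. The sketch below is an assumption, not something the bot does: it uses Python's re module rather than AWB's .NET engine, handles only |month= followed directly by |year=, and MONTH_YEAR and merge_keep_spacing are hypothetical names.

```python
import re

# Hypothetical sketch: capture the white space after the pipe (group 1) and
# around the equal sign (group 2) of |month=, and reuse both for |date=.
MONTH_YEAR = re.compile(
    r"\|(\s*)month(\s*=\s*)([^|{}<>=]+?)\s*"
    r"\|\s*year\s*=\s*(\d{3,4}[a-z]?)(?=\s*[|}])"
)


def merge_keep_spacing(text: str) -> str:
    """Merge |month= and |year= into |date=, keeping the |month= spacing style."""
    return MONTH_YEAR.sub(r"|\g<1>date\g<2>\g<3> \g<4>", text)


print(merge_keep_spacing("{{cite web | month = Sep | year = 2000 | title = T}}"))
print(merge_keep_spacing("{{cite web |month=Sep |year=2000}}"))
```

With this choice a citation written in the spaced style stays spaced and a compact one stays compact; the spacing of the discarded |year= parameter is simply lost.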
Approved for extended trial (100 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Can you please run it on 100 random pages from the category, not the first ones, which here ended up being the same groups -- almost all are to genuses or chemicals/drugs which all have almost the same syntax. — HELLKNOWZ ▎TALK 15:12, 12 January 2014 (UTC)
- Trial complete. Special:Contributions/Monkbot which see.
- I made a list of about a thousand pages from various locations in Category:Pages containing cite templates with deprecated parameters. That was much more than I needed. Still, perhaps what Monkbot edited is sufficiently random. I found no errors, nor anything untoward.
- —Trappist the monk (talk) 18:25, 12 January 2014 (UTC)
- I checked all 100 of these edits and found zero erroneous edits. Nice work. – Jonesey95 (talk) 18:39, 12 January 2014 (UTC)
Approved. All edits checked, no issues. — HELLKNOWZ ▎TALK 19:34, 12 January 2014 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.