Wikipedia:Bots/Requests for approval/BHGbot 7
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: BrownHairedGirl
Time filed: 15:10, Tuesday, July 28, 2020 (UTC)
Function overview: Mass create {{Category redirect}}s to resolve the WP:ENGVAR variations in category names using the word "organisation(s)" or "organization(s)".
e.g. if we have a Category:Anti-Foobar organizations, then the page Category:Anti-Foobar organisations would be created with the content {{Category redirect|Anti-Foobar organizations|bot=BHGbot}}
Automatic, Supervised, or Manual: Automatic
Programming language(s): Bash and AutoWikiBrowser
Source code available: Yes. There are two components: the bash script Make-BHGbot7-edit-list.sh and the AWB custom module BHGbot-7-AWB-module.
Links to relevant discussions (where appropriate): WT:WikiProject Categories#Organi[SZ]ations_category_redirects (permalink, though discussion is ongoing). This discussion was notified at WP:VPP[1] and WP:VPR[2].
Previous related discussion: WP:Bots/Requests for approval/BHGbot 3 (a similar proposal in 2017, which ran into the sands due to lack of prior consensus. My bad)
Edit period(s): Initial run to handle the backlog, then a follow-up run every few months.
Estimated number of pages affected: ~12,500 in the initial run.
Namespace(s): Category
Exclusion compliant (Yes/No): Yes
Function details: This task supports MOS:COMMONALITY by resolving the s/z WP:ENGVAR variation in the spelling of "organisation"/"organization": for each such category, it creates a soft {{category redirect}} from the unused spelling to the title which is in use. This corresponds with the MOS:COMMONALITY guideline to create such redirects in article space.
- The word "organisation"/"organization" is one of the most common ENGVAR variants in category titles, and the current lack of redirects is a long-standing nuisance for both readers and editors.
- The bot works in three stages:
- A set of quarry queries to generate lists of pages
- A bash script to process these lists and generate a list of category redirect titles to be created
- An AWB run to create the category redirect pages
- 1. Get lists
- The first part of the bot is three quarry queries:
- quarry:query/46899: gets a list of non-redirect category pages whose title matches the regex \b[Oo]rgani[sz]ations?\b and which don't transclude {{Category redirect}} or {{Category disambiguation}}
- quarry:query/46999: gets a list of all pages in the category namespace
- quarry:query/47001: gets a list of all pages in the main (article) namespace
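As a rough illustration of the title filter in the first query, the regex can be tried out in Python. This is only a sketch and not the quarry SQL itself: the function name `matching_titles` and the sample titles are hypothetical, and the titles are assumed to use spaces (database titles use underscores, which a regex engine may treat as word characters, so `\b` could behave differently there).

```python
import re

# Pattern quoted in the request for titles containing either spelling.
ORG_PATTERN = re.compile(r"\b[Oo]rgani[sz]ations?\b")


def matching_titles(titles):
    """Return the titles containing "organisation(s)" or "organization(s)" as a whole word."""
    return [t for t in titles if ORG_PATTERN.search(t)]


# Hypothetical sample titles (assumed to use spaces, not underscores).
sample = [
    "Anti-Foobar organizations",
    "Organisations based in Ruritania",
    "Disorganization in fiction",   # no word boundary before "organization"
    "Organizational studies",       # no word boundary after "Organization"
]

print(matching_titles(sample))
```

The word boundaries are what keep titles such as "Organizational studies" out of the list, since only the standalone word in either spelling should be picked up.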
- 2. Process the lists
- The bash script Make-BHGbot7-edit-list.sh:
- inverts the S/Z spelling in the list of organisation categories
- removes from that list titles which are in the list of all pages in the category namespace
- removes from that list titles which are in the list of all pages in the main (article) namespace
- wikilinks the resulting edit list
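The list processing in step 2 might look something like the following Python sketch. It is not the bash script Make-BHGbot7-edit-list.sh, whose contents are not shown here. The names `invert_spelling` and `build_edit_list`, the choice to swap every occurrence of the word in a title, the comparison of bare titles without a namespace prefix, and the `[[:Category:...]]` link format are all assumptions made for illustration.

```python
import re

# Matches the word in either spelling, capturing the s/z separately.
ORG_WORD = re.compile(r"\b([Oo]rgani)([sz])(ations?)\b")


def invert_spelling(title):
    """Swap s<->z in every "organi[sz]ation(s)" in the title (an assumption)."""
    def swap(m):
        other = "z" if m.group(2) == "s" else "s"
        return m.group(1) + other + m.group(3)
    return ORG_WORD.sub(swap, title)


def build_edit_list(org_categories, existing_categories, existing_articles):
    """Return wikilinked titles of category redirects that could be created."""
    # Titles already taken in either namespace are removed from the list.
    taken = set(existing_categories) | set(existing_articles)
    edit_list = []
    for title in org_categories:
        candidate = invert_spelling(title)
        if candidate == title or candidate in taken:
            continue
        edit_list.append("[[:Category:" + candidate + "]]")
    return edit_list


# Hypothetical input lists standing in for the three quarry outputs.
org = ["Anti-Foobar organizations", "Sports organisations", "Youth organizations"]
cats = org + ["Sports organizations"]
articles = ["Youth organisations"]

print(build_edit_list(org, cats, articles))
```

With these sample lists only "Anti-Foobar organisations" survives, because the inverted spellings of the other two titles already exist in one of the namespace lists.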
- 3. Create the redirects
- Using the edit list created in step 2, AWB
- skips any existing pages (there should be none, but some may have been created since the list was made)
- applies the AWB custom module BHGbot-7-AWB-module to create the redirect with an explanatory edit summary as in this test edit[3]
- If the page title to be created is "Foo organisations" (with an S), a {{category redirect}} is created to "Foo organizations" (with a Z). And vice versa.
- Per a request by User:Hellknowz at the 2017 BRFA, the redirect template includes the parameter |bot=BHGbot
- The module includes sanity checks to:
- skip any pages whose title does not match the regex /^(.*?\b[oO]rgani)[sz](ations?\b.*)$/
- skip any case where it is about to create a self-redirect
- I have done a dry run (AWB in pre-parse mode) on a deliberately polluted list of test pages, and it correctly skipped all of the polluting entries. I then ran another test on the full list of ~12,500 pages, in which no pages were skipped, which indicates that the list-making is accurate.
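The redirect creation and sanity checks described in step 3 can be approximated in Python as below. This is a sketch under stated assumptions, not the AWB custom module (which is not reproduced here): the function name `redirect_wikitext` is hypothetical, and only the regex and the template wikitext format are taken from the description above.

```python
import re

# Regex quoted in the request: group 1 ends just before the s/z,
# group 2 starts just after it.
TITLE_PATTERN = re.compile(r"^(.*?\b[oO]rgani)[sz](ations?\b.*)$")


def redirect_wikitext(new_title):
    """Return wikitext for the redirect page at new_title, or None to skip."""
    m = TITLE_PATTERN.match(new_title)
    if m is None:
        return None  # sanity check 1: title does not match the regex
    current = new_title[m.end(1)]  # the s or z sitting between the two groups
    swapped = "z" if current == "s" else "s"
    target = m.group(1) + swapped + m.group(2)
    if target == new_title:
        return None  # sanity check 2: never create a self-redirect
    return "{{Category redirect|" + target + "|bot=BHGbot}}"


print(redirect_wikitext("Anti-Foobar organisations"))
print(redirect_wikitext("Foo bars"))
```

In this sketch the self-redirect check is purely defensive, since swapping the letter always changes the title; it is kept to mirror the two checks listed above.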
- Differences from BHGbot 3
- This proposal tackles the same problem as the 2017 proposal BHGbot 3, but it uses a different approach. The 2017 proposal drew its list by recursing the category tree. This proposal uses quarry to collect a list of category titles. Using quarry gives a complete list, whereas category recursion is usually woefully incomplete. The quarry-generated lists allow rigorous checks against error.
Discussion
Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Primefac (talk) 22:03, 2 August 2020 (UTC)
Trial complete. Thanks, @Primefac.
- I used the Linux shuf command to randomly select 50 pages from a list of 12,461 categories which I had built last week while testing the list-making.
- Here are the 50 trial edits.
- No pages were skipped, and I have reviewed each of the 50 edits. The redirects are all as intended. --BrownHairedGirl (talk) • (contribs) 10:17, 4 August 2020 (UTC)
- Approved. Primefac (talk) 00:37, 6 August 2020 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.