Wikipedia:Bots/Requests for approval/WikiCleanerBot 17
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: NicoV
Time filed: 14:57, Monday, May 25, 2020 (UTC)
Function overview: Fix Special:LintErrors/wikilink-in-extlink / CW Error #513 (Links in links).
Automatic, Supervised, or Manual: Automatic
Programming language(s): Java (WPCleaner)
Source code available: On GitHub (especially algorithm 513)
Links to relevant discussions (where appropriate):
Edit period(s): Twice a month
Estimated number of pages affected: Special:LintErrors/wikilink-in-extlink currently reports about 60k errors (for all namespaces), and the bot will only fix some situations, so I expect the number of pages affected to range from a few thousand to 20k. I will also generate a dump analysis in Wikipedia:CHECKWIKI/WPC 513 dump for a better view of the problems (it will display the problematic links).
Namespace(s): Main
Exclusion compliant (Yes/No): Yes
Function details: The bot will fix some of the problems caused by internal links inside external links (like [https://... text [[link]] text]), which display poorly. It will only be able to fix part of the errors. The behavior of the fixes can be customized per wiki (see configuration of error 513).
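To make the targeted construct concrete, here is a minimal Python sketch of how such links could be detected. It is not WPCleaner's Java code: the regular expressions and the function name find_links_in_links are assumptions made for illustration, and real wikitext (templates, nowiki, comments) needs a proper parser.

```python
import re

# Bracketed external link: "[", a URL, then a label that may contain [[...]] links.
# Assumption: the URL starts with http://, https:// or // and contains no brackets.
EXT_LINK = re.compile(
    r"\[(?:https?:)?//[^\s\[\]]*(?:[^\[\]]|\[\[[^\[\]]*\]\])*\]"
)
INT_LINK = re.compile(r"\[\[[^\[\]]*\]\]")


def find_links_in_links(wikitext):
    """Return the external links in wikitext whose label contains an internal link."""
    return [
        match.group(0)
        for match in EXT_LINK.finditer(wikitext)
        if INT_LINK.search(match.group(0))
    ]


sample = "See [https://... text [[link]] text] and [https://... plain] and [[other]]."
print(find_links_in_links(sample))
```

Only the first link in the sample is reported: the second has no internal link in its label, and the third is a plain internal link.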
The fixes and the configuration will be done progressively: run the bot on Special:LintErrors/wikilink-in-extlink or on Wikipedia:CHECKWIKI/WPC 513 dump, check what is fixed, extend the configuration or improve the algorithm if needed, update Wikipedia:CHECKWIKI/WPC 513 dump if needed, and start again.
I already run a similar task on frwiki, with a few thousand edits so far (over several runs, which allowed me to improve the range of detection and automatic fixing).
Examples of automatic fixes that show what the algorithm does in different situations:
- 1956 Eilat bus ambush: [https://... Four Killed In Ambush, [[Vancouver Sun]]] is replaced by [https://... Four Killed In Ambush], [[Vancouver Sun]] (the comma before the internal link makes shortening the external link safe enough to be automatic)
- 1975 State of the Union Address: [https://... (full video and audio), ''Miller Center of Public Affairs'', [[University of Virginia]].] is replaced by [https://... (full video and audio), ''Miller Center of Public Affairs''], [[University of Virginia]]. (same as previous, and the dot after the link is also accepted as punctuation)
- 1981 Vienna synagogue attack: [https://... Palestinians get life in Austrian Slayings, ''[[The New York Times]]'', January 22, 1982] is replaced by [https://... Palestinians get life in Austrian Slayings], ''[[The New York Times]]'', January 22, 1982 (same as previous, and ", January 22, 1982" is accepted as matching a configured regular expression)
- 2012 Dhivehi League Round 2: [http://... Report (by [[Football Association of Maldives|FAM]])] is replaced by [http://... Report] (by [[Football Association of Maldives|FAM]]) (same as previous but with the opening parenthesis, and "by" is accepted as a configured text)
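The four fixes above can be reproduced with a short Python sketch: scan backward from the internal link over whitespace, punctuation, and configured texts to find a safe cut point, then scan forward to check that everything after the link may leave the external link too. This is an illustration and not the WPCleaner implementation. The function name, the contents of TEXTS_BEFORE and REGEX_AFTER (stand-ins for the per-wiki configuration), the dot in PUNCT_AFTER, and the handling of quotes wrapping the link are all assumptions.

```python
import re

INT_LINK = re.compile(r"\[\[[^\[\]]*\]\]")
PUNCT_BEFORE = ",-–:("    # punctuation accepted before the internal link
PUNCT_AFTER = ",-–:)."    # punctuation accepted after it (dot assumed from the second example)
TEXTS_BEFORE = ("by",)    # assumed sample of configured texts
REGEX_AFTER = (re.compile(r", [A-Z][a-z]+ \d{1,2}, \d{4}"),)  # assumed date pattern


def shorten_external_link(ext_link):
    """Move the closing bracket before the first internal link when that looks safe."""
    inner = ext_link[1:-1]
    link = INT_LINK.search(inner)
    if link is None:
        return ext_link
    start, end = link.span()
    # Assumption: quotes wrapping the link (''[[...]]'') leave the external link with it.
    while start > 0 and end < len(inner) and inner[start - 1] == "'" and inner[end] == "'":
        start, end = start - 1, end + 1

    # Backward scan: skip whitespace, accept punctuation or a configured text.
    pos, safe = start, False
    while True:
        head = inner[:pos].rstrip()
        text = next((t for t in TEXTS_BEFORE
                     if head.lower().endswith(t)
                     and not head[-len(t) - 1:-len(t)].isalnum()), None)
        if head and head[-1] in PUNCT_BEFORE:
            pos, safe = len(head) - 1, True
        elif text is not None:
            pos, safe = len(head) - len(text), True
        else:
            break
    cut = len(inner[:pos].rstrip())

    # Forward scan: everything after the link must be whitespace, punctuation,
    # or a match of a configured regular expression.
    pos = end
    while pos < len(inner):
        hit = next(filter(None, (r.match(inner, pos) for r in REGEX_AFTER)), None)
        if hit:
            pos = hit.end()
        elif inner[pos].isspace() or inner[pos] in PUNCT_AFTER:
            pos += 1
        else:
            break

    # Keep the link untouched unless the cut is safe, the forward scan reached
    # the end, and some label text remains after the URL.
    first_space = inner.find(" ")
    if not safe or pos != len(inner) or first_space == -1 or cut <= first_space:
        return ext_link
    return "[" + inner[:cut] + "]" + inner[cut:]


print(shorten_external_link("[http://... Report (by [[Football Association of Maldives|FAM]])]"))
```

A link such as [https://... Interview with [[Someone]]] is left unchanged by this sketch, because no punctuation or configured text precedes the internal link.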
If interested in details, the algorithm currently works as follows, but it may evolve if I find enhancements along the way:
- Analysis of external links created directly in wikitext (like [https://... ]):
  - It looks for the first instance of:
    - an internal link (like [[...]])
    - a template creating an internal link (like {{ISBN|...}}; the list of templates WPCleaner looks for is configured with variable error_513_templates_enwiki)
  - If it's a template, and a replacement template has been configured for this template (on frwiki for example, {{date}} can be replaced by {{date-}}: the former creates links to dates, the latter does not):
    - The only suggestion is to replace the template.
    - The replacement is automatic only if it has been configured to be automatic.
  - If it's an internal link or a template without replacement:
    - The bot goes backward from the beginning of the link/template to see where the external link could be shortened: it takes into account whitespace, some punctuation (currently ,-–:( ) and some configured texts (in variable error_513_texts_before_enwiki). If a punctuation mark or a configured text with the automatic flag set is found, the position to shorten the external link is deemed safe enough.
    - The bot goes forward from the end of the link/template to see if it can safely reach the end of the external link: it takes into account whitespace, some punctuation (currently ,-–:) ) and some configured regular expressions (in variable error_513_texts_after_enwiki).
    - If the position to shorten the external link is deemed safe enough and the bot could reach the end of the external link, the external link is shortened.
  - If it's an internal link at the beginning of the external link, and the link is configured (in variable error_513_links_first_enwiki), the internal link is moved before the external link.
- Analysis of external links created through the use of templates (like {{Cite web|url=...|title=...}} using its url and title parameters to create an external link). The list of template/parameter pairs is configured in variable error_513_template_params_enwiki:
  - It looks for the first instance of an internal link or a template creating an internal link (same as above).
  - If it's a template, and a replacement template has been configured... (same as above)
  - If it's an internal link and the template/parameter is configured for automatic removal of the links, the internal link is replaced by the displayed text.
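Two of the simpler rules above, moving a configured leading link before the external link and replacing a link in a template parameter by its displayed text, can be sketched in Python as follows. This is a hedged illustration, not WPCleaner code: the function names are invented, LINKS_FIRST is an assumed stand-in for the error_513_links_first_enwiki configuration, and unlink replaces every link in the value for simplicity.

```python
import re

# Internal link with an optional "target|" part; group 1 is the displayed text.
INT_LINK = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]")
# External link whose label starts with an internal link.
LEADING_LINK = re.compile(r"\[(\S+) (\[\[([^\[\]|]*)(?:\|[^\[\]]*)?\]\]) (.*)\]")

LINKS_FIRST = {"Retrosheet"}  # assumed sample configuration


def unlink(value):
    """Replace each internal link in a parameter value by its displayed text."""
    return INT_LINK.sub(r"\1", value)


def move_link_first(ext_link):
    """Move a configured internal link from the start of the label to before the link."""
    match = LEADING_LINK.fullmatch(ext_link)
    if match and match.group(3) in LINKS_FIRST:
        url, link, rest = match.group(1), match.group(2), match.group(4)
        return f"{link} [{url} {rest}]"
    return ext_link


print(unlink("Report by [[Football Association of Maldives|FAM]]"))
print(move_link_first("[http://... [[Retrosheet]] box score: 1953-04-13]"))
```

Restricting the move to an explicit list of link targets keeps the rule conservative: a leading link that is not configured stays where it is.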
Discussion
What namespaces will this bot operate in? The bot should not fix deliberate errors, which means that operating in Template, Help, and Talk spaces is probably not advisable. I support its use in article space and Draft space. I have fixed a few thousand of these errors, which can be tricky to figure out, and I look forward to seeing some test edits to see how well the algorithm works. – Jonesey95 (talk) 15:38, 25 May 2020 (UTC)
- Hi Jonesey95. For the moment, only the Main namespace. Maybe other namespaces in the future, but I will open a new request for approval then. I agree that Template and Talk are too tricky. I don't know about Help, but I would rather start with namespaces like Category, File, Reference...
- If you want to see some results, I've already made several thousand modifications on frwiki: here, here, here... (look for "Lien interne dans un lien externe", with "2.02b"; the "b" is for bot). --NicoV (Talk on frwiki) 16:48, 25 May 2020 (UTC)
- I clicked on many of those corrections, but they are all wikilinks in |titre= parameters of citation templates. We do not have any of those. Those errors would appear in Category:CS1 errors: URL–wikilink conflict (2), which is currently empty (I fixed many thousands of articles a few years ago, and a couple of diligent editors watch the category for new errors). Do you have fixes for Linter errors in regular URL links? If not, I can wait for the bot trial. Merci. – Jonesey95 (talk) 18:11, 25 May 2020 (UTC)
  - Hi Jonesey95. I proceeded step by step on frwiki, so each list may contain mostly one type of modification. I think this list may be closer to what you're looking for (older list with actual internal links). But I think I'll find ideas for improvements once I have really started working on enwiki for this. For example, among the improvements, I am thinking of adding a list of internal links that can safely be put before the external link when they are at the beginning (like in 1953 Milwaukee Braves season, for [http://... [[Retrosheet]] box score: 1953-04-13] replaced by [[Retrosheet]] [http://... box score: 1953-04-13]). --NicoV (Talk on frwiki) 18:45, 25 May 2020 (UTC)
  - And in fact, there may be templates like {{URL}} with wikilinks in |2=, for example in Åbyhøj Church. --NicoV (Talk on frwiki) 18:58, 25 May 2020 (UTC)
    - Hi Jonesey95. I've implemented the improvement mentioned just above; most of the modifications in this list are for the same internal link (to Élections Nouveau-Brunswick) at the beginning of the external link. --NicoV (Talk on frwiki) 15:35, 29 May 2020 (UTC)
- Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Mainspace only. Primefac (talk) 14:55, 29 May 2020 (UTC)
- Trial complete. Thanks Primefac. I've done 50 edits, and I didn't see big problems, just 2 very minor tweaks. For this edit, I've added " " to the texts before, so in similar cases, the closing bracket will be before it. For this edit, I've modified the detection of the texts before to be case insensitive. Jonesey95, in case you're interested in checking the edits. --NicoV (Talk on frwiki) 16:29, 29 May 2020 (UTC)
  - Edited after bot approval: I also checked the edits, and they look great! Thanks for taking on this task, NicoV. Ping me if you need help. – Jonesey95 (talk) 00:09, 31 May 2020 (UTC)
Approved. I looked over the edits and this performs as expected. As per usual, if amendments to - or clarifications regarding - this approval are needed, please start a discussion on the talk page and ping. --TheSandDoctor Talk 18:42, 30 May 2020 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.