Wikipedia:Bots/Requests for approval/FrescoBot 2
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: Basilicofresco
Automatic or Manually assisted: Auto (unless stated otherwise)
Programming language(s): python (pywikipedia)
Source code available: standard pywikipedia
Function overview: remove useless piping within wikilink syntax
Links to relevant discussions (where appropriate): Wikipedia talk:Piped link#Changing existing links to piped links with capital first letter?
Edit period(s): every few months (or less) using the xml dump file
Estimated number of pages affected: 20k (rough guess)
Exclusion compliant (Y/N): Y
Already has a bot flag (Y/N): Y
Discussion regarding original proposal
The following discussion has been closed. Please do not modify it.
Function details: as stated in Wikipedia:Piped link#When not to use, we should never use piped links to convert the first letter to lower case. This well-tested bot will correct occurrences like [[Country code second-level domain|country code second-level domain]]. Already in use on Italian Wikipedia. More examples:
Discussion
Three things:
Tim1357 (talk) 15:15, 28 December 2009 (UTC)
TBH, I really don't like the idea of a bot making 16000 edits that will result in no visible change to the rendered page. It's like WP:NOTBROKEN, except with possibly fewer benefits. Mr.Z-man 22:43, 28 December 2009 (UTC)
There is nothing wrong with having another smackbot. However, I think we should ask Rich to give the bot's code to User:Basilicofresco, so we can cram as many general fixes into one edit as possible. Tim1357 (talk) 02:40, 30 December 2009 (UTC)
I like the idea of adding general fixes to other bots. --IP69.226.103.13 (talk) 16:59, 30 December 2009 (UTC)
Well, I can easily add other fixes, for example:
Basilicofresco (msg) 13:50, 3 January 2010 (UTC)
Yes, they are. The above examples are real. Would you prefer to carefully eye-check 3 million pages and manually edit about 4k pages? (1:820, a guess based on sampling random articles) -- Basilicofresco (msg) 08:53, 4 January 2010 (UTC)
FrescoBot 2 bis
Never mind, if the majority considers my proposal about useless piping "cosmetic", I can remove it. I'm here to help you, not to raise my edit count. So, what about the second group of replacements? -- Basilicofresco (msg) 20:59, 5 January 2010 (UTC)
Fixing also external links
I'm testing a new set of regexes for syntax errors in external links on Italian Wikipedia. It would probably be nice to add them here in order to create a single task. I will add details here as soon as possible. -- Basilicofresco (msg) 09:02, 9 January 2010 (UTC)
Function details: (new proposal) Using replace.py I will apply several accurate regular expressions in order to correct these errors:
Example format: wrong wiki source --> corrected wiki source = error as it appears in the article --> corrected text as it appears in the article
- External links
- [HTTP://www.google.it link] --> [http://www.google.it link] = link --> link
- [http://http://www.google.it link] --> [http://www.google.it link] = link --> link
- [http:www.google.it link] --> [http://www.google.it link] = [http:www.google.it link] --> link
- [http:/www.google.it link] --> [http://www.google.it link] = [http:/www.google.it link] --> link
- [http:///www.google.it link] --> [http://www.google.it link] = link --> link
- [[http://www.google.it link]] --> [http://www.google.it link] = [link] --> link
- [[http://www.google.it link] --> [http://www.google.it link] = [link --> link
- [http:://www.google.it link] --> [http://www.google.it link] = [http:://www.google.it link] --> link
- [http//www.google.it link] --> [http://www.google.it link] = [http//www.google.it link] --> link
- something[http://www.google.it link] --> something [http://www.google.it link] = somethinglink --> something link
- [http://www.google.it link]something --> [http://www.google.it link] something = linksomething --> link something
- a few other very rare variants (manually assisted)
- [http://images.google.com/imgres?imgurl=http://habitant.org/images/stignatius.jpg&imgrefurl=http://habitant.org/houghton/fcgenealogy.htm&h=287&w=320&sz=29&hl=en&start=33&um=1&tbnid=ibPawlbIEskUcM:&tbnh=106&tbnw=118&prev=/images%3Fq%3DHoughton%2BMI%26start%3D20%26ndsp%3D20%26svnum%3D10%26um%3D1%26hl%3Den%26sa%3DN Flat Broke Blues Band Photo Album] --> [http://www.flatbrokebluesband.com/photos.php Flat Broke Blues Band Photo Album] = Flat Broke Blues Band Photo Album --> Flat Broke Blues Band Photo Album
- [http://images.google.com/imgres?imgurl=http://habitant.org/images/stignatius.jpg&imgrefurl=http://habitant.org/houghton/fcgenealogy.htm&h=287&w=320&sz=29&hl=en&start=33&um=1&tbnid=ibPawlbIEskUcM:&tbnh=106&tbnw=118&prev=/images%3Fq%3DHoughton%2BMI%26start%3D20%26ndsp%3D20%26svnum%3D10%26um%3D1%26hl%3Den%26sa%3DN Google Image Result for http://www.flatbrokebluesband.com/photos.php<!-- Bot generated title -->] --> [http://www.flatbrokebluesband.com/photos.php] = Google Image Result for http://www.flatbrokebluesband.com/photos.php --> [1]
- Wikilinks
- [[Sonar||sidescan sonar]] --> [[Sonar|sidescan sonar]] = |sidescan sonar --> sidescan sonar
- [['''''sonar''''']] --> '''''[[sonar]]''''' = '''''sonar''''' --> sonar
- [['''sonar''']] --> '''[[sonar]]''' = '''sonar''' --> sonar
- [[''sonar'']] --> ''[[sonar]]'' = ''sonar'' --> sonar
- [["sonar"]] --> "[[sonar]]" = "sonar" --> "sonar" - I will avoid the few (24) exceptions, e.g. "Them"
- [[(sonar)]] --> ([[sonar]]) = (sonar) --> (sonar) - I will avoid the few (28) exceptions, e.g. (not adam)
- [['sonar']] --> '[[sonar]]' = 'sonar' --> 'sonar' - I will avoid the single (1) exception, e.g. 'Hours'
- [[sonar,]] --> [[sonar]], = sonar, --> sonar, - I will avoid the single (1) exception, e.g. Alors voilà,
- something[[sonar]] --> something [[sonar]] = somethingsonar --> something sonar
- something[[ sonar]] --> something [[sonar]] = somethingsonar --> something sonar
- [[1992-1998]] --> [[1992]]-[[1998]] = 1992-1998 --> 1992-1998 - any type of dash; I will avoid the few (26) exceptions, e.g. 1967–1970, and I will also avoid any decade, e.g. 1950-1959 (I just created these redirects to decades in order to capture any red-but-plausible wikilink)
- [[1992-98]] --> [[1992]]-[[1998|98]] = 1992-98 --> 1992-98 - any type of dash; I will avoid the few (2) exceptions, e.g. 1806-20, and I will also avoid potentially ambiguous (cross-century) intervals, e.g. 1862-34
- [[Nile Delta ]]and --> [[Nile Delta]] and = Nile Delta and --> Nile Delta and - (manually assisted)
- [[sonar.]] --> [[sonar]]. = sonar. --> sonar. - (manually assisted)
- a few other very rare variants (manually assisted)
- Internal links conversion
- [http://en.wikipedia.org/wiki/ECFS_%28cable_system%29 ECFS] --> [[ECFS (cable system)|ECFS]] = ECFS --> ECFS (handles piping and common url encoding)
- [http://en.wikipedia.org/wiki/File:Flag_of_Brunei.svg Flag of Brunei] --> [[:File:Flag of Brunei.svg|Flag of Brunei]] = Flag of Brunei --> Flag of Brunei (properly handles files and categories)
- [http://fr.wikipedia.org/wiki/Ren%C3%A9-Maurice_Gattefoss%C3%A9 René-Maurice Gattefossé] --> [[:fr:René-Maurice Gattefossé|René-Maurice Gattefossé]] = René-Maurice Gattefossé --> René-Maurice Gattefossé (handles links to wikipedia in foreign languages)
- [http://en.wikipedia.org/wiki/Louise_Marie_Ad%C3%A9la%C3%AFde_de_Bourbon-Penthi%C3%A8vre Mme de Genliss] --> [[Louise Marie Adélaïde de Bourbon-Penthièvre|Mme de Genliss]] = Mme de Genliss --> Mme de Genliss (converts a good number of unicode sequences)
- [http://hi.wikipedia.org/wiki/%E0%A4%B5%E0%A4%BF%E0%A4%95%E0%A4%BF%E0%A4%AA%E0%A5%80%E0%A4%A1%E0%A4%BF%E0%A4%AF%E0%A4%BE:%E0%A4%87%E0%A4%82%E0%A4%9F%E0%A4%B0%E0%A4%A8%E0%A5%87%E0%A4%9F_%E0%A4%AA%E0%A4%B0_%E0%A4%B9%E0%A4%BF%E0%A4%A8%E0%A5%8D%E0%A4%A6%E0%A5%80_%E0%A4%95%E0%A5%87_%E0%A4%B8%E0%A4%BE%E0%A4%A7%E0%A4%A8 Tools and Techniques for Hindi Computing] --> [[:hi:%E0%A4%B5%E0%A4%BF%E0%A4%95%E0%A4%BF%E0%A4%AA%E0%A5%80%E0%A4%A1%E0%A4%BF%E0%A4%AF%E0%A4%BE:%E0%A4%87%E0%A4%82%E0%A4%9F%E0%A4%B0%E0%A4%A8%E0%A5%87%E0%A4%9F %E0%A4%AA%E0%A4%B0 %E0%A4%B9%E0%A4%BF%E0%A4%A8%E0%A5%8D%E0%A4%A6%E0%A5%80 %E0%A4%95%E0%A5%87 %E0%A4%B8%E0%A4%BE%E0%A4%A7%E0%A4%A8|Tools and Techniques for Hindi Computing]] = Tools and Techniques for Hindi Computing --> Tools and Techniques for Hindi Computing (does not screw up with exotic not-recognized unicode sequences)
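Two of the fix families listed above (scheme repair in external links and conversion of "fake" external links into wikilinks) can be sketched roughly as follows. This is only an illustrative sketch, not the bot's actual regular expressions, which were not published in this request. The pattern names, the helper functions, the `HOME_LANG` constant and the namespace tuple are all assumptions made for the example. Unlike the Hindi example above, the sketch decodes every valid UTF-8 percent sequence and leaves only undecodable ones as they are.

```python
import re
from urllib.parse import unquote

# All names below are hypothetical; they only illustrate the idea.
HOME_LANG = "en"
# Namespaces that need a leading colon so the page is linked, not embedded.
COLON_NAMESPACES = ("File:", "Image:", "Category:")

# A possibly duplicated, possibly malformed scheme right after the bracket(s).
_SEP = r"(?::+/*|/{2,})"
SCHEME = re.compile(r"(\[+)(?:https?" + _SEP + r")*(https?)" + _SEP, re.I)
# "[[http://x label]]" or "[[http://x label]" -> "[http://x label]"
EXTRA_BRACKETS = re.compile(r"\[\[(https?://[^\s\[\]]+(?: [^\[\]]*)?)\]\]?")
# "[http://xx.wikipedia.org/wiki/Title label]"
INTERNAL = re.compile(
    r"\[https?://(?!www\.)([a-z\-]+)\.wikipedia\.org/wiki/([^\s\[\]]+) ([^\[\]]+)\]"
)


def fix_external_links(text):
    """Normalize the scheme first, then drop surplus square brackets."""
    text = SCHEME.sub(lambda m: m.group(1) + m.group(2).lower() + "://", text)
    return EXTRA_BRACKETS.sub(r"[\1]", text)


def _to_wikilink(match):
    lang, raw_title, label = match.groups()
    if "?" in raw_title:  # query string: not a plain article URL
        return match.group(0)
    try:
        title = unquote(raw_title, errors="strict")
    except UnicodeDecodeError:
        title = raw_title  # keep unrecognized sequences percent-encoded
    title = title.replace("_", " ")
    if any(c in title for c in "[]{}|<>"):  # not allowed in a link target
        return match.group(0)
    prefix = ""
    if lang != HOME_LANG:
        prefix = ":" + lang + ":"
    elif title.startswith(COLON_NAMESPACES):
        prefix = ":"
    return "[[" + prefix + title + "|" + label + "]]"


def convert_internal_links(text):
    """Turn external-style links to Wikipedia pages into piped wikilinks."""
    return INTERNAL.sub(_to_wikilink, text)
```

For instance, `convert_internal_links("[http://en.wikipedia.org/wiki/ECFS_%28cable_system%29 ECFS]")` gives `[[ECFS (cable system)|ECFS]]`. The exception lists and the spacing fixes are deliberately left out of this sketch.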
Discussion about new proposal
What will you do in the event of people creating articles that you don't currently have in your exceptions? Ale_Jrbtalk 19:41, 16 January 2010 (UTC)
- Excluding year intervals, there are currently only 60 exceptions among 3162000 articles. This means 60/3162000 = 1/52700. The probability of replacing a red wikilink (article missing) with an "incorrect" red wikilink (missing quotes/brackets/etc.) is about 1/52700. Pretty low. Moreover, any new article with such a peculiar name will likely come with a redirect from the cleaned name. The risk is negligible and acceptable, considering that this task will, for example, fix more than 3k broken wikilinks. However, what I can do is periodically update the exception list, do my best to avoid any error, and promptly correct any problem. -- Basilicofresco (msg) 13:15, 17 January 2010 (UTC)
- Periodic updates to the list sound like a reasonable work-around. How often do you think you'll update the list of exception cases? Josh Parris 04:31, 19 January 2010 (UTC)
- I plan to run the script every time a new dump file is available, and I'm going to check for new exclusions before every run. It is the safest method. -- Basilicofresco (msg) 08:41, 19 January 2010 (UTC)
Perhaps a heuristic you could use is that if the link is a redirect, you can repair; if it's an article, you can't repair. How does this fit with the exceptions you have identified? Josh Parris 12:53, 18 January 2010 (UTC)
- Of course, I created the exclusion list starting from the existing articles with a matched name. -- Basilicofresco (msg) 19:51, 18 January 2010 (UTC)
Meanwhile I also tested the non-trivial conversion of "internal links" into wikilinks (take a look above). As you asked after my first proposal, I put together a good bunch of tasks. Let me know if there are any other common link problems I can solve, or if you would like me to reintroduce the removal of useless piping as well. -- Basilicofresco (msg) 09:35, 21 January 2010 (UTC)
I would like to know which ones are currently fixed by WP:AWB and/or are part of WP:CHECKWIKI, i.e. which are fixed on a daily basis by many editors -- Magioladitis (talk) 09:28, 24 January 2010 (UTC)
- I don't know. Probably a few fixes are included or partially included, but it is far from being a problem. Can I start with some test edits so we can see if AWB editors are really able to correct any error on every page on a daily basis? -- Basilicofresco (msg) 12:05, 24 January 2010 (UTC)
Info: This is what AWB can do atm. -- Magioladitis (talk) 15:23, 9 February 2010 (UTC) ...and some more. Basilicofresco, very good ideas! PS Better poke someone from BAG to get approved. -- Magioladitis (talk) 19:46, 9 February 2010 (UTC)
Update: I'm performing additional tests in order to further improve the above collection. I will soon ping a BAG operator. -- Basilicofresco (msg) 09:41, 12 February 2010 (UTC)
Useless piping
I checked the 2009/11/28 dump and found that only about 7% of the useless pipings existing at that date have been corrected since its creation (2 months and 15 days ago). This means that AWB "daily basis" fixing is simply not enough. Many of you criticized the first proposal (useless piping removal only) as "cosmetic only". OK, but now there is a whole collection of fixes, and adding useless piping removal as well seems appropriate and balanced, IMHO. See also Wikipedia:Piped link#When not to use. Is there any objection? -- Basilicofresco (msg) 16:07, 12 February 2010 (UTC)
- Useless piping (improved)
- [[Sidescan sonar|Sidescan sonar]] --> [[Sidescan sonar]] = Sidescan sonar --> Sidescan sonar
- [[sidescan sonar|Sidescan sonar]] --> [[Sidescan sonar]] = Sidescan sonar --> Sidescan sonar
- [[Sidescan sonar|sidescan sonar]] --> [[sidescan sonar]] = sidescan sonar --> sidescan sonar
- [[Breakfast_of_Champions|Breakfast of Champions]] --> [[Breakfast of Champions]] = Breakfast of Champions --> Breakfast of Champions
- [[Breakfast of Champions|"Breakfast of Champions"]] --> "[[Breakfast of Champions]]" = "Breakfast of Champions" --> "Breakfast of Champions" - also with ( ) and '' ''
- [[Breakfast_of_Champions|"Breakfast of Champions"]] --> "[[Breakfast of Champions]]" = "Breakfast of Champions" --> "Breakfast of Champions" - also with ( ) and '' ''
- other minor variants
Basilicofresco (msg) 16:07, 12 February 2010 (UTC)
- I support these fixes if the number is so big. -- Magioladitis (talk) 16:13, 12 February 2010 (UTC)
- Info: that's what AWB can do v.5.0.1.0 (rev. 6203). -- Magioladitis (talk) 10:51, 13 February 2010 (UTC)
Basilicofresco, where will your list of exceptions be located? Will it be updated manually or automatically? Can other editors update it? -- Magioladitis (talk) 10:53, 13 February 2010 (UTC)
- Exceptions were located in my userspace on it.wikipedia, but they are now also present here. I'm going to systematically check for new exclusions among page names in the fresh dump before every run (1 per month or less). Obviously, suggestions are always welcome. -- Basilicofresco (msg) 18:47, 13 February 2010 (UTC)
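The useless-piping cases listed above can be illustrated with a short sketch. It is an assumption-laden example rather than the bot's real code: `PIPED_LINK`, `WRAPPERS`, `_same_page` and `remove_useless_piping` are hypothetical names, and the sketch only covers first-letter case, underscores, and a label wrapped in quotes, parentheses or italics.

```python
import re

# Hypothetical pattern: a piped wikilink with a simple target and label.
PIPED_LINK = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]")

# Wrappers that can be moved outside the link: " ", ( ) and '' ''.
WRAPPERS = (('"', '"'), ("(", ")"), ("''", "''"))


def _same_page(target, label):
    """True if target and label differ at most in first-letter case
    and in underscores used instead of spaces in the target."""
    target = target.replace("_", " ")
    if not target or not label:
        return False
    return target[1:] == label[1:] and target[0].lower() == label[0].lower()


def _simplify(match):
    target, label = match.group(1), match.group(2)
    if _same_page(target, label):
        # Keep the label's spelling so the rendered text does not change.
        return "[[" + label + "]]"
    for opening, closing in WRAPPERS:
        if (len(label) > len(opening) + len(closing)
                and label.startswith(opening) and label.endswith(closing)):
            inner = label[len(opening):len(label) - len(closing)]
            if _same_page(target, inner):
                return opening + "[[" + inner + "]]" + closing
    return match.group(0)  # the pipe is meaningful: leave it alone


def remove_useless_piping(text):
    return PIPED_LINK.sub(_simplify, text)
```

Applied to `[[Sidescan sonar|sidescan sonar]]`, the sketch returns `[[sidescan sonar]]`, while a meaningful pipe such as `[[Sonar|sidescan sonar]]` is left untouched.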
- I think the majority of your fixes are good (i.e., the ones that change the way the page appears or the ones that change the final target of the link), and I would be interested in approving this bot for a trial. However, I'm not so sure if certain fixes you mentioned above are necessary, such as the ones that only change wikicode, and do not affect the final rendered page. A bot making an edit solely for cosmetic purposes, and not to fix links that are actually broken, may be a little wasteful (e.g., Sidescan sonar → Sidescan sonar). I think it would be best to stick with external links/wikilinks/internal link conversions, and avoid fixing useless piping for now. — The Earwig @ 01:43, 18 February 2010 (UTC)
- It's part of CHECKWIKI anyway. Some people solely do that. Why not a bot to save us time and effort? Of course, it depends on the number of edits done per day. We need some estimate, but I don't think there will be many anyway. -- Magioladitis (talk) 07:29, 18 February 2010 (UTC)
- Useless piping is considered a middle-priority issue by the CHECKWIKI project. And, as you can see, they pointed out that the problem should be corrected by "AWB, AutoEd, BOT". I checked the November dump again with a larger sample (900 old mistakes in the middle of the dump) and the test shows that only about 150 (17%) were corrected during the past 3 months. IMO the general fixes of AWB and AutoEd are in need of help. For this reason I asked again for your opinion. Other comments are welcome. -- Basilicofresco (msg) 08:10, 19 February 2010 (UTC)
- Fair enough. Approved for trial (50 edits), with all fixes enabled. Please provide a link to the relevant contributions and/or diffs when the trial is complete. — The Earwig @ 16:11, 19 February 2010 (UTC)
- Done. The most common corrections are missing spaces near links, useless piping and wiki/interwikification of "fake" external links. -- Basilicofresco (msg) 18:52, 20 February 2010 (UTC)
- Trial complete. Results look good. Any comments/objections before this task is approved? — The Earwig @ 22:23, 20 February 2010 (UTC)
- Is the code published somewhere? --Magioladitis (talk) 23:28, 20 February 2010 (UTC)
- No, but if everything goes fine, I will probably publish it in the near future. -- Basilicofresco (msg) 01:26, 21 February 2010 (UTC)
- PS: I'm also going to include namespace 6 because it's useful and harmless (tested). Is that OK? -- Basilicofresco (msg) 06:52, 22 February 2010 (UTC)
If there are no objections, I'm ready to start. -- Basilicofresco (msg) 15:01, 24 February 2010 (UTC)
Approved after reviewing the results, which provide a good example of the breadth of functionality to be exercised. Josh Parris 10:21, 25 February 2010 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.