Wikipedia:Bots/Requests for approval/LemmeyBOT
- The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.
Operator: Lemmey
Automatic or Manually Assisted: Automatic
Programming Language(s): Python
Function Summary: Restores missing reference names
Edit period(s) (e.g. Continuous, daily, one time run): As needed.
Already has a bot flag (Y/N):
Function Details: The bot processes articles in Category:Pages with incorrect ref formatting. It looks for missing reference names and then searches the article history to restore those names.
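The function described above can be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: the function names, the simplified regular expressions, and the idea of passing older revisions as a plain list of strings (newest first) are all assumptions made for the example.

```python
import re

def _name_attr(name):
    # Matches <ref name="x", <ref name = "x" or an unquoted <ref name=x.
    # Matching is case-sensitive. (Hypothetical helper for this sketch.)
    return r'<ref\s+name\s*=\s*"?' + re.escape(name) + r'"?\s*'

# Short (self-closing) use of a named reference, e.g. <ref name="pop"/>.
# Names containing '/' or '>' are not handled by this simplified pattern.
SHORT_REF = re.compile(r'<ref\s+name\s*=\s*"?([^"/>]+?)"?\s*/>')

def full_definition(name):
    """Regex for a full definition such as <ref name="pop">...</ref>."""
    return re.compile(_name_attr(name) + r'>.*?</ref>', re.DOTALL)

def missing_names(text):
    """Names used in short form that have no full definition in the text."""
    used = {m.group(1) for m in SHORT_REF.finditer(text)}
    return sorted(n for n in used if not full_definition(n).search(text))

def restore_missing_refs(text, older_revisions):
    """Replace the first short use of each missing name with the most
    recent full definition found in older_revisions (newest first)."""
    for name in missing_names(text):
        for old_text in older_revisions:
            found = full_definition(name).search(old_text)
            if found:
                short_use = re.compile(_name_attr(name) + r'/>')
                text = short_use.sub(lambda _m: found.group(0), text, count=1)
                break
    return text
```

Only the first short use is replaced, so later uses of the same name keep pointing at the restored definition.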
Discussion
Looking at the contribs, this bot has already fixed 4 articles. The best example is Overpopulation (diff), where the bot fixed 2 broken references dating back several edits. The category currently contains 1075 articles, excluding Wikipedia, Talk, and User pages. --Lemmey talk 05:30, 3 May 2008 (UTC)[reply]
That's pretty nifty. What do others think? — Werdna talk 05:18, 3 May 2008 (UTC)[reply]
- Sounds good to me. —paranomia (formerly tim.bounceback)a door? 20:28, 3 May 2008 (UTC)[reply]
BlackListed Links
This seems useful. What happens if a ref used a now-blacklisted link? Gimmetrow 07:35, 3 May 2008 (UTC)[reply]
- If the ref used a blacklisted link, the proper correction would have been for the editor to remove all mentions of the reference, not just the first named reference. Should the bot find a blacklisted link, it will restore it like any other reference. --Lemmey talk 07:40, 3 May 2008 (UTC)[reply]
- Yes, that's what an editor *should* do, but an editor might legitimately remove a named ref and not catch all uses of the name. If the bot restores them, an editor seeing this is likely to roll back the bot edit. The bot should have some way to break the cycle. Anti-vandal bots only do one revert; perhaps this bot could add refs to an article only once an hour. Gimmetrow 08:06, 3 May 2008 (UTC)[reply]
- Seems like the best way to end a cycle is to have better anti-vandal bots. If they are breaking named references, they obviously know that a named reference exists; the AV bot should just look for the short version ('< ?ref ?name ?= ?[/w-"] ?/ >'). I can throttle the bot for a trial period and look at creating an anti-vandal bot. --Lemmey talk 14:12, 3 May 2008 (UTC)[reply]
- How are anti-vandal bots related at all to ending the cycle, except that a common technique used by anti-vandal bots is to only revert once? -- Cobi(t|c|b) 13:12, 5 May 2008 (UTC)[reply]
- Gimmetrow seemed worried that the two bots might get into an edit war. As shown below, this is impossible for blacklisted links under current MediaWiki protection controls. --Lemmey talk 18:15, 5 May 2008 (UTC)[reply]
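The "only once in an hour" suggestion in this thread could be sketched as a per-article cooldown. This is a hypothetical illustration, not something the bot is stated to implement: the class name, the one-hour default, and the injectable clock are assumptions for the example.

```python
import time

class EditCooldown:
    """Remembers when each article was last edited and refuses a second
    edit until min_interval_seconds have passed (hypothetical helper)."""

    def __init__(self, min_interval_seconds=3600, clock=time.monotonic):
        self.min_interval = min_interval_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._last_edit = {}

    def may_edit(self, title):
        last = self._last_edit.get(title)
        return last is None or self.clock() - last >= self.min_interval

    def record_edit(self, title):
        self._last_edit[title] = self.clock()
```

A bot loop would call may_edit(title) before saving and record_edit(title) after a successful save, so an immediate rollback by an editor is not answered with a second restoration.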
Why not check if each link is blacklisted? Also, the talk link in your signature is annoying. — Werdna talk 04:12, 4 May 2008 (UTC)[reply]
- According to MaxSem, and confirmed by testing, it is not possible to save an edit that contains a blacklisted link. It appears to be a non-issue. --Lemmey talk 08:11, 4 May 2008 (UTC)[reply]
- Yes, if the link is in the spam blacklist, the bot won't save. What happens? Will the bot crash, or keep trying to make the same edit? But I'm also asking about links simply removed because they are not reliable sources - a soft blacklist if you will. Gimmetrow 20:53, 4 May 2008 (UTC)[reply]
- The function throws an exception and then goes on to the next article. The bot is designed to attempt each article in "Category:Pages with incorrect ref formatting" once. I can create a list for use in future runs that will skip any articles attempted in the previous pass. This will prevent a rollback war between the bot and any editors / other bots.
- As for a particular named source being deemed unreliable, my view is that this likely followed a conversation on the talk page. As such, the article would likely have enough eyes on it that all instances of the named reference have already been removed. (For example, if source "BLOGGER" is deemed unreliable, a giant red broken-ref warning with the name "BLOGGER" is unlikely to go unnoticed.) Since I'm only looking at ~1100 articles in the category, I expect this particular scenario to be rare. --Lemmey talk 05:20, 5 May 2008 (UTC)[reply]
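The "attempt each article once, skip it in future runs" behaviour described in this thread might look like the sketch below. It is an assumption-laden illustration: fix_article is a hypothetical callable standing in for the bot's real per-article routine, and the JSON file used to persist the skip list is an arbitrary choice.

```python
import json
from pathlib import Path

def load_attempted(path):
    """Load the set of titles attempted in earlier passes (empty if none)."""
    p = Path(path)
    return set(json.loads(p.read_text(encoding="utf-8"))) if p.exists() else set()

def save_attempted(path, attempted):
    """Persist the skip list for the next run."""
    Path(path).write_text(json.dumps(sorted(attempted)), encoding="utf-8")

def run_pass(titles, fix_article, attempted):
    """Try each title at most once. A failure (for example a save rejected
    by the spam blacklist) is logged and the loop moves on to the next
    article. Returns the titles fixed in this pass; `attempted` is updated
    in place."""
    fixed = []
    for title in titles:
        if title in attempted:
            continue  # already tried in this or a previous pass
        attempted.add(title)
        try:
            if fix_article(title):
                fixed.append(title)
        except Exception as exc:  # hypothetical: any save or fetch failure
            print(f"skipping {title}: {exc}")
    return fixed
```

Because a title is marked as attempted before the edit is tried, a reverted or rejected edit is never retried automatically, which is what prevents a rollback war.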
Once the trial is done, how often do you think you'll scan through the category? It will have a lot of articles at first, but eventually it will get down to just the handful that appear after the last scan by the bot. So once a day? once an hour? Related to that, I think it might be helpful to identify in the edit summary how long the named reference was missing, either by date or version number. If you're doing this like I would expect, that shouldn't be too hard. Restoring really old refs would be a flag to check the ref, I would think. Finally, if you try to edit an article and can't, it may be a blacklisted link, or it may be protected, or it may simply be an edit conflict. You would want to re-try edit conflicts after some delay. Gimmetrow 06:30, 5 May 2008 (UTC)[reply]
- I'd say no more than once a week. It really depends on how many are left and how fast the category turnover is (how fast it grows or shrinks). The bot isn't perfect. Right now it skips <ref name= "That have spaces in their names with or without the quotes">, as in United States housing market correction. I'll need to add that and be able to search really deep (500+ versions), something I currently have capped for processing-time reasons. I'll look into the version number idea. --Lemmey talk 13:03, 5 May 2008 (UTC)[reply]
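The suggestion above to retry edit conflicts after a delay could be sketched like this. The EditConflict exception and the save callable are hypothetical stand-ins for whatever the bot framework actually raises and calls; in practice the callable would also need to re-fetch the current page text before each attempt.

```python
import time

class EditConflict(Exception):
    """Hypothetical marker for 'someone else edited the page first'."""

def save_with_retry(save, attempts=3, delay_seconds=30.0, sleep=time.sleep):
    """Call save(); on an edit conflict wait and try again, up to
    `attempts` times. Other errors (protection, blacklist) propagate
    immediately because retrying them would not help."""
    for attempt in range(1, attempts + 1):
        try:
            return save()
        except EditConflict:
            if attempt == attempts:
                raise
            sleep(delay_seconds)
```

Only edit conflicts are retried; a protected page or a blacklisted link fails the same way every time, so those cases are better handled by skipping the article.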
Notes
Note: Discussion about this bot is occurring at WP:AN. SQLQuery me! 08:41, 4 May 2008 (UTC)[reply]
- No one seems to be discussing the bot there, only bot policy and the approval process, as I noted on the BAG talk page. --Lemmey talk 09:06, 4 May 2008 (UTC)[reply]
- Approved for trial (100 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Strange problem solved. Gimmetrow 20:53, 4 May 2008 (UTC)[reply]
- {{BAGAssistanceNeeded}} Bot has completed trial.
- Pattern was changed to include spaces (<ref name = Bot can now fix stuff like this/>). Bot now searches the entire article history. Put mode changed to Put_Async. Get mode was changed to Get(Throttle=False). Bot runs faster now. --Lemmey talk 14:47, 9 May 2008 (UTC)[reply]
- You're welcome to keep running under the trial (this edits so slowly it's not as if its edits flood watchlists), but I would really like the age of the ref in the edit summary. Once you've cleared the backlog, I think you'll want to run this every day or two so the leftover named ref doesn't get removed. Gimmetrow 05:27, 10 May 2008 (UTC)[reply]
{{BAGAssistanceNeeded}} Bot has completed trial. The bot now has the ability to post the version # of the article the reference was restored from; see http://en.wikipedia.org/w/index.php?title=User:Lemmey&diff=prev&oldid=213721652.
Approved. Gimmetrow 07:37, 21 May 2008 (UTC)[reply]
- What happened here? Gimmetrow 05:42, 10 May 2008 (UTC)[reply]
- It appears that the only existence of "alternate etymology" is a blank ref. --Lemmey talk 06:25, 10 May 2008 (UTC)[reply]
- I have resolved this issue. You can see the fix here ([1]). The bot will not put in any ref that is like <ref name = "Wat79"> *</ref>. It keeps looking in the history for a non-blank reference. --Lemmey talk 20:41, 13 May 2008 (UTC)[reply]
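The fix described in this thread, skipping blank definitions and continuing back through the history, could be sketched as follows. This is an illustration only: the function name, the (revision id, text) pairs, and the simplified pattern are assumptions, not the bot's actual code.

```python
import re

def find_non_blank_definition(name, history):
    """Search history, a list of (revision_id, wikitext) pairs ordered
    newest first, for a full definition of `name` whose body is not just
    whitespace. Returns (revision_id, definition) or None."""
    pattern = re.compile(
        r'<ref\s+name\s*=\s*"?' + re.escape(name) + r'"?\s*>(.*?)</ref>',
        re.DOTALL,
    )
    for revision_id, text in history:
        for match in pattern.finditer(text):
            if match.group(1).strip():  # ignore refs like <ref name="x"> </ref>
                return revision_id, match.group(0)
    return None
```

Returning the revision id alongside the definition also gives the bot what it needs to mention the source version in its edit summary.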
- There is a problem: your bot treats ref names as case-insensitive, but they are case-sensitive. MaxSem(Han shot first!) 17:54, 10 May 2008 (UTC)[reply]
- The issue is that the editor treated ref names as case-insensitive, substituting Columbia when he should have used columbia. It was a non-rendered ref and was fixed by the bot. Had it been looking for Columbia, the bot would have bottomed out and not fixed the ref. I'll state that having a named ref stated in full more than once is unsightly, unnecessary, and inefficient, but I'll argue that it is not a more serious problem than a visible fault. I ran the bot on the article twice to fix all occurrences. --Lemmey talk 18:35, 10 May 2008 (UTC)[reply]
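One way to reconcile the two positions in this thread is to prefer an exact, case-sensitive match and fall back to a case-insensitive one only when it is unambiguous. The sketch below is a hypothetical illustration of that idea, not a description of how the bot resolves names.

```python
def resolve_ref_name(wanted, defined_names):
    """Return the defined name that `wanted` most plausibly refers to.
    An exact (case-sensitive) match wins; otherwise a case-insensitive
    match is accepted only if exactly one candidate exists."""
    if wanted in defined_names:
        return wanted
    candidates = [n for n in defined_names if n.lower() == wanted.lower()]
    return candidates[0] if len(candidates) == 1 else None
```

With this approach a short ref written as Columbia can still be matched to a definition named columbia, while articles that genuinely define both spellings are left alone.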
- The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.