Wikipedia:Bots/Requests for approval/MacMedBot
- The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Withdrawn by operator.
Operator: MacMed
Automatic or Manually assisted: Automatic
Programming language(s): Python
Source code available: Yes
Function overview: MacMedBot finds pages that have been tagged with PROD in the past, but don't have the {{oldprodfull}} tag on their talk page.
Edit period(s): Continuous to get the majority of what is missing now, then probably once a week or so.
Estimated number of pages affected: Any page missing {{oldprodfull}}, could be hundreds or thousands, I am not really sure.
Exclusion compliant (Y/N): N.
Already has a bot flag (Y/N): N.
Function details: MacMedBot searches pages per category (including subcats). It looks through past revisions for "{{datedprod", and if it finds one, it toggles to the talk page. It then checks the talk page for an existing {{oldprodfull}} tag, and adds one if it cannot be found.
Discussion
Essentially the idea here is that you would specify a (preferably large) category for the bot to work on when starting the script, such as Category:Living people, Category:Medicine, or Category:Arthropods, and the bot would process every page in that category and its subcategories, then shut off when done. I hope this clears this task up a little, which was probably rather ambiguous at first. The Earwig (Talk | Contribs) 19:18, 6 September 2009 (UTC)
- I had a short look through the code, although I don't know Python. From what I can see, the bot will just add {{oldprodfull}} without any parameters. Would it be possible to add parameters, such as what the concern was, what date it was PRODed, who contested it, what date it was contested, etc.? Also, will the bot check for Template:OldPRODfull? - Kingpin13 (talk) 11:29, 7 September 2009 (UTC)
- As it stands, I think this is far too wasteful for the minimal benefit gained. Some performance issues I see:
- If you're going to use categories, you should keep a list somewhere of the pages it has already done, or else you're going to end up checking a lot of pages multiple times.
- It doesn't check for an existing {{oldprodfull}} until after it gets the content of 50 revisions. You could check for oldprodfull by just getting the text of 1 revision or, better yet, just the list of templates on the talk page.
- It gets the last 50 revisions unconditionally. If you absolutely must check the revision text, you should get them one at a time. A) you can better throttle your requests, B) You won't download 49 unnecessary revisions if the page is currently prod'ed.
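The revision-related points above could be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: `scan_for_prod`, `counting_history`, `PROD_RE`, and the `limit` parameter are hypothetical names, and the template spellings matched by the pattern are an assumption. Because the function accepts any iterable, passing a generator that fetches one revision per request means older revisions are never downloaded once a match is found, and a PROD in the current revision is reported separately.

```python
import itertools
import re

# Assumed spellings: "{{dated prod" and "{{datedprod" (case-insensitive).
PROD_RE = re.compile(r"\{\{\s*dated ?prod", re.IGNORECASE)


def scan_for_prod(revisions, limit=50):
    """Classify a page from its revision texts, newest first.

    Returns "active" if the current revision carries the PROD tag,
    "past" if only an older revision does, and "none" otherwise.
    At most `limit` revisions are consumed from the iterable.
    """
    for index, text in enumerate(itertools.islice(revisions, limit)):
        if PROD_RE.search(text):
            return "active" if index == 0 else "past"
    return "none"


def counting_history(texts, counter):
    """Stand-in for a lazy API fetch: records how many revisions were read."""
    for text in texts:
        counter.append(1)
        yield text
```

With `counting_history` standing in for the API, a page whose second-newest revision holds the tag should be classified after only two reads, however long its history is.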
- Some general code review comments:
- The way it searches for prod templates seems rather odd. You cut off the current revision from the revision list, then return the last revision as "check", and if "check" contains the dated prod template, you assume the prod is still active. What if the current revision was the one that removed it?
talkPage = page.toggleTalkPage() # Get the current page's talk page.
talkPage = unicode(talkPage)
talkPage = re.findall("\[\[(.*?)\]\]", talkPage)
talkPage = talkPage[0]
- What on earth is this? If you're only dealing with articles, getting the talk page title should be as simple as prepending "Talk:" to the page title.
f.write("\n* [[%s]] at %s." % (logTitle, datetime.now()))
- For one, this doesn't work, it should be
datetime.datetime.now()
(datetime is a class inside the datetime module), second, this is going to give you dates like "2009-09-07 21:37:05.127097".
except wikipedia.IsRedirectPage: # If the target page is a redirect, follow it.
return # Don't process.
- One of those comments is wrong...
- It only seems to log successful actions - that's generally the one thing you don't need to log, as MediaWiki does that for you. If you need to log something, it should probably log errors.
- -- Mr.Z-man 21:59, 7 September 2009 (UTC)
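Two of the code review points above can be illustrated with small helpers. This is only a sketch under stated assumptions: `talk_title` and `log_line` are hypothetical names, the first is valid only for main-namespace articles (as the review notes), and the timestamp format is one arbitrary choice for dropping the microseconds.

```python
import datetime


def talk_title(article_title):
    """Talk page title for a main-namespace article (articles only)."""
    return "Talk:" + article_title


def log_line(title, when=None):
    """Format a log entry with the timestamp trimmed to whole seconds."""
    if when is None:
        # The class lives inside the module: datetime.datetime.now()
        when = datetime.datetime.now()
    return "\n* [[%s]] at %s." % (title, when.strftime("%Y-%m-%d %H:%M:%S"))
```

For the example timestamp quoted in the review, `log_line` would write "2009-09-07 21:37:05" rather than "2009-09-07 21:37:05.127097".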
- Done. I have added the nomreason parameter, and fixed the problems pointed out by Mr. Z-man. The code should run without a problem now. MacMedtalkstalk 00:52, 12 September 2009 (UTC)
Is the bot still planning on looking through every revision of every page to find {{dated prod}}? I agree that is a massive waste of resources (and would also take ages to get anywhere). --ThaddeusB (talk) 20:50, 12 September 2009 (UTC)
- No, it only searches the last 50 revisions of the page. MacMedtalkstalk 21:28, 12 September 2009 (UTC)
- OK, but surely there is a better way to accomplish this check than loading the talk page and then (if necessary) the last 50 revisions for all 3 million articles? --ThaddeusB (talk) 02:13, 14 September 2009 (UTC)
- The point of loading the talk page is so that we don't waste time looking through a page, then discovering that it has already been tagged with {{oldprodfull}}. And the search through the revisions goes one by one through each revision (via the API), so that it stops as soon as it finds the PROD. This is as optimized as it can get, and the point is that a bot will take that time to look through the revision history. Yes, it could take some time, but without this bot, a human would take much, much longer. MacMedtalkstalk 03:59, 14 September 2009 (UTC)
- I fully understand the point of checking the talk page first. I think, perhaps, you are missing my point. 3,000,000 x 40 (an estimate, not 50, since some pages have fewer than 50 total revisions) is an awful lot of queries to find the <1% of articles that have been previously PRODed. At one query a second (which is the absolute maximum it should be pulling) that would take nearly 1,400 days to go through them all. Anyway, it's not up to me to decide if it is worthwhile - that is BAG's job. --ThaddeusB (talk) 21:06, 17 September 2009 (UTC)
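The estimate above can be reproduced with a few lines of arithmetic. The figures (3 million articles, about 40 queries each, one query per second) are the ones quoted in the comment; the function name `crawl_days` is just an illustrative choice.

```python
def crawl_days(articles, queries_per_article, queries_per_second=1.0):
    """Days needed to issue every query at a fixed request rate."""
    total_queries = articles * queries_per_article
    seconds = total_queries / float(queries_per_second)
    return seconds / 86400.0  # 86,400 seconds in a day


estimate = crawl_days(3000000, 40)
print("%.0f days (about %.1f years)" % (estimate, estimate / 365.0))
# prints "1389 days (about 3.8 years)"
```

That is 120 million queries in total, so the run time is dominated by the number of revisions fetched per article rather than by the talk page check.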
{{BAGAssistanceNeeded}} Perhaps I could run a trial on a small cat? MacMedtalkstalk 20:55, 15 September 2009 (UTC)
- So are you now checking the revisions one at a time, or still loading all 50 in one go? Also, there is a bot which currently does something similar to this, but it patrols the PROD category, and the op says he may switch it off should this bot become active (see Wikipedia_talk:BRFA#Restarting_up_an_old_task). - Kingpin13 (talk) 08:39, 16 September 2009 (UTC)
- Yes. The bot searches the talk page for {{oldprodfull}} first, then processes the page if oldprodfull is not found. After that it goes through revisions one at a time, stopping once it finds one. It will also place the concern of the prod in the |nomreason= parameter of oldprodfull. Regards, MacMedtalkstalk 13:01, 16 September 2009 (UTC)
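Copying the concern into |nomreason= might look something like the sketch below. It is not the bot's code: the helper names are hypothetical, the `concern` parameter name on the dated prod tag is an assumption, and the parsing is deliberately naive (it breaks on nested templates or piped links inside a parameter value).

```python
import re

# Naive match for the body of a dated prod tag (assumed spellings).
PROD_BODY_RE = re.compile(r"\{\{\s*dated ?prod\s*\|(.*?)\}\}",
                          re.IGNORECASE | re.DOTALL)


def parse_prod_params(text):
    """Return the tag's named parameters as a dict (empty if no tag)."""
    match = PROD_BODY_RE.search(text)
    if match is None:
        return {}
    params = {}
    for part in match.group(1).split("|"):
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip().lower()] = value.strip()
    return params


def build_oldprodfull(revision_text):
    """Build the talk page tag, filling |nomreason= when a concern exists."""
    concern = parse_prod_params(revision_text).get("concern", "")
    if concern:
        return "{{oldprodfull|nomreason=%s}}" % concern
    return "{{oldprodfull}}"
```

A real implementation would more likely rely on a proper wikitext parser than on splitting at pipe characters.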
- Approved for trial (Category:English potters). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Okay, this cat should have at least one - Kingpin13 (talk) 14:52, 16 September 2009 (UTC)
(outdent) Trial complete.
- There were a few bugs in the code, so if you'd like I can test on another small cat before getting started for real. (Note: The bugs were caused by the software update.) MacMedtalkstalk 21:10, 17 September 2009 (UTC)
- Couple of suggestions:
- {{oldprodfull}} offers a number of additional parameters. You shouldn't have any problem getting at least the nom date (it's right in the dated prod code) and the person who removed it isn't all that difficult to pull either - it is simply the person who made the revision immediately after the last one with the template (i.e. the revision the bot pulled before the matching one.)
- If the talk page has "{{oldafd" I'd skip the page. Yes, it may also have been prodded, but AfD always overrides prod (an article sent to AfD that survives can never be prodded again) so there is no need to have both templates on the talk page.
- --ThaddeusB (talk) 21:20, 17 September 2009 (UTC)
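Both suggestions above could be sketched as follows, assuming the history is walked newest first. Everything named here is hypothetical: the `Revision` tuple, the helper names, the `timestamp` parameter on the dated prod tag, and the result keys `nomdate`, `con`, and `condate` (chosen to resemble oldprodfull parameters, which should be checked against the template documentation).

```python
import re
from collections import namedtuple

Revision = namedtuple("Revision", ["user", "timestamp", "text"])

PROD_RE = re.compile(r"\{\{\s*dated ?prod", re.IGNORECASE)
TIMESTAMP_RE = re.compile(r"\|\s*timestamp\s*=\s*(\d{14})")
OLDAFD_RE = re.compile(r"\{\{\s*oldafd", re.IGNORECASE)


def should_skip(talk_text):
    """Skip pages whose talk page already records an AfD."""
    return OLDAFD_RE.search(talk_text) is not None


def prod_details(revisions):
    """Walk revisions newest first and describe the most recent PROD.

    Returns None if no PROD is found or if the current revision still
    carries it. Otherwise the contester is taken to be the author of the
    revision immediately after the last one containing the tag.
    """
    previous = None
    for rev in revisions:
        if PROD_RE.search(rev.text):
            if previous is None:
                return None  # PROD is still active
            stamp = TIMESTAMP_RE.search(rev.text)
            return {
                "nomdate": stamp.group(1) if stamp else "",
                "con": previous.user,
                "condate": previous.timestamp,
            }
        previous = rev
    return None
```

Since the bot already reads revisions one at a time, `previous` is simply the revision it pulled just before the matching one, so no extra requests are needed.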
- What's the status of this? I can find another category for the bot to do a trial on, but I'd like to at least see ThaddeusB's first suggestion implemented first; it shouldn't be too hard to identify more info. - Kingpin13 (talk) 04:43, 30 September 2009 (UTC)
- Withdrawn by operator. for now. I have a lot of stuff going on in RL, maybe I'll come back to this later. MacMedtalkstalk 01:59, 2 October 2009 (UTC)
'kay, I've marked it as such for now, feel free to reopen anytime (let me know if you need help reopening the requests). Best, - Kingpin13 (talk) 07:58, 2 October 2009 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.