Wikipedia:Bots/Requests for approval/Zorglbot
- The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section.
I am thinking of adding a function to Zorglbot, completely unrelated to what it does now, but need some help. Special:Shortpages is a very useful page for finding pages that are unlikely to be real articles, or obvious candidates for speedy deletion that slipped past the RC patrollers. Unfortunately, the page also includes very short but legitimate pages, such as those blanked following a copyvio and containing only a {{copyvio}} template. Checking these articles manually takes a lot of time, and is likely to be duplicated by several editors. I'd like a bot to get the top entries in the list, parse each of the pages and indicate which are likely to be legitimate. An example of what I mean can be found at User:Schutz/Shortpages; please don't look too much at the awful colours, but basically, only pages on a white background are probably worth checking.
The requirements of the bot are as follows: read the original page, retrieve all of the articles listed (except those already deleted), parse them locally and write the resulting table. So a few hundred pages read and one written.
The questions I have (in addition to a general approval, of course, and knowing if this is likely to be a performance problem) relate to the parameters of the page: how often should the list be updated (Special:Shortpages is cached; it has not been updated for 2 days, but I don't know how often it is refreshed) and how many articles should be queried. I spent quite a bit of time over the weekend going through the original lists (and tagging tens of articles for speedy deletion), and noticed that most of the top 200 articles are already taken care of; getting the first 500 already provides work for a while.
Any comments or questions welcome! Schutz 20:14, 4 September 2006 (UTC)
- I forgot to mention that Special:Shortpages contains 1000 entries in total; ideally, the bot would request and parse all of them. Schutz 06:44, 5 September 2006 (UTC)
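To make the proposal concrete, here is a minimal Python sketch of the read-classify-report loop. It is not Zorglbot's actual code (which is not shown in this request); the Special:Shortpages limit= parameter and action=raw fetches are real MediaWiki features, but the {{copyvio}}/redirect heuristics and the overall structure are assumptions for illustration.

```python
import re
import time
import urllib.error
import urllib.parse
import urllib.request

BASE = "https://en.wikipedia.org/w/index.php"

def fetch(params):
    url = BASE + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

def shortpage_titles(limit=1000):
    # Special:Shortpages accepts a limit= parameter; pull article titles out
    # of the rendered HTML (a crude regex stands in for a real parser here).
    html = fetch({"title": "Special:Shortpages", "limit": str(limit)})
    return re.findall(r'<a href="/wiki/([^"#:]+)"', html)

def classify(title):
    try:
        text = fetch({"title": title, "action": "raw"})
    except urllib.error.HTTPError:
        return "deleted"        # skip pages deleted since the list was cached
    low = text.lower()
    if "{{copyvio" in low:
        return "copyvio"        # blanked pending copyright investigation
    if low.lstrip().startswith("#redirect"):
        return "redirect"
    return "check"              # probably worth a human look

rows = []
for title in shortpage_titles():
    rows.append((title, classify(title)))
    time.sleep(2)               # pause between requests, to be gentle on the servers
```

Only the final summary table would be saved back to the wiki, so the run is almost entirely reads.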
- Approved for testing, up to 100 edits, post results when done. — xaosflux Talk 02:44, 7 September 2006 (UTC)
- I need to clarify something here; the bot makes only one edit per run, but it must request up to 1000 pages before that. When you say "up to 100 edits", do you really mean "100 edits, no restriction on the number of pages read"? Just to be sure... Schutz 21:49, 7 September 2006 (UTC)
OK, I have done a number of manual runs, and the bot is now scheduled to run every morning at 6:00 UTC (starting tomorrow). It reads the 1000 pages, with a 2-second pause between pages, and writes one summary page afterwards. People seem happy with the results, so I'll let it run like that for the rest of the testing phase. Schutz 07:17, 12 September 2006 (UTC)
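For reference, the single write per run could build the summary as an ordinary wikitable. This is only a sketch under the assumptions above: the column layout is invented, and the real bot's login and edit handling are not shown.

```python
# The script itself can be launched daily from cron at 06:00 UTC, e.g.:
#   0 6 * * * /home/schutz/bots/shortpages.py   (path is hypothetical)
def build_table(rows):
    # rows: (title, status) pairs produced by the classification pass
    out = ['{| class="wikitable"', "! Article !! Status"]
    for title, status in rows:
        out.append("|-")
        out.append("| [[%s]] || %s" % (title, status))
    out.append("|}")
    return "\n".join(out)
```

The resulting wikitext would then be saved to the report page (User:Schutz/Shortpages in the example above) as the run's one edit.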
- Task approved. Voice-of-All 23:15, 22 September 2006 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.
- The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section.
Zorglbot's current main function is to do housekeeping on Wikipedia:Copyright problems; I'd like to run a new script under the same username that would do similar tasks for Wikipedia:Templates for deletion: create a new subpage every day, and archive the discussions from the previous days. It should be pretty straightforward. Schutz 19:54, 5 September 2006 (UTC)
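Since no code accompanies this request, the following Python sketch only illustrates the shape of the task; the log-page naming convention, the seven-day archiving window and the save_page() helper are all assumptions, not the bot's actual behaviour.

```python
import datetime

LOG = "Wikipedia:Templates for deletion/Log/%s"

def day_name(d):
    # e.g. "2006 September 5"; the exact subpage naming scheme is assumed
    return d.strftime("%Y %B ") + str(d.day)

def save_page(title, text, summary):
    # Placeholder: the real bot would log in and submit an edit with a token.
    print("would save %s (%s): %d bytes" % (title, summary, len(text)))

def daily_run(today=None):
    today = today or datetime.date.today()
    old = today - datetime.timedelta(days=7)

    # 1. Create today's subpage so new nominations have somewhere to go.
    save_page(LOG % day_name(today), "<!-- Add new nominations below -->",
              "Creating today's TfD log page")

    # 2. Archive: move the week-old log out of the active list on
    #    Wikipedia:Templates for deletion so its discussions can be closed.
    #    (Details depend on the page layout, so they are only sketched here.)
    print("would archive", LOG % day_name(old))
```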
- Testing approved, 7-14 days, post results here. — xaosflux Talk 02:43, 7 September 2006 (UTC)
The bot has been running for the past few days (except this morning, when the computer it runs on was turned off), and everything seems OK. I'll let it run like that for the next few days if there are no further comments. Schutz 07:18, 12 September 2006 (UTC)
- Approved; the bot shall run with a flag. Voice-of-All 23:15, 22 September 2006 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.