Wikipedia:Bots/Requests for approval/STBotT
- The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Request Expired.
Automatic or Manually Assisted: Automatic usually, manual list, list parsed with Notepad++, sometimes I will operate it
Programming Language(s): AWB
Function Summary: Crawl through 500 newpages at a time and add appropriate templates to articles needing wikification and categorization.
Edit period(s) (e.g. Continuous, daily, one time run): Probably 2-3 times a day, on 500 articles at a time, editing 200-300 of those, plus maybe 10 of those being interwikified.
Edit rate requested: 6 EPM
Already has a bot flag (Y/N): No :(
Function Details: Crawl through 500 newpages at a time and add appropriate templates to articles needing wikification and categorization. On these pages, bot will also attempt to clean up. If manually assisted, typo fix will be used. On any pages with interwiki links present or on those of names of personae, if I am assisting, I will run interwiki.py as STBotD, already approved as Wikipedia:Bots/Requests for approval/STBot 6.
Discussion
Manually assisted trial edits: 1 2 3 4 5 ST47Talk 20:58, 11 February 2007 (UTC)
You might want to mention this at Wikipedia:WIKI, CAT:NOCAT and Wikipedia talk:WSS to make sure they're on board with this, as prospective consumers. (Though in my experience from doing similar tasks in batch mode, the latter two seem pretty happy about/resigned to this in principle, on the basis of "rather sooner than later".) There are some resource implications to doing this "live" (extra reads, possibly premature taggings), but I don't think they're too significant. Why 500, btw? That's not quite fast enough to completely keep up with the new pages, AFAIK. Alai 01:26, 12 February 2007 (UTC)
- Well, there's always Wikipedia:PERF regarding performance issues, and it's 500 because that's how big the list is in newpages. I'm also running it from 1000 after the beginning, to avoid hitting the CAT:CSDs. ST47Talk 19:04, 12 February 2007 (UTC)
- I see no problems, but can the bot skip pages on which it is just adding/removing white space? (The "Skip articles when no change made" option should do that.) —METS501 (talk) 20:06, 12 February 2007 (UTC)
- I am currently skipping everything that is not tagged, which should help speed it up. ST47Talk 21:36, 12 February 2007 (UTC)
- OK, Approved for trial. Please provide a link to the relevant contributions and/or diffs when the trial is complete. Make a whole bunch of edits and report back here. —METS501 (talk) 02:44, 13 February 2007 (UTC)
- I trust the PERF comment wasn't intended as dismissively as it sounded. From Wikipedia:Bot policy (you'll note, policy): "The burden of proof is on the bot-maker to demonstrate that the bot: [...] is not a server hog". The newpages list is as large as you want it, up to 5000 articles, which should cover a day's "production" (at least until such time as it doesn't). If there's no particular reason to do batches of 500, then I think it would probably be preferable to do this in one batch, at "peak off-peak" time once per day, as much as anything to avoid "over-eager" tagging that would prove unnecessary if left slightly longer (through creator improvement or non-automated newpage patrol doing "better" cleanup). Also: what are the exact criteria for tagging with {{wikify}}, and with {{stub}}? How are you handling template-populated categories for purposes of tagging with {{uncat}}? Alai 05:08, 13 February 2007 (UTC)
- Hi. How many articles do we get a day, that survive speedy and all? My wikify criteria are currently something like 2-3 wikilinks, stub is very few sentences, and uncat is no categories - it doesn't count categories from templates, so a {{stub}} still has no categories. ST47Talk 11:15, 13 February 2007 (UTC)
- Trial edits made under User:STBot - still running though, until STBotT is approved. ST47Talk 17:41, 16 February 2007 (UTC)
- Shouldn't this be running on the unflagged account for which you're seeking approval, for the very reason of being on trial? Alai 19:39, 16 February 2007 (UTC)
- At the time, it wasn't approved for AWB. ST47Talk 20:27, 16 February 2007 (UTC)
- Ah, makes sense. Alai 21:27, 16 February 2007 (UTC)
- (ec) I'm told it's something like 2-3000 per day. How that interacts with speedying, redirecting, hand-tagging I don't know, and obviously it depends on just how soon they were to be bot-intercepted (hence my suggestion of doing it only once a day, after the "first wave" of cleanup has already hit (or is most likely to, at least)). As you've not given specific thresholds for applying "wikify" and "stub", I assume that means you're open to input on what those should be. (FYI, for the latter I've been using 500 bytes of wiki text, following input at the WSS talk page, and without complaint on that score.) Assuming you elicit some sort of consensus on what those should be, I see no problem with those aspects of the task, which as I say I suggest you do by discussing same with the "cleaners-up" of those categories.
- However, I have serious reservations about ignoring categories from templates. Firstly, a small number of genuine "article-space" categories are populated by this means, so you'd get outright false positives in such cases, and likewise for the various flavours of disambiguation pages. (I've had one or two comments about this from the operation of my bot, where the disambigs were not correctly tagged, come to that, so one can imagine they'd be that much more heated if they actually were.) Furthermore, if it ignores template-populated maintenance and stub categories too, it's basically "double-tagging" something into multiple cleanup resources at once. Earlier discussion at CAT:NOCAT seemed to broadly be of the view that double-tagging with uncat and a stub tag wasn't desirable, especially as category cleanup is seriously backlogged, and articles with a sorted stub type are certainly not the most pressing cases. Maintenance categories are less clear-cut, but beyond a certain point piling on multiple maintenance tags looks a little silly. Given the aforementioned backlog, this is hardly an urgent matter, so I'd urge that this aspect of the task be deferred until the 'bot is able to handle this better. At a minimum, it should skip articles with known category-populating templates, and ideally, all of them, based on categorisation of the "live" page as served. (Come to that, my bot already does handle this, and could if needed be adapted to run from newpages-generated lists, rather than as at present from db dumps and special:uncategorizedpages (which are more accurate, but don't keep up at all closely with the rate of "production" of uncategorised pages).) I do agree that something like this will be needed in the medium-term, though. Alai 19:35, 16 February 2007 (UTC)
- I may be alone in my opinion but I am to a small extent opposed to the bot wikifying new articles. I have no doubt that the bot can bold the first usage of the title in the article effectively, but doing so instead of adding the wikify template means that the article is less likely to be looked at by a person, which means an article that should be deleted might be improved mildly then left alone for a long period of time. Certainly a bot can't decide if an article is worth keeping, but perhaps it could add some criteria for what it does at an unwikified article, for example if the article is very short (one sentence long as one of your examples was) it would add the wikify template instead of wikifying. Vicarious 08:06, 22 February 2007 (UTC)
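The tagging criteria discussed above were never fixed to exact values in this thread. As a rough sketch only, assuming the figures that were floated (fewer than 3 wikilinks for {{wikify}}, under 500 bytes of wiki text for {{stub}}) and Alai's suggestion to skip {{uncat}} on pages carrying a category-populating template, the decision might look like the Python below. This is not the AWB configuration the bot ran with; the function name `suggest_tags`, the constants, the template list, and the regular expressions are all hypothetical.

```python
import re

# Hypothetical thresholds; the discussion did not settle on exact values.
MIN_WIKILINKS = 3      # ST47 mentioned "something like 2-3 wikilinks"
STUB_MAX_BYTES = 500   # Alai mentioned 500 bytes of wiki text

# Hypothetical examples of templates assumed to put a page in a category.
CATEGORY_TEMPLATES = {"stub", "disambig", "wikify", "uncat"}

CATEGORY_RE = re.compile(r"\[\[\s*Category\s*:", re.IGNORECASE)
# Internal links, excluding category and image/file links.
WIKILINK_RE = re.compile(
    r"\[\[(?!\s*(?:Category|Image|File)\s*:)[^\[\]]+\]\]", re.IGNORECASE
)
# Captures the name of each simple (non-nested) template call.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\|[^{}]*)?\}\}")


def suggest_tags(wikitext):
    """Return the maintenance tags this sketch would add to a page."""
    templates = {name.strip().lower() for name in TEMPLATE_RE.findall(wikitext)}
    has_stub_template = any(t.endswith("stub") for t in templates)
    tags = []

    # Too few internal links: suggest {{wikify}} unless already tagged.
    if len(WIKILINK_RE.findall(wikitext)) < MIN_WIKILINKS and "wikify" not in templates:
        tags.append("wikify")

    # Very short page: suggest {{stub}} unless a stub template is present.
    if len(wikitext.encode("utf-8")) < STUB_MAX_BYTES and not has_stub_template:
        tags.append("stub")

    # Suggest {{uncat}} only when nothing categorises the page, and avoid
    # double-tagging a page that has, or is about to get, a stub tag.
    has_category = bool(CATEGORY_RE.search(wikitext))
    has_cat_template = has_stub_template or bool(templates & CATEGORY_TEMPLATES)
    if not has_category and not has_cat_template and "stub" not in tags:
        tags.append("uncat")

    return tags
```

Checking the raw wikitext for known templates only approximates the "categorisation of the live page as served" that Alai describes as the ideal; a template missing from the hypothetical list would still produce a false {{uncat}}.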
Request Expired. This request can be reopened at any time. —METS501 (talk) 20:41, 5 March 2007 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.