Wikipedia:Bots/Requests for approval/Ilmari Karonen's adminbot
- The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.
Operator: Ilmari Karonen (talk)
Automatic or Manually Assisted: Automatic
Programming Language(s): Perl
Function Summary: Undelete 415 incorrectly deleted image talk pages to repopulate Category:Talk pages of deleted replaceable fair use images.
Edit period(s) (e.g. Continuous, daily, one time run): One time run
Already has a bot flag (Y/N): No
Function Details: See also User:Ilmari Karonen's adminbot/Task 1 and User:Ilmari Karonen/Rtd.
Now that we finally have what appears to be a reasonable and generally accepted policy on adminbot approval, I thought I'd like to try it out with a simple case.
For some background, since late 2006 the template used to mark disputed replaceable non-free images, {{di-replaceable fair use disputed}}, has recommended that, should the image be deleted, any debate regarding the deletion be archived on the image's talk page using the templates {{rtd}} and {{rb}}. Such talk pages, even though they are "orphaned" due to the deletion of the image itself, should be preserved as a record of the debate.
However, it came to my attention a while ago that a number of these archived deletion discussions (in fact, pretty much all of them, at the time) had been deleted by various administrators, ostensibly under speedy deletion criterion G8 ("orphaned talk page"), even though the criterion contains an explicit exemption for such pages. The majority of these deletions were carried out by administrator MZMcBride's G8 deletion bot, which at the time did not recognize the {{rtd}} tag.
I recently compiled a list of all deleted image talk pages tagged with {{rtd}}. Of the 480 pages found, 65 had been deleted more than once, corresponded to an existing (local or Commons) image, or were deleted by someone other than MZMcBride's bot. I have already manually reviewed and (in all but one case) undeleted these pages. However, I do not really feel like carrying out 415 more manual undeletions for the remaining pages, so I'd like to request approval for an adminbot to undelete these bot-deleted pages. I have asked MZMcBride whether they'd have anything against these pages being undeleted, and they have said they have no problem with it.
What will the bot do?
- The bot will undelete 415 image talk pages (see list here or here) which were originally deleted by MZMcBride's deletion bot and were tagged with Template:Rtd at the time of deletion. In cases where the template has been substed, the bot will also edit the page to un-subst it so that the pages will be categorized properly.
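For illustration only, here is a minimal Perl sketch of that per-page loop, not the bot's actual source (which is linked below under "Is the source code available?"). The api_post helper, the user-agent name, the login step, the token handling and the un-subst text transformation are simplified, hypothetical placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use JSON;

# Hypothetical helper: POST to the MediaWiki action API and decode the JSON
# reply (raw UTF-8 bytes). Assumes $ua already carries a logged-in admin
# session in its cookie jar.
sub api_post {
    my ($ua, %params) = @_;
    my $resp = $ua->post('https://en.wikipedia.org/w/api.php',
                         { %params, format => 'json' });
    die 'API request failed: ' . $resp->status_line unless $resp->is_success;
    return decode_json($resp->content);
}

my $ua = LWP::UserAgent->new(cookie_jar => {}, agent => 'RtdUndeleteBot/0.1');
# ... log in and fetch an edit/undelete token here (see the token note in the
# technical assessment section); both steps are omitted from this sketch ...
my $token = '(placeholder: fetch a real token after logging in)';

my @pages = ();   # the 415 image talk page titles from the precompiled list

for my $title (@pages) {
    # 1. Restore the deleted revisions of the talk page.
    api_post($ua,
        action => 'undelete',
        title  => $title,
        reason => 'image talk pages marked with Template:Rtd are exempt from CSD G8',
        token  => $token,
    );

    # 2. If {{rtd}} had been substituted, edit the restored page so it uses
    #    the live template again and lands back in the tracking category.
    #    The actual text transformation depends on the substed wikitext, so
    #    it is only hinted at here.
    # api_post($ua, action => 'edit', title => $title,
    #          text => $unsubsted_text, summary => 'un-subst {{rtd}}',
    #          token => $token);

    sleep 5;   # pause a few seconds between actions, as described in the next answer
}
```

In the real run each request also carries the maxlag parameter, as discussed in the technical assessment section below.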
How fast/long will it run?
- I have programmed the bot to wait five seconds before each undeletion or edit, at which rate I expect this task to take roughly an hour. The bot also uses the "maxlag" parameter to make it slow down if the servers are under high load.
Has it been tested?
- I have tested the undeletion code under my own account on test pages within my own user space. I have also tested the deleted revision retrieval and un-substing features on the actual list of target pages without actually undeleting anything.
Is the source code available?
- Yes, here.
Will you use the bot account for other things?
- I will not use the Ilmari Karonen's adminbot account for anything other than the specific one-time task described above without filing a separate approval request. As I am not planning to carry out further tasks with this adminbot in the immediate future, I'd like to ask that this bot be desysopped and deflagged once this run has been completed.
Why bother with this BRFA, why not just do it?
- Because I can. I've said before that I'd like to seek official approval for running an adminbot if I had any use for one. Well, now I do. I'm certainly hoping and expecting that, given the highly specific and limited nature of the task for which I'm seeking approval, this task will be approved quickly and with minimum hassle.
Per the current instructions at Wikipedia:ADMINBOTS, I have organized the following discussion into two sections: Community approval and Technical assessment. The former is intended for general discussion about the appropriateness of the proposed task, while the latter is for technical review of the bot's features and implementation by the Bot Approvals Group and other users interested in such issues.
Community approval
Since this is a one-time-only thing, it seems better to me to let Ilmari Karonen run the bot off his regular admin account, rather than creating a new admin account for a single task. Separate accounts are important for long-running tasks, but less so for a one-hour job that, arguably, could be run as a semi-automated task with no BAG approval. — Carl (CBM · talk) 02:26, 1 October 2008 (UTC)[reply]
- That would be okay with me too, although I did already register the account. —Ilmari Karonen (talk) 02:31, 1 October 2008 (UTC)[reply]
Something that is going to take an hour and be done once probably doesn't justify the overhead of getting community input, an approval, flagging, admin rights, and then catching a steward to deadmin it. The ends don't justify the means, so just do the task in a monitored way please. Just look at and approve each edit. - Taxman Talk 01:23, 2 October 2008 (UTC)[reply]
- Why would a steward have to de-sysop it? --MZMcBride (talk) 02:20, 2 October 2008 (UTC)[reply]
- Since the account is only being proposed for one task, I agree it would be reasonable to desysop it after the task is done. That's why I suggested running the task under the existing admin account. — Carl (CBM · talk) 02:50, 2 October 2008 (UTC)[reply]
- And even if the community didn't feel it needs to be deadminned just after it was done (which I don't think would be the case), a task this short still doesn't justify all this effort just because we can. If the edits need doing, just get them done in a way that follows the regular guidelines. We should always be thinking of the minimum-overhead way to improve articles, and this represents nearly the opposite. I love the initiative in putting forth a bot to solve tasks, but there are lots of others out there that need doing and coding. - Taxman Talk 12:18, 2 October 2008 (UTC)[reply]
- As I noted to Carl above, I'd be happy to run this under my own account if nobody objects. Contrary to what you seem to be assuming, I'm not trying to create bureaucracy for its own sake: having filed this approval request and brought it to the attention of the community per the new adminbot policy, I'm quite willing to just wait a few more days and then, if no-one has objected, take it as a sign that the task and implementation I've proposed enjoy consensus and that I should just go ahead with it. Perhaps I have been excessively cautious; I did more or less expect at least one "OMG SkyNet" objection, but if even those who'd normally object to adminbots on such grounds feel that this task is safe and limited enough, then I'm certainly happy.
- As for "looking and approving each edit", I'm not sure how meaningfully I could do that for the undeletions. I guess I could make the code require a keypress before each undeletion, but I'm not sure how much that would gain (especially given that this is a limited run anyway, and so amenable to both advance and after-the-fact review), and if no-one minds, I'd rather try to keep my risk of repetitive strain injury to a minimum. In fact, if I wanted to "just get it done if it needs doing", I'd much rather just run this without approval on my own account than add a mostly ceremonial "confirmation" step just to fit inside the letter of the policy. But since this task isn't in any way urgent, I don't see how asking for some feedback first can hurt. —Ilmari Karonen (talk) 16:59, 2 October 2008 (UTC)[reply]
- But that's what we're saying. People use AWB for tasks of this magnitude all the time and approve each edit manually. That's the type of thing I mean by "get it done", not just run an unapproved adminbot. And it can be done in an entirely RSI-friendly way. I'm no RSI expert, but various approval keys could be used, or various methods of hitting the approval key so that the motions are not 100% repetitive, etc. I'm sure other solutions exist for that if it is an issue. Oh, and I'm not saying your intention was to create unnecessary overhead, just that that is the result and could have been avoided. - Taxman Talk 20:25, 2 October 2008 (UTC)[reply]
- I agree with Taxman. The deletions are obviously wrong and must be undone - it is impossible to expect any reasonable person to believe otherwise. A one-off task like this that will take less than 10 minutes to complete doesn't really warrant a discussion of this nature. I trust that you won't blow up the wiki while trying to get this done, so just go for it! east718 // talk // email // 05:54, 4 October 2008 (UTC)[reply]
Just to be clear, I'd like to explicitly ask if anyone has anything against me running this task on my own account, as suggested by Carl, Taxman and east718 above? If no-one objects before, oh, say, next Wednesday, I'm going to go ahead and just do that, with the assumption that there's a general consensus in favor of it. —Ilmari Karonen (talk) 20:21, 4 October 2008 (UTC)[reply]
- To remain perfectly clear, as long as you confirm the edits manually, there is no cause for anyone to object. You don't even need the bot policy to do that, and you certainly don't need to wait for it. But as I said above, I specifically do not support running this unattended, which would amount to running an unapproved adminbot. Now that we have a workable adminbot policy, that wouldn't be appropriate. The reason I brought this all up is that I think we should all keep in mind that the extra overhead required to examine and approve an adminbot should only be undertaken when the return is sufficiently high, which would mean the task is ongoing and/or of high volume. I think that is a reasonable standard going forward, and having such a standard is why I think all this fuss was warranted. - Taxman Talk 14:43, 5 October 2008 (UTC)[reply]
- It wouldn't be unapproved if it was given explicit approval here, now would it? True, the new adminbot policy does imply, even if not quite saying so outright, that approved adminbots should generally have their own accounts, but surely the community (which ought to be well represented here, given that this has been linked from Wikipedia:VPP, Wikipedia:AN and Wikipedia talk:RFA) is empowered to authorize a common-sense exception to the rules it itself has written. If you wanted to wikilawyer it, I suppose you could always call it a 415-page "trial period". :)
- Besides, the other option — getting a 'crat to flag and sysop the bot account I already registered for this and, optionally, a steward to desysop it afterwards — wouldn't be that cumbersome either: all it takes from either of them, once presented with a clear record of consensus, is ten seconds with Special:UserRights (none of that laborious consensus gauging as with a real RfA), and, as the task is not urgent, I'm completely happy to let them take as much time with it as they feel like.
- As for your suggestion that I do this as a semi-automated task, that is what I'd call creating pointless work just to satisfy the letter of a rule (while completely ignoring its spirit). Put simply, I've already reviewed all 415 pages the bot will be undeleting to such an extent that I am quite certain that each and every one was inappropriately deleted and ought to be restored, and that the code I have written will properly restore each of them. I brought this BRFA here precisely because, now that we have a workable policy for it, I was hoping others could check the task and the code (as indeed Carl and others have done) and agree that it should be given the go-ahead.
- I suppose it might be argued that the task, as I'm proposing it, already is semi-automated — just with the exception that I'm reviewing the edits in bunches of 415, not one at a time. But if not, I'm not going to make it demand that I spend an hour pressing enter 415 times at 5–10 second intervals: it might not actually hurt my carpal tunnel, but it would make me feel extremely silly and finally convinced that, whatever the claims to the contrary, Wikipedia has indeed become a bureaucracy and a slave to the mindless following of rules just because they are rules. —Ilmari Karonen (talk) 16:56, 5 October 2008 (UTC)[reply]
- I guess you have a point that, since much of the extra work and overhead has already been expended, this doesn't need to be treated as though it hasn't. But for your other point, it's not flipping the bits that is the cumbersome part; you're right that that part takes ten seconds. What you're missing is that when I flip the bit on something, I review the entire thing to make sure what I'm doing is right, because it's my reputation and bit on the line, and that takes time. I'm sure other bcrats and stewards are similar. And before that can even happen there are the other extensive reviews needed, the community input, the review of that input, and so on, and that is far too much to justify for this type of task in the future.
- As it is now, I still don't think doing it semi-automated is creating pointless work to satisfy the letter of the rule. In fact, it's all about respecting the spirit of why we have rules about bots, and especially admin bots: they should either have full review or not be run. So far only you and Carl have specifically stated you've reviewed the code and are OK with it. For an admin bot I believe the community has pretty high standards of review, and it would still take significant work going forward to satisfy that standard. While you have no problem tying up those resources, I don't agree that is an appropriate use of them to avoid what is a highly routine task size for semi-automated wiki editors. Again, though, since part of the extra overhead has already been expended, it's up to what the consensus thinks. So far Carl is fine with running it on your account, I'd like further code review before I would personally be comfortable with that, and I'm not sure exactly what east718 wants. He said he agrees with me and then said go for it, so some clarification would be needed there, along with some additional input. - Taxman Talk 19:54, 5 October 2008 (UTC)[reply]
← It does occur to me that no actual BAG members have yet seen fit to comment on this bot, something that would generally be seen as a prerequisite for approval (them being the Bot Approvals Group and all that). Would a code review by one or more BAG members help address your concerns? If so, I could slap a {{BAGAssistanceNeeded}} tag on this page. (Actually, let me just do that anyway...)
Alternatively, what if I were to run this code in batches of, say, 20–50 pages, pausing between each batch to check that nothing has gone wrong? Would that be a sufficient degree of review to count as semi-automated for you? If so, it would be a fairly simple change, and would not significantly deviate from the workflow I'd been planning originally (which was to let the bot run in the background and keep an eye on its contribs as it runs). Doing it in batches would at least let me get a cup of coffee, work on my thesis or do something else useful in between. —Ilmari Karonen (talk) 20:36, 5 October 2008 (UTC)[reply]
Technical assessment
As a very experienced Perl bot operator, I looked through the code in some detail. I don't see any issues. The page content is properly utf-8 encoded. Maxlag is not used, but a 5 second delay is perfectly adequate. Error handling is minimal, but this is adequate for a short-running, one-time-only script. I wasn't aware undelete tokens also function as edit tokens, but nothing surprises me anymore. — Carl (CBM · talk) 02:25, 1 October 2008 (UTC)[reply]
- They do, as documented at mw:API:Edit - Undelete and mw:Manual:Edit token. Yes, I was surprised too. —Ilmari Karonen (talk) 02:29, 1 October 2008 (UTC)[reply]
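For the curious, a minimal sketch of that token reuse, building on the hypothetical api_post helper and $ua session from the sketch in the function details above. It assumes the current API's meta=tokens endpoint; the 2008-era interface fetched tokens differently, but the point is the same: one token serves both actions.

```perl
# Fetch one CSRF token and reuse it for both the undelete and the edit.
# (In the current API, meta=tokens returns a single "csrf" token accepted
# by action=undelete and action=edit alike, which is the behaviour that
# surprised people in this discussion.)
my $data  = api_post($ua, action => 'query', meta => 'tokens', type => 'csrf');
my $token = $data->{query}{tokens}{csrftoken};

# The same $token can then be sent with the undelete ...
# api_post($ua, action => 'undelete', title => $title, token => $token, reason => '...');
# ... and with the follow-up edit.
# api_post($ua, action => 'edit', title => $title, text => $new_text, token => $token);
```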
The unsubsting part of the task may run into bugzilla:15647; if so, I know of no way to avoid it (besides not using the API to make the edit). Anomie⚔ 16:58, 1 October 2008 (UTC)[reply]
- My userspace testing showed no such problems with that feature. —Ilmari Karonen (talk) 17:40, 1 October 2008 (UTC)[reply]
- Ok. I saw you tested the undeleting in your userspace and the whole thing without performing the undelete or unsubst, but I wasn't sure if you did a test where it actually performed the undelete-and-unsubst in your userspace. Anomie⚔ 19:35, 1 October 2008 (UTC)[reply]
- Ah, yes, I probably should've left the test pages undeleted. :) Here you go: diff 1, diff 2, diff 3. —Ilmari Karonen (talk) 20:07, 1 October 2008 (UTC)[reply]
- FYI, I've figured out why your code isn't triggering the bug. The bug only occurs when there is at least one deleted revision. After you undelete, there are (of course) no deleted revisions left. Anomie⚔ 21:51, 1 October 2008 (UTC)[reply]
Since this is a non-urgent task, it should probably delay 10 seconds between actions instead of 5 (per Wikipedia:BOT). There's no particular reason not to follow the policy. Anomie⚔ 16:58, 1 October 2008 (UTC)[reply]
- This is also fine by me. —Ilmari Karonen (talk) 17:41, 1 October 2008 (UTC)[reply]
- I've updated the code to keep the default delay at 5 seconds but to also use the maxlag parameter (with exponential fallback starting at 5 seconds) on all requests. I believe this should satisfy the policy. With the maxlag support now in, I might even consider reducing the default delay to 2 or 3 seconds if no-one has anything against it. (Ps. I moved this thread here from the community approval section, since it feels rather technical to me.) —Ilmari Karonen (talk) 18:21, 1 October 2008 (UTC)[reply]
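A minimal sketch of what such maxlag handling could look like, again using the hypothetical api_post helper from the earlier sketch and assuming the API reports lag with its usual error structure (error code "maxlag"); the bot's actual implementation is in the linked source.

```perl
# Hypothetical wrapper: send maxlag=5 with every request and, when the API
# reports a "maxlag" error, back off exponentially (5 s, 10 s, 20 s, ...)
# before retrying the same request.
sub api_post_maxlag {
    my ($ua, %params) = @_;
    my $delay = 5;
    while (1) {
        my $data = api_post($ua, %params, maxlag => 5);
        return $data
            unless ref($data) eq 'HASH'
                && $data->{error}
                && $data->{error}{code} eq 'maxlag';
        warn "Servers lagged ($data->{error}{info}), sleeping $delay seconds\n";
        sleep $delay;
        $delay *= 2;   # exponential fallback starting at 5 seconds
    }
}
```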
(Somewhat tangential further discussion about maxlag, collapsed in the original discussion.)
{{BAGAssistanceNeeded}} While I very much appreciate Carl's review and feedback above, could someone actually belonging to the BAG still please have a look at the code and see if it looks OK? —Ilmari Karonen (talk) 20:46, 5 October 2008 (UTC)[reply]
- I agree with Carl's assessment. Your handling of maxlag appears to be correct. On the chance that your backoff does get too high, you can always reset the bot, so it isn't a big deal. The database lag value need not be used as a backoff delay, as it is very possible for the lag to resolve itself in much less than that time and because redoing the same query early causes hardly any server stress. --uǝʌǝsʎʇɹoɟʇs(st47) 01:18, 8 October 2008 (UTC)[reply]
- Approved for trial. Please provide a link to the relevant contributions and/or diffs when the trial is complete. Go ahead and start with 25 actions. --uǝʌǝsʎʇɹoɟʇs(st47) 03:10, 13 October 2008 (UTC)[reply]
- Okay. Running 25 undeletions under my own account, with edit summary "image talk pages marked with Template:Rtd are exempt from CSD G8 (bot undeletion; trial run)" using this code. —Ilmari Karonen (talk) 13:08, 13 October 2008 (UTC)[reply]
Trial complete. The undeletions are logged here and the un-substing edits here. —Ilmari Karonen (talk) 13:24, 13 October 2008 (UTC)[reply]
Approved. Task should be run on your account, this account will not be flagged. BJTalk 05:26, 18 October 2008 (UTC)[reply]
- The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.