Wikipedia:Bots/Requests for approval/APersonBot 5

The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was

Approved.

APersonBot 5

Operator: APerson

Time filed: 23:40, Monday, January 11, 2016 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available: https://github.com/APerson241/APersonBot/blob/master/defcon/defcon.py

Function overview: Updates the statistics at {{Vandalism information}} using the last ~~30 minutes~~hour of recent changes.

Links to relevant discussions (where appropriate): Template talk:Vandalism information#Defcon bot down for a month, perhaps?

Edit period(s): Continuous

Estimated number of pages affected: 1

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: The bot goes through the last ~~30 minutes~~hour of recent changes and counts the edits made with summaries that have "revert" or similar wording (for the exact regex, see here). Then, it divides by ~~30~~60 and updates {{Vandalism information}} with the resulting reverts-per-minute value.
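The calculation described in the function details can be sketched as follows. This is a minimal illustration, not the bot's actual code: the pattern `REVERT_PATTERN` is a hypothetical stand-in for the regex linked from the request, and the function name and its inputs are assumptions made for the example.

```python
import re

# Hypothetical pattern: a stand-in for the bot's real regex, which is
# linked from the request and not reproduced in this discussion.
REVERT_PATTERN = re.compile(
    r"\b(?:revert(?:ed|ing|s)?|rvv?|undid|undo)\b", re.IGNORECASE
)

# The request counts edits from the last hour of recent changes.
WINDOW_MINUTES = 60


def reverts_per_minute(summaries, window_minutes=WINDOW_MINUTES):
    """Count revert-like edit summaries and divide by the window length."""
    revert_count = sum(1 for s in summaries if REVERT_PATTERN.search(s))
    return revert_count / window_minutes
```

Given the edit summaries from one hour of recent changes, the function returns the reverts-per-minute (RPM) figure that would be written to the template.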

Discussion

Approved for trial (15 days). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Speedily approved for trial, report back in a week with status update. — xaosflux ^Talk 01:53, 12 January 2016 (UTC)[reply]
This is a sensible straightforward implementation in theory, but it could provide inaccurate results. {{Vandalism information}} is supposed to represent the current rate of vandalism, not just reverts. For instance reverts in the recent changes might include those from edit wars, normal WP:BRD activity, or even someone reverting themselves. User:DefconBot goes off of #cluebotng-spam ^connect, which would offer a more accurate representation, as ClueBot only targets blatantly disruptive edits.
A930913, who maintains DefconBot, has said on their talk page that the bot has been suffering from issues with the ClueBot IRC feed. I haven't noticed any issues with the feed, but either way DefconBot is still occasionally making edits. Unfortunately it seems DefconBot writes to a page that is transcluded as vandal info, so not sure how the bots could work together, unless we get APersonBot to write to the same page?
I still think checking all reverts is better than not having any vandalism information, and in my opinion the potential inaccuracy shouldn't get in the way of this bot being approved. I do however think we should explore ways to either work with DefconBot, or improve the logic with this task. APerson what are your thoughts? Maybe we could use the ClueBot IRC feed, and leave raw recent changes as a fallback? — MusikAnimal ^talk 06:16, 15 January 2016 (UTC)[reply]
MusikAnimal, I agree with you that ClueBot reverts make more sense; I'll switch it over in about 12 hours when I'll get back to my main computer. APerson (talk!) 15:39, 15 January 2016 (UTC)[reply]
Note also that you can go off its scoring of edits, and not just ClueBot's reverts. You could probably implement a lot of intricate logic, but my point is that ClueBot is very conservative with a 0.9 threshold, when most edits that score 0.8 and above are reverted. — MusikAnimal ^talk 16:36, 15 January 2016 (UTC)[reply]
The thing is, it's much harder to get the scores for each edit than the edits themselves; as far as I know, the scores are only accessible via the IRC feed, which is harder to read via bot than the contribution history. APerson (talk!) 18:29, 15 January 2016 (UTC)[reply]
Yes, parsing the IRC feed isn't quite as fun to implement =P Understandable if you don't want to go that route; I attempted it myself and only got so far [1]. If you want to just check reverts, I would still go by the recent changes table (or API) and extend the regex to include reverts made with Huggle, STiki and Igloo. This again is because ClueBot will not revert everything. Also, during peak hours patrollers might actually beat ClueBot to reverting vandalism, so you want to account for their activity. Be sure to exclude edit summaries that say "good-faith". This should give us a fairly comprehensive picture of how much vandalism is going on. The only issue is that the patrollers go to bed, whereas ClueBot does not, which is why going by the scores would be ideal. Just something to think about! :) Hope these recommendations are helpful. — MusikAnimal ^talk 06:23, 16 January 2016 (UTC)[reply]
As of this commit, the bot shouldn't count good-faith edits anymore. APerson (talk!) 04:24, 21 January 2016 (UTC)[reply]
Suggestions: Only have the bot update the page when the level changes to reduce editing. Change |info=RPM stats according to APersonBot to |info=%d RPM according to APersonBot so that the RPM can be seen in the template. — JJMC89 (T·C) 09:21, 17 January 2016 (UTC)[reply]
Done with this commit. APerson (talk!) 04:24, 21 January 2016 (UTC)[reply]
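JJMC89's two suggestions (edit only when the level changes, and show the RPM in |info=) could look roughly like the sketch below. The thresholds in `LEVEL_THRESHOLDS` are hypothetical placeholders, not the ranges at {{WikiDefcon/levels}}, and the function names and the returned dictionary are assumptions made for illustration.

```python
# Hypothetical thresholds as (minimum RPM, level), most severe first.
# The real ranges live at {{WikiDefcon/levels}} and are not given here.
LEVEL_THRESHOLDS = [(20.0, 1), (15.0, 2), (10.0, 3), (5.0, 4)]


def rpm_to_level(rpm):
    """Map an RPM value to a defcon level; 5 is the calmest level."""
    for minimum_rpm, level in LEVEL_THRESHOLDS:
        if rpm >= minimum_rpm:
            return level
    return 5


def plan_update(rpm, previous_level):
    """Return new template parameters, or None if the level is unchanged."""
    level = rpm_to_level(rpm)
    if level == previous_level:
        return None  # skip the edit entirely to reduce editing
    # %d truncates the RPM to a whole number, as in the suggested format.
    return {"level": level, "info": "%d RPM according to APersonBot" % rpm}
```

A caller would save the page only when `plan_update` returns a value, so an hour in which the RPM moves within the same band produces no edit.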
DefconBot does not use the IRC feed; ClueBot NG actually broadcasts it additionally through labs via redis. With the recent labs issues, however, the feed has broken a number of times, and Damianz, the maintainer of ClueBot, has needed to be summoned to give it a kick. I haven't looked too much into what you've done, but I did a fair amount of research into getting the levels right, which I would be happy to share. Currently I lack time to do anything more than give my bots a kick when they stop working, so if there is fresh enthusiasm on this front, that is great.
My suggestions would be to either run APersonBot whenever DefconBot fails, or to wrap up DefconBot and use some of the techniques and research gathered there for APersonBot. If anyone wants access to DefconBot, tell me your labs username and I'll add you as a maintainer.
The most reliable method of contact with me will probably be to email me at a930913tools.wmflabs.org 930913 {{ping}} 20:12, 21 January 2016 (UTC)[reply]

Before I wrote the defcon task for APersonBot, I read through DefconBot's source code and tried to get a Redis solution of my own working, as well as an IRC solution; neither was as robust as the one I have with the recent changes API, so I went with the latter. Looking at ClueBot NG's current source code, it doesn't appear (to my non-ClueBot-developer eyes) that there's much support for the Redis feed. Pretty soon, I plan to use quarry and numpy to update the ranges at {{WikiDefcon/levels}} so APersonBot can get closer to DefconBot's percentage values. Thanks for the advice! APerson (talk!) 03:33, 22 January 2016 (UTC)[reply]
Trial complete. (pinging Xaosflux.) APerson (talk!) 16:18, 28 January 2016 (UTC)[reply]
Hm... I'm just a bit unclear where we stand with this task now, given the above comments by A930913. — Earwig ^talk 04:11, 1 February 2016 (UTC)[reply]
Well, as A930913 stated, the Redis feed, which is an integral part of DefconBot's functionality, is indefinitely down. The second suggestion he made - that we "wrap up DefconBot" and use APersonBot instead - seems a lot more viable to me. APerson (talk!) 04:18, 1 February 2016 (UTC)[reply]
I think what APersonBot is doing now is better than nothing, but we should really aim to measure actual vandalism rather than just the undoing of edits. Edit wars are common, and this is going to detect those, along with other innocent undos. If we don't end up using the ClueBot feed, either through redis or IRC, I recommend simply changing the regex to check only for Huggle, STiki, Twinkle (generic summary as produced by VANDAL revert), and Igloo edits in recent changes. It's with these tools that the vast majority of counter-vandalism is performed, and checking for them will offer a more accurate picture of what's going on. I might be missing what quarry and numpy will do, that perhaps would help improve accuracy? — MusikAnimal ^talk 06:41, 3 February 2016 (UTC)[reply]
Sure, I can implement that, although I don't know how big its effect will be: looking at the live reverts stream, a large majority of the reverts are countervandalism-related. Furthermore, even if edit wars and undos did make up a significant minority of the edits, it would still take anywhere between 12 and 24 edit-war-related reverts per hour to bump up the defcon level. I'll start writing regexes. APerson (talk!) 15:40, 3 February 2016 (UTC)[reply]
After watching the live reverts stream for a while and seeing a lot of IPs reverting each other, I've concluded that we'll need to rescale what RPM levels correspond to what defcon levels after the change is made. APerson (talk!) 16:11, 3 February 2016 (UTC)[reply]
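The tool-based filter discussed in this thread (count only reverts made with Huggle, STiki, Twinkle, or Igloo, and skip anything marked good-faith) might be sketched as below. The marker strings in `TOOL_MARKERS` are assumptions about how those tools tag their edit summaries and would need checking against real recent changes; the function name is likewise hypothetical.

```python
import re

# Assumed tool markers: these strings are illustrative guesses at how the
# tools tag their summaries, not a verified list.
TOOL_MARKERS = re.compile(r"\((?:HG|TW)\)|STiki|GLOO|Huggle|Igloo", re.IGNORECASE)

# Summaries that describe the reverted edit as good-faith are excluded.
GOOD_FAITH = re.compile(r"good[ -]faith", re.IGNORECASE)


def is_countervandalism_revert(summary):
    """True if the summary carries a tool marker and is not good-faith."""
    if GOOD_FAITH.search(summary):
        return False
    return TOOL_MARKERS.search(summary) is not None
```

Because a plain "Undid revision" summary carries no tool marker, edit wars and self-reverts would drop out of the count under this approach, which is why the RPM-to-level scale would need rescaling afterwards.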
Do any of the issues raised in Wikipedia:Bots/Requests for approval/VoxelBot need to be double checked with this bot? --slakr^\ talk / 03:42, 6 February 2016 (UTC)[reply]
Thanks for bringing that up; I haven't read through that BRFA yet. The edit frequency and sample size are not an issue with this bot; the bot edits every hour and uses every edit from the past hour. However, the false positives are more interesting. Looking at the way VoxelBot classified edits, there are quite a few ideas I could incorporate into the way APersonBot detects edits. Specifically, VoxelBot has more keywords that I could check for; also, the false positives coming from keywords showing up in section headers shouldn't be counted. MusikAnimal, do you think I could get away with this keyword-based approach instead of classifying edits based on the anti-vandalism tools used? APerson (talk!) 16:25, 6 February 2016 (UTC)[reply]
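One way to avoid the section-header false positives mentioned here is to strip the `/* Section name */` prefix that MediaWiki adds to summaries of section edits before looking for keywords. The sketch below assumes a hypothetical keyword list in `KEYWORDS`; it is not VoxelBot's or APersonBot's actual list.

```python
import re

# MediaWiki prefixes section edits with "/* Section name */".
SECTION_MARKER = re.compile(r"/\*.*?\*/")

# Hypothetical keyword list, for illustration only.
KEYWORDS = re.compile(
    r"\b(?:revert(?:ed|ing|s)?|vandal(?:ism)?|rvv)\b", re.IGNORECASE
)


def is_revert_summary(summary):
    """Match keywords only in the part of the summary the editor wrote."""
    without_sections = SECTION_MARKER.sub(" ", summary)
    return KEYWORDS.search(without_sections) is not None
```

With this in place, an edit to a section titled "Vandalism in the 1990s" is not counted unless the editor's own comment also contains a keyword.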
Just a status update, if anyone's interested: I was notified of a bug involving the bot being logged off forcibly by some nefarious entity on Labs. From now on, the bot will detect when it's in danger of being logged off and make a pointless edit to keep its session alive. (At least, that's how it'll work in theory.) I've also just finished implementing VoxelBot's more advanced way of detecting reverts, and I'll have the numbers to allow these to be converted into defcon levels pretty soon. APerson (talk!) 03:30, 11 February 2016 (UTC)[reply]
Would you like to run a week-long trial with the new logic? — Earwig ^talk 03:42, 11 February 2016 (UTC)[reply]
If only because I've just changed the way the bot determines RPM, that sounds like a good idea. APerson (talk!) 03:45, 11 February 2016 (UTC)[reply]
Yep, just to be safe; hopefully not too much extra pressure on anyone. Approved for trial (7 days). Please provide a link to the relevant contributions and/or diffs when the trial is complete. — Earwig ^talk 03:46, 11 February 2016 (UTC)[reply]
Just curious about how the bot is getting logged out. Is this a known issue? Instead of making pointless edits, could it not be programmed to login as needed? — MusikAnimal ^talk 00:41, 14 February 2016 (UTC)[reply]
MusikAnimal, I already do. It's just that sometimes the login doesn't work because the session has been invalidated due to lack of activity. (I still don't understand how this is happening, so "lack of activity" is my guess. Tgr has been extremely helpful on IRC with this; maybe he could clarify a bit.) APerson (talk!) 01:51, 14 February 2016 (UTC)[reply]
Bot logins should last 30 days, just like normal user logins with the "remember me" checkbox enabled. There have been multiple confounders recently: SessionManager has been deployed, which meant a complete rewrite of all session handling code, and possibly new bugs (although all the ones we are aware of have been fixed); the servers on which the sessions are stored underwent an OS upgrade, which caused session losses every time a new server was put into circulation (although technically a session loss should not log you out); and the token handling code was rewritten (although again I would expect that to cause token errors but no logouts).

If you are still getting logged out occasionally, please report the details, but here are some things you could do to make the bot more robust:
Use assertions (that will ensure your bot does not perform actions anonymously).

Use OAuth.

That said, I wonder if you are doing something unusual? Pywikibot should already use assertions and log you in transparently when needed. --Tgr (talk) 07:25, 16 February 2016 (UTC)[reply]

Re: session activity, sessions expire in one hour if you don't do anything (doesn't have to be an edit, any API request should suffice, such as .get() for some arbitrary page). The login should survive the session though; if it doesn't, that's a bug. --Tgr (talk) 07:28, 16 February 2016 (UTC)[reply]
Fixed - removed pointless edits, as they don't seem to be necessary. APerson (talk!) 16:08, 17 February 2016 (UTC)[reply]
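Tgr's advice about assertions can be illustrated with a small retry wrapper. The MediaWiki API accepts an `assert=user` parameter and reports the error code `assertuserfailed` when the request is not logged in; everything else here (the function name, and the `send_request` and `login` callables supplied by the caller) is a hypothetical interface for the sketch, not Pywikibot's or APersonBot's code.

```python
def request_with_relogin(send_request, login, params, max_attempts=2):
    """Send an API request with assert=user, logging in again if needed.

    send_request(params) is assumed to return the decoded JSON response
    as a dict, and login() to re-authenticate the session.
    """
    asserted = dict(params)
    asserted["assert"] = "user"  # never act anonymously
    for attempt in range(max_attempts):
        if attempt > 0:
            login()  # the previous attempt found us logged out
        response = send_request(asserted)
        error_code = response.get("error", {}).get("code")
        if error_code != "assertuserfailed":
            return response
    raise RuntimeError("still logged out after %d attempts" % max_attempts)
```

This keeps the session handling reactive: the bot logs in again only when the API says it is logged out, rather than making edits to keep the session alive.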
Approved for extended trial (7 days). Please provide a link to the relevant contributions and/or diffs when the trial is complete. This is to iron out any other possible issues. --slakr^\ talk / 04:38, 19 February 2016 (UTC)[reply]
Trial complete. Having inspected the logs for the past few days, the bot's looking pretty good. The new logic seems to have lowered the severity of the average defcon, which is probably a consequence of the bot catching fewer false-positive "vandalism reversion" edits. APerson (talk!) 02:52, 27 February 2016 (UTC)[reply]
Seems like the issues that were raised have been resolved. You might also consider de-bouncing the updates further by avoiding an edit if it doesn't cause an actual defcon change (e.g., a change from 3 to 4 to 3 again doesn't change the defcon, and my guess is people aren't paying as much attention to the precise RPM as far as many of the derivative templates are concerned). That said, it's entirely just a suggestion and a judgement call on your part; you could just leave it up to the template maintainers to deal with what constitutes a significant change. :P

Other than that,

Approved.

Cheers, --slakr^\ talk / 03:42, 27 February 2016 (UTC)[reply]

The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.