Wikipedia:Bots/Requests for approval/RonBot
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: Ronhjones (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 22:36, Thursday, March 2, 2017 (UTC)
Automatic, Supervised, or Manual: automatic
Programming language(s): Python
Source code available: User:RonBot/Source1. Code made by User:DatGuy who is unable to request as this bot needs an admin flag.
Function overview: The bot will revision delete the unused file versions and remove the "Orphaned non-free revisions" template from the files in Category:Non-free files with orphaned versions more than 7 days old.
Links to relevant discussions (where appropriate): Wikipedia_talk:Files_for_discussion#Admin_bot_to_delete_old_revisions_of_non-free_files
Edit period(s): Daily, maybe continuous. There are over 200,000 Non-free files > WP:Image resolution at present. The manual processing of these is a major problem to getting all the non-free files sorted.
Estimated number of pages affected: xx/day
Exclusion compliant (Yes/No): Yes (originally answered No)
Adminbot (Yes/No): Yes (needs admin flag set)
Already has a bot flag (Yes/No): No
Function details: Files that have been tagged with the Orphaned non-free revisions template for over 7 days populate Category:Non-free files with orphaned versions more than 7 days old, often as the result of DatBot 6. The bot will revision delete the unused non-free file versions, leaving just the current version showing. Finally, the template is removed from the description. The bot needs the admin flag because it uses revision deletion.
Community notifications for a new adminbot
- Wikipedia:Bots/Requests for approval/Adminbots
- Wikipedia:Village pump (proposals)
- Wikipedia:Administrators' noticeboard
Discussion
- @DeltaQuad: I believe you had been working on this, just FYI. I encourage quick trial and approval of this bot, as it's very useful and will save a massive admin time sink. ~ Rob13Talk 23:25, 2 March 2017 (UTC)[reply]
- I had been. I'll cancel my project. -- Amanda (aka DQ) 23:28, 2 March 2017 (UTC)[reply]
It does not appear you are checking that the page has actually aged after being tagged? (e.g. If I slap Category:Non-free_files_with_orphaned_versions_more_than_7_days_old on file you will just start deleting old versions?) — xaosflux Talk 04:18, 3 March 2017 (UTC)[reply]
- That's my reading of the source, too. There's other easy traps to fall into here: fairly often an image gets overwritten again after the first tagging, so it's not correct to delete all non-current versions yet; and more than once I've caught someone tagging a file page with "{{Orphaned non-free revisions|date=something more than a week ago}}" to gain advantage in an edit war. So neither the category nor the template params are safe to rely on: you've got to look at the timestamps in both the edit and upload history. —Cryptic 04:52, 3 March 2017 (UTC)[reply]
- Question: Can the bot handle multiple rev'dels of a single image or will it have to loop back around sometimes? Occasionally there is a misunderstanding and there ends up being multiple file revisions that have to go. --Majora (talk) 04:23, 3 March 2017 (UTC)[reply]
- For each file, the bot gets a list of versions to hide and loops through this list. This is no different from doing it manually: the current wiki system cannot revdel multiple images at once, so it has to be one at a time. Ronhjones (Talk) 15:49, 4 March 2017 (UTC)[reply]
- Some concern about the potential for abuse. The File: namespace is poorly watched, and it seems like it would be easy to overwrite a file, add a fair-use tag if there wasn't one there already, and have it go unnoticed long enough for the old files to be deleted. —Cryptic 04:52, 3 March 2017 (UTC)[reply]
- Echoing the concern about possible abuse. I do not think that complete processing of all valid files should be the goal of this bot, it should only process the easy cases where the waste of human time is high and the possibility of abuse or controversy is minimal. Therefore, I suggest that the bot only perform revision deletion if it meets all these conditions:
- The orfurrev tag exists (i.e. page is not placed in category manually as Xaosflux suggests above)
- The tag was placed 7 days ago, as verified by the revision timestamp. Disregard the timestamp in the tag.
- At the time of uploading of each new visible version not performed by a bot, a non-free copyright tag existed on the file
- Of the visible versions of the file, there is at most one unique non-bot non-admin uploader (version wars sometimes occur)
- Only the versions uploaded before the tag was placed are deleted (some users revert the reduction bot)
- If the file does not meet all these criteria, a flag should be added to the orfurrev tag that places the file into a separate category for administrator review. As the vast majority of cases are likely simple reductions, they should meet the above criteria. — Train2104 (t • c) 06:13, 3 March 2017 (UTC)[reply]
- If an admin appears as an uploader, check that all subsequent uploads were made by admins or bots. Your check would allow a non-admin to overwrite a file originally uploaded by an admin and thus vandalise the file.
- Instead of searching for a non-free copyright tag, you could alternatively search for Category:All non-free media, which should appear on all non-free files (transcluded by a template). --Stefan2 (talk) 00:16, 4 March 2017 (UTC)[reply]
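A rough sketch of how some of the conditions suggested above could be expressed, offered only as an illustration and not as the bot's actual code. The `FileVersion` structure, the `check_file` name and the three status strings are all hypothetical. It covers the "tag exists", "tag is at least 7 days old by revision timestamp", "at most one non-bot non-admin uploader" and "only versions uploaded before the tag" conditions; the check that a non-free copyright tag existed at each upload, and the stricter admin-uploader rule, are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


@dataclass
class FileVersion:
    """One upload in a file's history (hypothetical structure for this sketch)."""
    timestamp: datetime
    uploader: str
    by_bot: bool = False
    by_admin: bool = False


def check_file(
    versions: List[FileVersion],    # oldest first; the last entry is the current version
    tag_added: Optional[datetime],  # timestamp of the revision that added the tag, or None
    now: datetime,
    min_age: timedelta = timedelta(days=7),
) -> Tuple[str, List[FileVersion]]:
    """Return ("ok", versions_to_hide), ("wait", []) or ("review", [])."""
    # No tagging revision found: the category was probably added by hand.
    if tag_added is None:
        return "review", []
    # Age is taken from the revision timestamp, never from the date in the tag.
    if now - tag_added < min_age:
        return "wait", []
    # More than one ordinary uploader suggests a version war.
    human_uploaders = {v.uploader for v in versions if not (v.by_bot or v.by_admin)}
    if len(human_uploaders) > 1:
        return "review", []
    # Never touch the current version or anything uploaded after the tag.
    to_hide = [v for v in versions[:-1] if v.timestamp < tag_added]
    return "ok", to_hide
```

Anything that comes back as "review" would be a candidate for the separate administrator-review category proposed above.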
Some notes and questions:
- What happens if someone adds Category:Non-free files with orphaned versions more than 7 days old to something which does not have any local file uploads, such as an article, a redirect or a local file information page for a file on Commons? It looks as if the program is meant to skip such pages (some function would probably throw an exception), although there seems to be an indentation error around the try: line in the source code (so the program probably won't run at all).
- Just to note, this is user:DatGuy's code and I propose to run it as he is not an admin, and by the same token he cannot fix deletion errors if they occur on trial. "Page" in the deletepage api call has a format like File u'File:File:Groundhog Day (movie poster).jpg' from 'https://en.wikipedia.org'. I can't see why a check could not be added to the page parameter before execution. Ronhjones (Talk) 16:06, 4 March 2017 (UTC)[reply]
- After the revisions have been hidden, please also remove the template from the file information page. The program currently doesn't do this.
- I spotted that myself, and already noted on user talk:DatGuy
- We only want the file hidden, not information about its uploader, its upload summary or its size. Unless I'm mistaken, this means that you should use action=deleterevision, not action=delete (but someone familiar with deletion APIs should confirm).
- I'll check with DatGuy and ask him to review this page
- In the past, old revisions were commonly deleted completely. It may be useful to unhide the information about the uploader, the file size and other information so that only the file itself is missing. However, I'm not sure if a bot should do this as upload summaries could contain personal attacks.
- The normal system using something like User:B/rescaledsidebar.js is to just revdel the image. Ronhjones (Talk) 16:06, 4 March 2017 (UTC)[reply]
- Please check that the file isn't tagged as free. If the file both is "free" and "unfree", then the file may have been tagged incorrectly, so human inspection is needed. In particular, files in the following categories may be free:
- Category:All free media: If the file is free, then it obviously isn't unfree.
- Category:All free in US media: If the file is free, then it obviously isn't unfree.
- Category:Possibly free images: If the copyright status is disputed, then any copyright tags may be wrong, so don't trust them.
- Category:Wikipedia files with disputed copyright information: Same as the above.
- Category:Wikipedia files for discussion: Files are sometimes listed there if the copyright status is disputed, or it may be disputed which revision we should use on Wikipedia. Also, the file might be deleted altogether when the discussion ends, so deletion of old revisions might only add extra unnecessary entries to people's watchlists.
- Category:All Wikipedia files with the same name on Wikimedia Commons: Unfree files are not permitted on Commons. If it is on Commons, then the old revisions are probably free, or the file should be deleted on Commons.
- Category:All Wikipedia files with a different name on Wikimedia Commons: Same as the above.
- Category:Items pending OTRS confirmation of permission by date: If an OTRS ticket shows up, then this may have the effect that the copyright tag is changed. It's better to wait and see.
- Category:Wikipedia files with unconfirmed permission received by OTRS by date: Same as the above.
- Category:Items with OTRS permission confirmed: Files with this template are unlikely to be unfree. It's better to consult someone with OTRS access about the file's copyright status.
- Category:Media files requiring de-merge: Some of the old revisions may have a different copyright status.
- Will talk to DatGuy Ronhjones (Talk) 16:06, 4 March 2017 (UTC)[reply]
- Why isn't the bot exclusion-compliant? --Stefan2 (talk) 00:16, 4 March 2017 (UTC)[reply]
- If it is needed, then we can add it. Ronhjones (Talk) 16:06, 4 March 2017 (UTC)[reply]
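To make the "free or disputed" check above concrete, here is a minimal sketch, assuming the bot already has the file page's categories as a list of titles. `SKIP_CATEGORIES` holds only a few of the categories listed above and `blocking_categories` is a made-up name; neither is taken from the bot's source.

```python
from typing import Iterable, List

# Hypothetical subset of the categories listed above that call for human inspection.
SKIP_CATEGORIES = {
    "Category:All free media",
    "Category:Possibly free images",
    "Category:Wikipedia files for discussion",
}


def blocking_categories(page_categories: Iterable[str], skip=SKIP_CATEGORIES) -> List[str]:
    """Return the categories on the page that mean the bot should not act."""
    return sorted(set(page_categories) & set(skip))
```

An empty result means no listed category was found; returning the matches rather than a bare yes/no would let the bot say why a file was left for a human.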
{{OperatorAssistanceNeeded|D}}
Please see initial questions about your code and methodology above. — xaosflux Talk 04:05, 4 March 2017 (UTC)[reply]
- About exclusion compliance, I wonder if we'll see people trying to keep illicitly large versions of the file by adding a nobots tag if the bot becomes exclusion compliant. Jo-Jo Eumerus (talk, contributions) 10:10, 4 March 2017 (UTC)[reply]
- In my opinion, the bot should make a few simple checks to determine if it's safe to delete the old revisions automatically, and if not, add a parameter, |human needed=yes, to {{orphaned non-free revisions}}, which then adds the file to a different category. If the file has {{nobots}}, then this would mean that the bot shouldn't delete the old revisions but simply add the parameter to the template. However, even if the bot leaves a few files in Category:Non-free files with orphaned versions more than 7 days old unprocessed, this wouldn't be a huge problem. After each bot run, the category should be almost empty (containing only files that the bot couldn't process), so a human could easily spot these files by simply looking at the category shortly after a bot run. --Stefan2 (talk) 12:46, 4 March 2017 (UTC)[reply]
- Doing... Fixes. Will pastebin updated code. Dat GuyTalkContribs 16:09, 4 March 2017 (UTC)[reply]
- @Xaosflux, Cryptic, Train2104, and Stefan2: Code updated, see [1]. With this code, I am suggesting a 'ease the work on admins with no false positives' more than a 'clear it 100%, but have some false positives' approach. Dat GuyTalkContribs 00:19, 5 March 2017 (UTC)[reply]
- From my reading: this addresses most of the issues above, except for the user verification to prevent edit wars. I'd echo the above recommendation that anything the bot can't deal with should have a parameter added to its template so an admin can review it manually. Merely leaving them behind in the category will leave them hard to find especially if the reducer bot continues to run. — Train2104 (t • c) 01:07, 5 March 2017 (UTC)[reply]
- Am not the best with templates, but I'll try to create the subcategory and add the param in the template. Please check it later if I've done it correctly. Dat GuyTalkContribs 10:46, 5 March 2017 (UTC)[reply]
- We might actually be better off with making the existing template also add a Category:Non-free files with orphaned versions more than 10 days old (or whatever other time), so A) the bot doesn't have to explicitly add it to each image, B) images where the bot throws (and so does nothing) don't just get lost and ignored, and C) if the bot stops running for whatever reason, the admins patrolling the leftover images see all images instead of none. —Cryptic 20:55, 5 March 2017 (UTC)[reply]
- Just noticed that I haven't updated the onwiki code, but if it fails the abuse check or has nobots it adds the |human=yes parameter. Dat GuyTalkContribs 16:49, 7 March 2017 (UTC)[reply]
- A 10-day category could be better than a |human=yes parameter if we want to find all missed files, yes. --Stefan2 (talk) 23:59, 7 March 2017 (UTC)[reply]
In User:RonBot/FreeCategory, you separate category names with commas. This prevents people from adding new categories to the list if the name contains a comma. It is better to use a character which can't appear in page titles instead, such as "|". Also, to prevent abuse, the page should probably be protected. Note that one of the categories, Category:Media files requiring de-merge, is at CfD for proposed renaming, so remember to update the page if the category is renamed.
I think that the API query in the deletefile function deletes revisions without specifying a reason in the deletion log. There are two different parameter names at mw:API:Revisiondelete: reason in the list of all parameters and comment in the example. What is the correct parameter name?
The function abusechecks looks wrong. It looks as if the check will give false positives if at least two edits are made to the file information page after the latest upload. Use mw:API:Imageinfo to obtain information about file revisions instead of text revisions.
The implementation of {{bots}} and {{nobots}} is incomplete, but maybe sufficient. Also, different bots seem to implement the templates differently. The example code for Python at Template:Bots only checks the first template found if there are multiple on the page, while the example code for Java checks all templates.
You run the function startAllowed to check if the bot is allowed to run, but immediately send the response to the garbage collector without using the information. If the bot is not allowed to run, then stop the bot instead of deleting old revisions.
Please check that the file appears in Category:All non-free media. It looks as if you currently don't do this.
Minor error: You are mixing local time with GMT time, although this causes little trouble in the UK where I imagine that the bot will run. datetime.datetime.now() should be datetime.datetime.utcnow() so that you use GMT time everywhere. Otherwise, an evil guy could provide an incorrect timestamp to get the bot to delete an old revision one hour too early during summer. --Stefan2 (talk) 23:59, 7 March 2017 (UTC)[reply]
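As an illustration of the timestamp point, the sketch below compares a revision timestamp with the current time entirely in UTC. It assumes Python 3 and the MediaWiki API's ISO 8601 timestamp form (for example 2017-03-02T22:36:00Z). It uses the timezone-aware datetime.now(timezone.utc) in place of the utcnow() call suggested above, which is a choice made for this sketch, and `tagged_long_enough` is a made-up name that does not appear in the bot's source.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def tagged_long_enough(
    tag_timestamp: str,
    now: Optional[datetime] = None,
    min_age: timedelta = timedelta(days=7),
) -> bool:
    """True if the tagging revision is at least min_age old, measured in UTC."""
    # Parse the API timestamp and mark it explicitly as UTC.
    tagged = datetime.strptime(tag_timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    if now is None:
        # Aware UTC time, so the server's local time zone never enters the comparison.
        now = datetime.now(timezone.utc)
    return now - tagged >= min_age
```

Because both values carry UTC explicitly, daylight saving time on the machine running the bot cannot shift the 7-day threshold.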
Stefan2 Going through 1-by-1:
- Done
- Yep, added reason
- As said before, deleting adminbots should be more careful than normal bots. It will add a |human=yes parameter, see updated code.
- Added example code.
- return False actually stops the function, but I've added something else
- Will take too long (not to code, but to execute)
- Done
Dat GuyTalkContribs 19:10, 8 March 2017 (UTC)[reply]
- If the bot can't find the template, it should list such files on a userspace log. Perhaps a non-standard redirect was used, perhaps the category was added directly. Why would checking for the non-free files category take too long if checking for free files is doable? — Train2104 (t • c) 19:48, 8 March 2017 (UTC)[reply]
- The bot checks for "free" images by looking for a template in the page that is in the list at User:RonBot/FreeCategory. The category Category:All non-free media is not present in the page description - it is transcluded from one of the very many templates that could be present in the page. Ronhjones (Talk) 17:41, 12 March 2017 (UTC)[reply]
- I'm not sure I understand. The bot checks that an image is free by looking to see if the page is in any of the categories at User:RonBot/FreeCategory. This category list is generated by mw:API:Categories, which appears to give both direct-inclusion and template-inclusion categories. Both Category:All non-free media and the various categories on the exception list are template-included. Therefore, it should be possible to check for the non-free category in the API return without making additional API calls. — Train2104 (t • c) 04:11, 16 March 2017 (UTC)[reply]
- I'll leave that one for User:DatGuy. Since the bot has already ascertained that it's not a free image, is there any point in more checks? Ronhjones (Talk) 23:09, 16 March 2017 (UTC)[reply]
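For what it's worth, the point about reusing the category list could look something like the sketch below. It assumes the default JSON layout of a prop=categories query (pages keyed by page ID, each with a "categories" list of titles). The helper names are invented for illustration and this is not taken from the bot's source.

```python
NON_FREE_CATEGORY = "Category:All non-free media"


def category_titles(api_response: dict) -> set:
    """Collect every category title from an already-fetched prop=categories response."""
    titles = set()
    for page in api_response.get("query", {}).get("pages", {}).values():
        for cat in page.get("categories", []):
            titles.add(cat["title"])
    return titles


def is_marked_non_free(api_response: dict) -> bool:
    """Check the non-free category using the same response, with no extra API call."""
    return NON_FREE_CATEGORY in category_titles(api_response)
```

The same set of titles could then be compared against the list at User:RonBot/FreeCategory, so both checks would share one query.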
The bot should certainly be exclusion compliant. There are always subtleties that come up in the Image namespace. The exclusion method can be used temporarily, to "pause" a bot on a particular page until issues are figured out. — Carl (CBM · talk) 12:28, 17 March 2017 (UTC)[reply]
- That was fixed about 2 weeks ago. Ronhjones (Talk) 20:23, 17 March 2017 (UTC)[reply]
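For readers unfamiliar with exclusion compliance, a deliberately simplified sketch of such a check follows. It is not the example code from Template:Bots and not the bot's implementation. It treats any {{nobots}} as a refusal, looks only at the deny and allow parameters of {{bots}}, and checks every matching template on the page, not just the first. The function name `allowed_to_edit` is an assumption.

```python
import re

# Matches {{nobots}}, {{bots}} and {{bots|...}}, case-insensitively.
_BOTS_RE = re.compile(r"\{\{\s*(nobots|bots)\s*(?:\|([^}]*))?\}\}", re.IGNORECASE)


def allowed_to_edit(wikitext: str, bot_name: str) -> bool:
    """Return False if any exclusion template on the page denies this bot."""
    bot = bot_name.strip().lower()
    for name, params in _BOTS_RE.findall(wikitext):
        if name.lower() == "nobots":
            # Conservative simplification: any {{nobots}} stops the bot.
            return False
        for param in params.split("|"):
            key, _, value = param.partition("=")
            key = key.strip().lower()
            names = {n.strip().lower() for n in value.split(",")}
            if key == "deny" and (bot in names or "all" in names):
                return False
            if key == "allow" and not (bot in names or "all" in names):
                return False
    return True
```

Placing {{nobots}} on a file page would then "pause" the bot on that page, as described above, until the template is removed.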
- @Ronhjones: Please skip over files that have a resolution of greater than 400 pixels in both dimensions. These images likely need resizing, and they often indicate that someone has uploaded a "high quality" version of a non-free image in a good-faith attempt to improve the wiki. This requires admin attention both to revert to the correct version of the image and to address the editor who uploaded a violation of our non-free content criteria. ~ Rob13Talk 13:20, 19 March 2017 (UTC)[reply]
- @BU Rob13: The problem is that 0.1Mpixels is just a guideline. You can't beat up people who claim they need bigger images. If you want all images to be less than 0.1Mpixels, then we need a policy to enforce it - I would add my name as a "yes". As it is I already have a large amount of images way over the guideline at Category:Non-free images tagged for no reduction, and I suspect that will grow. I would have thought that an edit filter (that puts up a warning when over 0.1MPixels) would be a far more efficient way to alert the uploader that there could be a problem, rather than waiting until it gets this far. In addition once we have gone through the 200,000 backlog of non-free images that have been ignored over the years, then finding new oversized images will be much easier. Ronhjones (Talk) 19:05, 20 March 2017 (UTC)[reply]
- Perhaps scope creep, could the bot in such cases perform the deletion anyway and at the same time post on a noticeboard to ask for further input? Jo-Jo Eumerus (talk, contributions) 19:43, 20 March 2017 (UTC)[reply]
- @Ronhjones: Well, two things. First, the guideline is based on legal considerations, and the legal considerations trump all; we cannot host content which wouldn't fly as fair use in the United States. Second, I'm not asking to strictly enforce this. I'm saying that human review is required. As mentioned above on the page, this bot should attempt to keep the false positive rate as low as possible, so it's preferable to allow an admin to directly review large images to determine if they violate the non-free content criteria. ~ Rob13Talk 20:06, 20 March 2017 (UTC)[reply]
- Mind the false negative rate too, though - there is no point in leaving so many exceptions that humans can't keep track of them. Also, the bot edit/deletion summaries should say that WP:REFUND is the place to contest a particular deletion. Jo-Jo Eumerus (talk, contributions) 20:15, 20 March 2017 (UTC)[reply]
- I agree with Jo-Jo Eumerus, the number of admins who regularly roam this area is very low. If it is important not to upload a bigger image, then I refer to my earlier suggestion of an edit filter, then we attack the problem at source and not sit around waiting for at least a week - in fact, if someone is smart enough to try to game the system and upload a bigger image, then why would they add the "reduced" template - they won't, if they do then they are plain dumb. Also as I said once we have the backlog cleared, with the exemptions to reduce being tagged up, then a simple search for filesres:>314 is going to find any "gamers" and get them reduced. I suspect there are already thousands of images that fall into this category anyway and we should be getting these long term violations sorted out ASAP. I also worry about processing time, we are at 5 sec per image (with all saves disabled) at present, extra api calls to get the sizes will slow things up more. If an edit filter is not practicable (although commons seems to have plenty of filters for images - some based on size - following the vast dodgy imports by Wikipedia Zero customers...), then why not another bot - it won't need to be an adminbot, one of the regulars could write it, to go through Category:Non-free files with orphaned versions (or even Category:Fair use images) looking for big files, and thus won't have to leave the big image showing for over a week. Ronhjones (Talk) 20:17, 21 March 2017 (UTC)[reply]
- @Ronhjones and Jo-Jo Eumerus: I've done about 1,000 of these over the past week, since it appears a bot went through and caught up on all the re-sizing that had been sitting there. So I have some decent anecdotal evidence on false negative. Let's assume that the 1k I've gone through was representative (can't think why it wouldn't be). If you only skipped .jpg, .png, .svg, and other image file types that I'm forgetting at the moment if they were >400 pixels in both dimensions, the total percent skipped would be something like 3–5%. Within that 5%, the false negative rate would be <5%. Likely lower, but I'm providing a conservative bound. So we're talking about 5% of 5% here. Most of the "too large" images do require admin attention, some urgently, as I've caught multiple cases where a single user uploaded high resolution versions of 50+ images. Once we get a bot doing most of these, I plan to handle these with regularity, so I'm not volunteering others to do this job. I'm happy to do it myself. As for processing time, I don't really find that too concerning. Having a bot running slightly slower but running well is more important than saving a second here and a second there. Needing another bot to handle an edge case is very undesirable, as is having a bot that performs unwanted actions if an admin doesn't check an edit filter report in over 7 days. ~ Rob13Talk 21:40, 21 March 2017 (UTC)[reply]
- I'll leave it to DatGuy to see if he can incorporate something. Ronhjones (Talk) 01:57, 22 March 2017 (UTC)[reply]
- @DatGuy: Are you actively working on this? If not, I'm going to request another bot operator to work on it, as this needs to get operational very soon. ~ Rob13Talk 01:29, 12 April 2017 (UTC)[reply]
- @BU Rob13: Will do on weekend. Dat GuyTalkContribs 15:04, 13 April 2017 (UTC)[reply]
- Just an update, is taking longer than expected. I was about to use the pyexiv2 module, but then I realised that it can't be run on labs since it's for Windows. I'll switch to use the imageinfo API query. Dat GuyTalkContribs 20:47, 19 April 2017 (UTC)[reply]
- @DatGuy and BU Rob13: As shown that code runs without much error - see User:RonBot/DummyRun2. Points to note.
- About 4-5 seconds between entries, and, of course, when the save steps are added that will be longer.
- When run under Admin account - it finds all the images (including the hidden ones) - see entries for "File:AnyVan Logo.png" - First run = 3 entries, as there are 3 to revdel. Second run (after I manually revdeled one of them) = 3 entries. Third run (under non-admin account) = 2 entries. So when running under an admin account, it will count all the images to be hidden, including the images already deleted. How it will run will depend on how Wikipedia reacts to a revdel of a revdeled image - it might just ignore it.
- Not sure what was up with "File:Hopeful - Bars and Melody.jpg" - never got a Done message, just 'module' object has no attribute 'unprefixedtitle' twice.
- Just shout if you want more runs. Ronhjones (Talk) 22:09, 23 April 2017 (UTC)[reply]
- I'm a bit confused at this point due to the meandering function. Either DatGuy or Ronhjones, can you provide a complete description of what the bot does/doesn't do as of this point in time (all skip criteria, what happens when those activate, etc)? Thanks. ~ Rob13Talk 22:12, 23 April 2017 (UTC)[reply]
- Just looked at File:Hopeful - Bars and Melody.jpg - the new image is too big...! Ronhjones (Talk) 22:13, 23 April 2017 (UTC)[reply]
- @BU Rob13: Thought Ronhjones would reply. Anyways, 1 by 1.
- Bot finds all pages in Category:Non-free files with orphaned versions more than 7 days old
- Bot skips Category:Non-free files with orphaned versions more than 7 days old needing human review, since that used to cause errors
- The bot finds the version to delete
- The bot checks if it's enabled. If not, it stops and waits for the next run.
- If any of the categories at User:RonBot/FreeCategory are on the page, the file is skipped
- The bot checks if DatBot resized the file 2 edits ago and 7 or more days ago
- If it fails the check, it requests manual review
- The bot checks if both dimensions of the file are less than 400px
- If it fails the check, it requests manual review
- The bot deletes the revision
- The bot removes the reduced tag
- Back up we go
- Hopefully that explains it. Dat GuyTalkContribs 15:26, 27 April 2017 (UTC)[reply]
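The checks listed above can be sketched as a single decision function. This is only an illustration under stated assumptions, not the code at User:RonBot/Source1: the `FileInfo` record, the `decide` function and the returned labels are hypothetical names, and all data is passed in rather than fetched from the wiki.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Set

MAX_SIDE = 400               # px; "both dimensions less than 400px"
MIN_AGE = timedelta(days=7)  # DatBot's resize must be at least this old


@dataclass
class FileInfo:
    """Hypothetical snapshot of one file page (not the bot's real data model)."""
    title: str
    categories: Set[str]
    width: int
    height: int
    datbot_edits_ago: Optional[int]         # how many edits back DatBot resized, if at all
    datbot_resize_time: Optional[datetime]  # when that resize happened


def decide(info, skip_categories, now, enabled=True):
    """Return 'stop', 'skip', 'review' or 'revdel', following the listed checks."""
    if not enabled:
        return "stop"    # bot disabled: wait for the next run
    if info.categories & skip_categories:
        return "skip"    # page carries one of the listed skip categories
    resized_ok = (
        info.datbot_edits_ago == 2
        and info.datbot_resize_time is not None
        and now - info.datbot_resize_time >= MIN_AGE
    )
    if not resized_ok:
        return "review"  # request manual review
    if not (info.width < MAX_SIDE and info.height < MAX_SIDE):
        return "review"  # current version is still too big
    return "revdel"      # delete old revisions, then remove the tag
```

Keeping the decision separate from the API calls means each skip criterion can be exercised in a dummy run without any admin action taking place.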
- In case it affects your programming, please note that Category:Media files requiring de-merge has just been renamed to Category:Wikipedia files requiring splitting. – Fayenatic London 22:46, 27 April 2017 (UTC)[reply]
- At this point, I support a small trial (50 edits - note that each admin action is accompanied by an edit, so this gives us 50 total files worked on). We now need observable actions/edits to find any issues and generate new feedback. I have experience working on this maintenance category, so I'm happy to volunteer to go through trial results and ensure all bot admin actions were appropriate, meaning minimal disruption even if something goes wrong. ~ Rob13Talk 16:20, 28 April 2017 (UTC)[reply]
- I'm happy to run a trial if approved. I would suggest that we create a temporary category Category:RonBotTest, randomly add 50 files from Category:Non-free files with orphaned versions more than 7 days old and change line 20 of User:RonBot/Source1 to point to the temporary category - then we know it will only process up to the 50 files, and they are nicely all together in one place to aid evaluation of the run. I assume we do test runs with my normal admin account? Ronhjones (Talk) 16:29, 29 April 2017 (UTC)[reply]
- {{BAGAssistanceNeeded}} Can we get approval for the trial proposed by User:BU Rob13? Ronhjones (Talk) 11:44, 5 May 2017 (UTC)[reply]
- The current backlog is borderline unworkable (808?!), is mostly brainless work, and should be done by a bot. This looks like a great task. @Xaosflux: any updates? -FASTILY 08:20, 14 May 2017 (UTC)[reply]
- @Ronhjones: as an adminbot, what type of authentication will this account be using? If using anything other than basic authentication - why? What authentication controls will you be adding? Where will it operate from (e.g. tool server, your computer, a server of yours). — xaosflux Talk 15:05, 14 May 2017 (UTC)[reply]
@Xaosflux: Although the bot is expected to be busy for quite a while, it will settle to a low value, so I planned to just run it overnight on my PC. When I started tagging large non-free images (when mw:Help:CirrusSearch#File_properties_search was enabled last year, and when User:DatBot6 was enabled to replace the ailing Theo Bot), there were some 260,000 non-free images greater than 100,000 pixels, now there are 189,000. At best I can (and have been) manually tag 4000 a week, so we have a year of a busy bot - then it should just be a mere trickle of usage as new files get uploaded. Currently it has a username and a very long, strong random password. Ronhjones (Talk) 15:53, 14 May 2017 (UTC)[reply]
- Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. (+50 RevDel's). Trial approved, use your existing admin account. If you are currently using Two-factor authentication for your primary account, you can create a Special:BotPasswords (A 'simple' guide is available here: Wikipedia:Using_AWB_with_2FA - when done trialing you can delete that BotPassword.) I strongly recommend 2FA for admins, and for your bot - you can either have it use a BotPassword as well or an OAuth consumer token. Both of these will allow you to grant your bot only the admin permissions it actually needs to do its task. — xaosflux Talk 16:22, 14 May 2017 (UTC)[reply]
Trial complete. @DatGuy, Fastily, BU Rob13, Stefan2, and Xaosflux:
- Ran a few trials (one at a time at first), using Category:RonBotTest to be the source of the bot's list. Source code was User:RonBot/Source2 - which has more write statements than code. I could not find any "bad" images, so I had to force the issue on a few files. There are 45 files there, with slightly over 50 actual revdels - due to multiples.
- No version 1 is without its problems, and no exception here, but they were not too difficult to fix...
- There was a write statement in addmanual which printed a parameter it didn't have - commented out.
- Bot failed to revdel anything - eventually found that the bot was passing the whole archive parameter to the api, whereas it just wanted the number. DatGuy added a RegEx to extract the number and it started working.
- Bot tried to revdel an already revdel'd image - a hidden image comes back with two parameters - the "filehidden" and the "archivename" - the program removed the first and not the second. A second "del" statement deleted both parameters and thus no API call was made.
- When the Bot needed to add "|human" it added "|human}}" - so the number of closing braces was incorrect, also it did it for each version. Both issues easily fixed.
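Two of the fixes above (passing only the number to the API, and not acting on versions that are already hidden) can be sketched together. This is a minimal illustration, not the code in User:RonBot/Source2: the function name `revdel_ids` is hypothetical, the input is assumed to be a list of dictionaries shaped like image-info entries, and the archive name is assumed to have the form "<14-digit timestamp>!<file name>".

```python
import re

# Assumed archive-name shape: "<14-digit timestamp>!<file name>".
ARCHIVE_ID = re.compile(r"^(\d{14})!")


def revdel_ids(imageinfo):
    """Collect the numeric ids of old versions that are not already hidden."""
    ids = []
    for entry in imageinfo:
        if "filehidden" in entry:
            continue  # already revision-deleted: make no API call for it
        name = entry.get("archivename")
        if not name:
            continue  # the current version has no archive name
        match = ARCHIVE_ID.match(name)
        if match:
            ids.append(match.group(1))  # keep only the number, not the whole name
    return ids
```

Skipping hidden entries before any key is removed avoids the half-cleaned dictionary described in the bullet on "filehidden" and "archivename".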
- All images used in trial are still in Category:RonBotTest for examination. Images to note are..
- File:Anantha Jothi Poster .jpg - contains Category:Items pending OTRS confirmation of permission by date, and was skipped by the bot
- File:Ant-Man (Scott Lang).jpg - contains Category:All free media, and was skipped by the bot
- File:André Maginot DSCF0734.JPG - contains Category:Media files requiring de-merge, and was skipped by the bot (I see the category has changed name, so list will need editing - I missed the comment up above).
- File:Logo Politecnico Milano.png - had one image to revdel and one already done, All OK
- File:Harrods Estates logo.jpg - multiple images to revdel - all OK
- File:InTheMiddleCD2Cover.jpg - reverted (by me) to big image - skipped and put into Category:Non-free files with orphaned versions more than 7 days old needing human review
- File:New Hampshire Union Leader newspaper cover.jpg - reverted (by me) to big image - skipped and put into Category:Non-free files with orphaned versions more than 7 days old needing human review
- Rest of the images are just "normal" requests and are OK. I would suggest another small trial: This time, starting from the proper category Category:Non-free files with orphaned versions more than 7 days old as the bot source (just to ensure a change brings no issues), and I'll hit Ctrl-Break when about the correct number have scrolled past. Ronhjones (Talk) 15:35, 20 May 2017 (UTC)[reply]
- For the next trial, just set the limit in the first function (the one that looks for files in the category) to the number for the trial. Dat GuyTalkContribs 15:38, 20 May 2017 (UTC)[reply]
- @Ronhjones: I'll look through the trial more comprehensively in the next few days, but I want to emphasize that restoring high resolution versions of non-free images for the purposes of a test poses substantial legal issues, both for the encyclopedia and potentially for you as an individual. Such large versions of images almost certainly do not fall under fair use under US copyright law, so a copyright holder could file a DMCA takedown notice with the WMF or even sue the contributor who restored the copyright violation individually. Reverting an image to a larger version for a test (e.g. File:InTheMiddleCD2Cover.jpg) shouldn't be done, and if it is done, it should be undone very quickly. Instead, feel free to upload a high resolution free image, clearly mark it as part of a bot trial, and slap necessary tags/categories on it temporarily to test how the bot handles a specific use case. Temporarily having a free image marked as non-free is better than restoring copyright violations for any length of time. ~ Rob13Talk 16:27, 20 May 2017 (UTC)[reply]
- Point noted Rob. Hopefully we might find some real villains in the next trial. Ronhjones (Talk) 23:14, 20 May 2017 (UTC)[reply]
@Ronhjones: Alright. I've finally circled back around to this. A few questions/notes:
- Your bot is leaving behind a lot of white space (e.g. here, but visible in all edits). Your edits shouldn't leave behind those extra newlines.
- I assume both File:National Basketball Federation Kazakhstan.png and File:New Hampshire Union Leader newspaper cover.jpg were affected by the early issues that you've said you fixed. Is that correct?
- What caused the failure at File:Logo Politecnico Milano.png the first time around?
Please ping me in your response so I can take a look. ~ Rob13Talk 04:38, 3 June 2017 (UTC)[reply]
- I've fixed the first error at User:RonBot/Source2. Dat GuyTalkContribs 10:56, 3 June 2017 (UTC)[reply]
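For illustration, removing the tag without leaving stray newlines could look like the sketch below. It is not the fix actually applied at User:RonBot/Source2; the function name `remove_tag` is hypothetical, and the template is assumed to be written as "Orphaned non-free revisions" with optional parameters and no nested templates.

```python
import re

# Matches the tag, any simple parameters, and the newline that follows it.
TEMPLATE = re.compile(
    r"\{\{\s*orphaned non-free revisions\s*(?:\|[^{}]*)?\}\}[ \t]*\n?",
    re.IGNORECASE,
)


def remove_tag(wikitext):
    """Remove the tag and tidy the blank lines it would otherwise leave."""
    cleaned = TEMPLATE.sub("", wikitext)
    cleaned = re.sub(r"\n{3,}", "\n\n", cleaned)  # collapse runs of blank lines
    return cleaned.lstrip("\n")                   # no blank lines at the top of the page
```

The final `lstrip` covers the case where the template sits in the top row of the description page.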
- @BU Rob13: Answers.
- File:New Hampshire Union Leader newspaper cover.jpg - oops bad cut and paste in table above - that is No.7 (now fixed), it did what bot was supposed to do - no action and place in cat for manual review.
- File:National Basketball Federation Kazakhstan.png and File:Logo Politecnico Milano.png - Those were the very first tests on 16th May / 17th May - just a few files on trial - sorting out the errors in the code. The main run was on the 18th.
- Hope that makes sense. Ron. Ronhjones (Talk) 14:18, 3 June 2017 (UTC)[reply]
- Yes, looks good. @Xaosflux: You approved the first trial, so I'll leave this to you, but I can say that (minus the whitespace issue, now fixed) everything in these trial edits looks as it would after I processed the orphaned non-free revisions myself. Probably best to have a prolonged trial both to test for further corner cases and so the community can take note and comment if they desire. ~ Rob13Talk 16:16, 3 June 2017 (UTC)[reply]
- @BU Rob13, Xaosflux, and DatGuy: I did another dummy run (no saves) while we are waiting, just monitoring the bot's printouts. 559 files passed all the criteria and 18 did not...
- (A) Contain "Di-orphaned fair use"
- File:2017WorldSeries.jpg
- File:Mark_Driscoll,_High_School_Yearbook_Photo.jpg
- File:NA_LCS_logo.jpeg
- (B) Edited after reduced banner added
- File:Wanted_2011.jpg
- File:War_for_the_Planet_of_the_Apes_poster.jpg
- File:Warlock metal racer.jpg
- File:WCP-Uniform-EDM.png
- File:We_Are_Family_Foundation_Logo.jpg
- File:Western_Australia_Police_logo.png
- File:What_Car?_magazine_July_2001.jpg
- File:Windows_3.11_workspace.png
- File:Windows_95_at_first_run.png
- File:WKLB-FM_logo.png
- (C) Image uploaded is too big
- File:Winnipeg_Jets_Logo_2011.svg
- File:WinX MediaTrans Screenshot.png
- File:Wmdcc2017.png
- File:Woolworths logo 2012.svg
- (D) Other
- File:War_Machine_(film).jpg - DatBot did not reduce - "Removing {{Non-free reduce}} since File is already adequately reduced"
- All looks fine Ronhjones (Talk) 21:46, 7 June 2017 (UTC)[reply]
- Approved for extended trial (1000 edits or 14 days). Please provide a link to the relevant contributions and/or diffs when the trial is complete. (with associated revdels). The bot appears technically sound. This is mostly to further probe for corner cases and to allow community comment. Because I would like the community to take note of the bot and potentially comment, this trial should run from your bot account. I will request the admin flag for your bot at WP:BN with an expiry time of 2 weeks. ~ Rob13Talk 14:35, 12 June 2017 (UTC)[reply]
- @Ronhjones: temporary flags have been enabled to support this trial. These will also allow you to set up 2FA authentication for this account. See the section above about using OAUTH or BotPasswords. — xaosflux Talk 14:47, 12 June 2017 (UTC)[reply]
Trial complete. @DatGuy and BU Rob13: There were some initial issues...
- Blank line(s) left when template in top row - now fixed
- Sometimes Wikipedia puts the file in the 7 day cat a little early, changed code to 6 days and 22 hours to fix
- A random error when the bot removed only one version - seemed to be caused by the code abuse-checking every version - changed to an abuse check at the start of a file run, no problem since.
- And a fair crop of human review needed, they can all be explained
See User:RonBot/Trial2 for more information, User:RonBot/Source2 is the latest version run. Ronhjones (Talk) 22:58, 19 June 2017 (UTC)[reply]
- @Ronhjones: Could you explain what you meant by the 7 day thing? Not sure I follow what the issue was and what you changed. ~ Rob13Talk 00:40, 21 June 2017 (UTC)[reply]
- @BU Rob13: Files when reduced are added to Category:Non-free files with orphaned versions, after 7 days they appear in Category:Non-free files with orphaned versions more than 7 days old. Not sure why, but it's not always 7 full days (i.e. 168 hours), sometimes they get into that category an hour or so early, and the bot rejected it as exactly 7 days was not up - so I changed it to 6 days and 22 hours. Ronhjones (Talk) 16:36, 21 June 2017 (UTC)[reply]
- @Ronhjones: Alright, makes sense (although I can't find why that would be the case - the underlying date template updates after the exact number of seconds corresponding to 7 days). I'll leave this open for a few days for any community comment. ~ Rob13Talk 18:40, 21 June 2017 (UTC)[reply]
- @BU Rob13: Quick example - File:Angst in My Pants - Sparks.jpg - new version 18:02, 10 June 2017, RonBot processed it at 17:56, 17 June 2017 - 6 minutes short of the 7 days. NB: Seeing that reminds me of another issue fixed - files in the manual review cat are still in the parent, so it got processed again. Added the "manual review" template to the list of templates to skip. Ronhjones (Talk) 19:51, 21 June 2017 (UTC)[reply]
- Explanation: At 2 PM, someone tags the file with {{subst:orfurrev}}. This inserts the date (for example, 21 June 2017) but doesn't include the time. The template therefore thinks that the file was tagged on the specified date at midnight, so the file moves to the 7-days-old category at midnight on 28 June 2017. If correctly tagged with a timestamp, the file wouldn't be moved to the 7-days-old category until 2 PM on 28 June 2017 (that is, 14 hours later). This is unwanted template behaviour in my opinion: the file is not to be moved to the other category before 2 PM. --Stefan2 (talk) 23:23, 21 June 2017 (UTC)[reply]
- I've rectified this by changing the wrapper in the template to subst {{REVISIONTIMESTAMP}} instead of {{Date}}. – Train2104 (t • c) 00:39, 22 June 2017 (UTC)[reply]
- Ah, yes, that explains it. @Ronhjones: Could you roll back this fix? I thought REVISIONTIMESTAMP was already being used, and now that it is, the issue is fixed on the template side of things. ~ Rob13Talk 01:28, 22 June 2017 (UTC)[reply]
- @BU Rob13: Nice to find a reason for the behaviour. PC code changed back to 7 days, as reflected in User:RonBot/Source2 Ronhjones (Talk) 18:26, 22 June 2017 (UTC)[reply]
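The timing difference described in this thread can be worked through with plain date arithmetic. The sketch below is illustrative only: `eligible_at` is a hypothetical helper, and it assumes the category switch happens exactly 7 days after whatever moment the template records.

```python
from datetime import datetime, timedelta

WAIT = timedelta(days=7)


def eligible_at(tagged, keep_time=True):
    """Moment a file would reach the 7-days-old category.

    keep_time=False models a date-only tag, which counts from midnight.
    """
    if keep_time:
        start = tagged
    else:
        start = tagged.replace(hour=0, minute=0, second=0, microsecond=0)
    return start + WAIT


# Tagged at 2 PM on 21 June 2017, as in the explanation above.
tagged = datetime(2017, 6, 21, 14, 0)
date_only = eligible_at(tagged, keep_time=False)  # midnight, 28 June 2017
with_time = eligible_at(tagged)                   # 2 PM, 28 June 2017
gap = with_time - date_only                       # 14 hours
```

The same arithmetic applies to File:Angst in My Pants - Sparks.jpg: a version uploaded at 18:02 on 10 June 2017 reaches the full 7 days at 18:02 on 17 June 2017, six minutes after the bot processed it at 17:56.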
- Approved. @Ronhjones: Because this is an admin bot, I expect you to act with an abundance of caution to any future bug reports. If an erroneous admin action occurs, you shouldn't run the bot until you've quashed the bug. ~ Rob13Talk 21:55, 3 July 2017 (UTC)[reply]
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.