Wikipedia:Bots/Requests for approval/FairuseBot
- The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.
Automatic or Manually Assisted: Automatic
Programming Language(s): Perl
Function Summary: Enforcement of Wikipedia:NFCC 7 (orphaned fairuse images) and 10c (fairuse images that don't mention the article in the rationale)
Edit period(s) (e.g. Continuous, daily, one time run): Continuous
Edit rate requested: 6 edits per minute
Already has a bot flag (Y/N): No
Function Details:
Enforcement of NFCC 7: If an image is marked as non-free, but isn't used in any article, the bot will add a {{Di-orphaned fair use}} tag to it and notify the uploader.
Enforcement of NFCC 10c:
- If an image is marked as non-free, there are no links from the image description page to any of the pages it is used in, and none of the pages it is used in is mentioned in non-link text, the bot will add a {{Di-missing article links}} tag to the image and notify the uploader.
- If an image is marked as non-free, and there are links or mentions of some, but not all, of the pages it is used in, the bot will add a {{Di-missing article links}} tag to the image and notify the uploader. After six days, it will remove the image from any articles that still don't have mentions or links, and remove the tag.
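The three cases above can be sketched as a small decision function. This is only an illustration under stated assumptions, not the bot's actual Perl code: the function and enum names are hypothetical, and a plain case-insensitive substring test stands in for the separate "link" and "non-link mention" checks.

```python
from enum import Enum


class Action(Enum):
    NONE = "compliant, no action"
    TAG_ORPHANED = "add {{Di-orphaned fair use}} and notify uploader"
    TAG_NO_ARTICLES = "add {{Di-missing article links}} and notify uploader"
    TAG_SOME_ARTICLES = (
        "add {{Di-missing article links}}, notify uploader, "
        "remove from unmentioned articles after six days"
    )


def classify(used_in, description_text):
    """Decide what to do with a non-free image (hypothetical helper).

    used_in: list of mainspace article titles that display the image.
    description_text: wikitext of the image description page.
    Returns (action, articles that have no link or mention).
    """
    if not used_in:
        # NFCC 7: non-free image with no article uses.
        return Action.TAG_ORPHANED, []
    text = description_text.lower()
    # Assumption: a substring match approximates "linked or mentioned".
    missing = [title for title in used_in if title.lower() not in text]
    if not missing:
        return Action.NONE, []
    if len(missing) == len(used_in):
        # NFCC 10c: none of the articles is linked or mentioned.
        return Action.TAG_NO_ARTICLES, missing
    # NFCC 10c: some, but not all, articles are linked or mentioned.
    return Action.TAG_SOME_ARTICLES, missing
```

For example, `classify(["Foo", "Bar"], "Used in [[Foo]]")` yields the partial-compliance action with `["Bar"]` as the article to follow up on.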
There are two major differences from what BetacommandBot does:
- If it can reach a page where the image is used by redirects or disambiguation pages, it will update the link in question rather than flag the image as non-compliant.
- It's operated by a user who's more willing to help people who have questions.
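The redirect handling described in the first point could look roughly like the sketch below. It assumes the redirect table has already been fetched into a dictionary, uses a hypothetical function name, and does not attempt the disambiguation-page case.

```python
def resolve_mentions(mentioned, redirects, used_in):
    """Find rationale links that reach a using article through redirects.

    mentioned: titles linked or named on the image description page.
    redirects: dict of redirect title -> target title (assumed pre-fetched).
    used_in: set of article titles that display the image.
    Returns {old_title: new_title} for links that could be updated in place.
    """
    updates = {}
    for title in mentioned:
        target, seen = title, set()
        # Follow redirect chains, guarding against redirect loops.
        while target in redirects and target not in seen:
            seen.add(target)
            target = redirects[target]
        if target != title and target in used_in:
            updates[title] = target
    return updates
```

With such a mapping, a link to a redirect would be rewritten to the real article title instead of the image being flagged as non-compliant.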
Discussion
For NFCC7: Are you tagging images with no uses or with no mainspace uses?
For NFCC10c: Will the bot follow redirects? What if, for example, a non-free image is used on User:ST47 and there's a link there due to my sig in the image? Will the bot tag it? --uǝʌǝsʎʇɹoɟʇs(st47) 10:36, 19 March 2008 (UTC)
- There's no need to compare your bot to BetacommandBot, nor comment on betacommand in this BRFA. Let's look at your bot instead, ok? SQLQuery me! 12:39, 19 March 2008 (UTC)
- What specific templates will you be using to notify uploaders? After six days, when it removes the tag, if the image is nonfree and orphaned, would it mark it for deletion? SQLQuery me! 12:44, 19 March 2008 (UTC)
- Is the code for this bot available? I am involved with several Perl bots and I would be interested to see how yours is implemented. Even if you're only willing to share the code privately, I would be interested to read it. — Carl (CBM · talk) 15:05, 19 March 2008 (UTC)
- Are you planning to implement some sort of exclusion list so that users who don't want to be contacted can opt out? — Carl (CBM · talk) 15:05, 19 March 2008 (UTC)
- The bot will be NFCC7-tagging images with no mainspace uses.
- ST47, I'm not sure what you're trying to get at. The bot doesn't work outside the mainspace: if you've put the image on your userpage, the bot won't see that. The image will either be treated as an orphaned non-free image, or a non-free image with no article links.
- I'll be using User:OrphanBot/nfcc10c and User:OrphanBot/orfud to notify users.
- I'm not sure of the details of tag removal, as I haven't written that part yet. In theory, the case you mention can't happen: if the bot's going to remove the image from articles, that means the image is used in at least one article with a valid fair-use rationale.
- Source code should be available later today, when I finish writing the bot.
- Right now, it uses the same opt-out list as my other bots, and users can be added to it on request. If another standard is established, I'll modify the bot to work with that.
- --Carnildo (talk) 20:16, 19 March 2008 (UTC)
- I was assuming that you were looking at an image's File Links, and was wondering whether you were checking that the links there were mainspace. Where are you getting the image list? --uǝʌǝsʎʇɹoɟʇs(st47) 21:56, 19 March 2008 (UTC)
- I'm using api.php, which lets you filter by namespace. --Carnildo (talk) 22:38, 19 March 2008 (UTC)
- OK. Once you've finished your programming, Approved for trial (100 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. --uǝʌǝsʎʇɹoɟʇs(st47) 23:04, 19 March 2008 (UTC)
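As an illustration of the namespace filtering mentioned in this exchange, here is a minimal sketch that builds an api.php request for mainspace uses of an image. The thread does not say which API module the bot calls, so the use of `list=imageusage` is an assumption, and the helper name is hypothetical.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def image_usage_url(image_title, namespace=0, limit=50):
    """Build a query URL listing pages that use an image.

    namespace=0 restricts results to the main (article) namespace,
    so uses on user pages or talk pages are not returned.
    """
    params = {
        "action": "query",
        "list": "imageusage",
        "iutitle": image_title,
        "iunamespace": namespace,
        "iulimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)
```

An image used only on a user page would come back with an empty result list from such a query, which matches the behavior described above: the bot would treat it as orphaned.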
- Trial completed. I've discovered a situation where the bot will, by following the rules, do exactly the wrong thing:
- It's not uncommon for album cover images to list the name of the band (but not the name of the album) and have a generic no-articles-named fair-use rationale. If the cover is used in both the band's article and the album's article (again, not uncommon), the bot will mark the image as non-compliant for the album, and six days later, will remove the image from the album, leaving it in the band's article. According to the rules, this is the right thing to do, but it's exactly the opposite of what is desirable from a fair-use perspective. Is it worth doing something about, and if so, what? --Carnildo (talk) 03:35, 28 March 2008 (UTC)
- Code is available at User:FairuseBot/10cbot.pl, User:FairuseBot/10c-removal.pl, User:FairuseBot/Pearle.pm, User:FairuseBot/Pearle/WikiPage.pm and User:FairuseBot/libBot.pm. --Carnildo (talk) 03:51, 28 March 2008 (UTC)
- Thanks! I like the idea of escaping some of the wikitext syntax, like double brackets, to simplify the later regexes somewhat. I'm not a BAG member, and in the end it's your code, but here are a few other things I noticed when I looked through the code:
- It looks like APIquery() doesn't UTF-8 encode the query parameters, which could cause you issues since you pass image titles into API queries. It's something to look out for, depending on whether you are encoding them somewhere else. But if they are encoded, does that make the regex matching fail? Make sure you test the code on some images with esoteric Unicode titles.
- The Pearle.pm library doesn't use the maxlag system. You could get much higher overall throughput if you used maxlag, at the cost of some coding time.
- The code queries each image page's wikitext separately. One of the benefits of the API is that you can make batch queries that fetch the contents of many pages simultaneously, lightening the server load by reducing the number of HTTP queries you make. This would also increase the throughput of the bot.
- — Carl (CBM · talk) 14:53, 28 March 2008 (UTC)
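The three suggestions above (UTF-8 encoding of parameters, maxlag, and batch queries) can be combined in one small sketch. This is not how Pearle.pm works; the helper name, the batch size of 50, and the `maxlag=5` value are assumptions chosen for illustration.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def batched_content_urls(titles, batch_size=50, maxlag=5):
    """Build one query URL per batch of page titles.

    Titles are joined with "|" so a single HTTP request fetches the
    wikitext of many pages. urlencode() percent-encodes each value as
    UTF-8, and maxlag asks the server to refuse the request when
    database replication lag exceeds the given number of seconds.
    """
    urls = []
    for start in range(0, len(titles), batch_size):
        batch = titles[start:start + batch_size]
        params = {
            "action": "query",
            "prop": "revisions",
            "rvprop": "content",
            "rvslots": "main",
            "titles": "|".join(batch),
            "maxlag": maxlag,
            "format": "json",
        }
        urls.append(API_URL + "?" + urlencode(params))
    return urls
```

A client using this approach would still need to detect the maxlag error response, wait, and retry. Batching also means holding many page texts in memory at once, which is the trade-off raised in the reply that follows.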
- I'll look into the encoding bit. It could explain some of the unexpected behavior that I'm seeing from ImageRemovalBot.
- Pearle.pm predates both the maxlag system and api.php. The current rate-limiting system works well enough, so I haven't bothered to update it.
- Code design is a matter of balancing throughput with memory usage. The deployment system for this bot only has 32MB of available RAM, which it will be sharing with three other bots.
- --Carnildo (talk) 18:58, 28 March 2008 (UTC)
- Running in 32MB is an impressive feat. That probably explains why you keep the list of images on disk. — Carl (CBM · talk) 20:07, 28 March 2008 (UTC)
- Yes, the list alone would use up more than 40 MB of RAM. Thanks for pointing out the encoding bit -- it turns out the bot wasn't handling the UTF-8 data from the image list file properly, which wasn't a problem for APIQuery() as it didn't encode the parameters, but was a problem for getPage() and postPage(), which did encode. --Carnildo (talk) 05:40, 29 March 2008 (UTC)
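The thread does not say exactly how the mismatch arose. One common way for this class of bug to appear is sketched below: bytes read from a UTF-8 file are treated as single-byte characters and then encoded a second time before being sent. The title used here is made up for the example.

```python
from urllib.parse import quote

title = "Café.jpg"                 # hypothetical image title
raw = title.encode("utf-8")        # bytes as stored in a UTF-8 list file

wrong = raw.decode("latin-1")      # misread: every byte becomes one character
right = raw.decode("utf-8")        # read correctly as UTF-8

print(quote(wrong))  # Caf%C3%83%C2%A9.jpg  (double-encoded, wrong page)
print(quote(right))  # Caf%C3%A9.jpg
```

A function that passes the bytes through without re-encoding would send the correct request by accident, while a function that encodes would send the double-encoded form, which is consistent with the symptoms described above.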
Listing articles
I am a little concerned about listing the articles for which no rationale is written. Take this example. At the time your bot tagged it, it was used in Linda Bengtzing (it has since been removed, but this explains why your bot acted the way it did). However, because the bot lists the articles for which there is no rationale, it essentially protects the image from all other NFCC10c tagging bots. Just wanted to point this out (not necessarily a big deal).
Also, it would be helpful for all of us who automate the removal of such bot tags when they are fixed (I am thinking of my userscript Wikipedia:FURME), if this list (if it is going to be used) were to go inside a template or some identifiable box (even just a div with an id or something). As it is now, it poses some difficulty to do absolutely reliably. - AWeenieMan (talk) 03:23, 28 March 2008 (UTC)
- Not a big deal. The bot keeps a record of which images it's added a tag-and-list to, and six days later, will remove both the tag and the list. It then checks to see if the image is out of compliance for those articles, and if so, it removes the image. You can see this in action in the bot's most recent edits. --Carnildo (talk) 03:30, 28 March 2008 (UTC)
- Well, that addresses the first point (it just protects them for 6 days). What happens if someone removes the template, but not the list (presumably they fixed the image, but didn't see the list)? How does the bot react to changes made to its addition? Also, to my second point, as the template you add puts images in Category:All disputed non-free images, the template really should be (and in most cases will be) removed when the image is fixed (to get it off maintenance lists and such). All I am thinking is that a fairly minor change to the template (it should be easy enough to add a parameter to post the list inside {{Di-missing article links}}) or to how the bot posts the lists could facilitate this. - AWeenieMan (talk) 04:31, 28 March 2008 (UTC)
- The template is for the benefit of the users, not for the bot. Is it possible for a template to display an arbitrary-length list of parameters? --Carnildo (talk) 04:58, 28 March 2008 (UTC)
- I do not believe you can have an arbitrary list of parameters. However, you only really need one parameter that can take in a list. Here is what I was thinking. Examples here. - AWeenieMan (talk) 15:11, 28 March 2008 (UTC)
- Looks good. It'll also simplify the design of the bot's tag-removal function. --Carnildo (talk) 18:58, 28 March 2008 (UTC)
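The six-day record-keeping described earlier in this thread might be modeled as below. This is a hypothetical sketch: how the bot actually stores its record is not stated in the discussion, and a plain dictionary of timestamps is assumed here.

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(days=6)


def due_for_followup(tagged, now):
    """Return image titles whose tag has been in place for six days.

    tagged: dict of image title -> datetime the tag and list were added.
    now: the current time, passed in so the function is easy to test.
    """
    return sorted(
        title for title, when in tagged.items()
        if now - when >= GRACE_PERIOD
    )
```

For each title returned, the follow-up step would remove the tag and list, re-check compliance for the listed articles, and remove the image from any article still lacking a link or mention.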
[1] Doesn't look good. — Werdna talk 13:50, 4 April 2008 (UTC)
- What's not good about it? The syntax the bot left behind is a little screwy, but it doesn't seem to be causing any problems. --Carnildo (talk) 19:47, 4 April 2008 (UTC)
It should have left a blank, not one of the image's parameters. — Werdna talk 06:31, 8 April 2008 (UTC)
- I've done some testing, and it's not possible to fix it without causing even worse problems: there's no way to tell if it's an image modifier or an anonymous template parameter, and I think MediaWiki interprets it as the latter. --Carnildo (talk) 20:39, 8 April 2008 (UTC)
Okay. Any other objections to full approval? — Werdna talk 07:18, 10 April 2008 (UTC)
- None from me. SQLQuery me! 13:36, 11 April 2008 (UTC)
Approved. — Werdna talk 00:20, 12 April 2008 (UTC)
NFCC#7
Just as a heads up, BJBot already does this task and daily brings the fair use orphan count to zero. After the BRFA passes it will also be removing the tags. BJTalk 06:37, 8 April 2008 (UTC)
- Is there anything I need to do to coordinate with it? --Carnildo (talk) 07:55, 8 April 2008 (UTC)
- I'm not sure how your version is going to run. Mine gets all orphans from an SQL query daily, then tags them. BJTalk 18:01, 8 April 2008 (UTC)
- Enforcing NFCC#7 is mostly a side-effect of FairuseBot's operation: since it's already inspecting every non-free image and checking usage, checking NFCC#7 compliance is literally a single extra line of code. As long as your bot can handle someone else tagging an image in the time between when the SQL query is run and the bot applies the tag, things should be fine. --Carnildo (talk) 20:39, 8 April 2008 (UTC)
- Ah, I see. It won't place the tag if it is already there, so I see no problems. BJTalk 20:43, 8 April 2008 (UTC)
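The "don't tag twice" check that lets the two bots coexist can be sketched as follows. Neither bot's real implementation is shown in this discussion; the function names are hypothetical, and the tag is added without the parameters the real template may take.

```python
import re

ORPHAN_TAG = "Di-orphaned fair use"


def has_template(wikitext, name):
    """Return True if the page already transcludes the named template.

    The first letter is matched case-insensitively and spaces may be
    written as underscores, as MediaWiki treats those forms as equal.
    """
    first, rest = name[0], name[1:]
    words = [re.escape(word) for word in rest.split(" ")]
    pattern = (
        r"\{\{\s*[" + re.escape(first.upper() + first.lower()) + "]"
        + "[ _]+".join(words)
        + r"\s*[|}]"
    )
    return re.search(pattern, wikitext) is not None


def add_tag_once(wikitext, name=ORPHAN_TAG):
    """Prepend the tag only when it is not already on the page."""
    if has_template(wikitext, name):
        return wikitext
    return "{{" + name + "}}\n" + wikitext
```

Because the check runs against the page text fetched immediately before saving, a tag added by another bot after the daily SQL query was run would still be detected.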
- The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.