Wikipedia talk:Bots/Archive 16

This is an archive of past discussions about Wikipedia:Bots. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

New behavior for OrphanBot

Since the situation with unsourced images is now somewhat under control, I've modified OrphanBot's behavior:

OrphanBot will now convert tags from the {{nosource}} and {{nosource|~~~~~}} formats to the {{nosource|day=|month=|year=}} format, which categorizes the images by date, making it easier to see which images need deleting. OrphanBot will also create these categories if needed.
OrphanBot will notify uploaders about unsourced images. Notification will only be given if OrphanBot can't find a link to the image on the user's talk page: if it finds a link, there's a good chance that the user has already been notified.
OrphanBot will only orphan images that are older than six days. Hopefully this will get User:Sam Spade off my back -- he only wants me to remove images after they've been deleted.

--Carnildo 07:25, 14 January 2006 (UTC)
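As a rough illustration of the tag conversion described above, here is a minimal Python sketch. It is not OrphanBot's actual code: the regular expression, the function name date_tag, and the choice to write the month as a name are all assumptions, and the date is simply passed in by the caller because the discussion does not say how the bot picks it.

```python
import re
from datetime import date

# Assumed pattern: matches {{nosource}} and {{nosource|<unnamed parameter>}},
# but not tags that already carry named parameters such as day=.
NOSOURCE_RE = re.compile(r"\{\{\s*nosource\s*(\|[^{}=]*)?\}\}", re.IGNORECASE)

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def date_tag(wikitext, tagged_on):
    """Rewrite undated nosource tags into the dated form (hypothetical helper)."""
    dated = "{{nosource|day=%d|month=%s|year=%d}}" % (
        tagged_on.day, MONTHS[tagged_on.month - 1], tagged_on.year)
    # A function is used as the replacement so the tag text is inserted literally.
    return NOSOURCE_RE.sub(lambda match: dated, wikitext)

converted = date_tag("{{nosource|~~~~~}}", date(2006, 1, 14))
```

Tags that already use the day=, month= and year= parameters are left untouched, so running the conversion twice on the same page should be harmless.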

How is it going to do that, is it going to use, {{CURRENTDAY}} {{CURRENTMONTH}} {{CURRENTYEAR}} to just add that in? Jtkiefer^{T | C | @} ---- 07:48, 14 January 2006 (UTC)

Going to do what? --Carnildo 08:38, 14 January 2006 (UTC)

How do you decide on the absence of a source? A bit odd the notice I got on my talk page. -- User:Docu

I don't. I let someone else tag it as {{nosource}}, then OrphanBot notifies the uploader. --Carnildo 08:19, 10 March 2006 (UTC)

The notice it puts is misleading. I keep getting it because someone mislabeled images. Please fix the notice. I blocked the bot for now. -- User:Docu

AlvestrandNLBot

This is a project in the making, not an existing bot... I created a couple of lines of Perl that allowed me to pull up a subsection of Category:Living people in a format that would allow me to cut-and-paste it into List of people by name. This led to some discussion, which included the points that:

This would be much more useful if it also read the article pointed to and attempted to pull some basic biographical data from there
Letting the script do the paste-back to Wikipedia itself would make some issues (particularly Unicode ones) easier
If the two changes above were made, the result would most certainly need permission to run under the Wikipedia bot policy.

So I'm hereby asking permission. The account for the bot is not created, the script is not written. But I would like to make sure I'm not getting ahead of myself. --Alvestrand 13:31, 15 February 2006 (UTC)

We'd need to see the code before we can determine if it's ok or not. Post the code on your bot's user talk page and we'll review it. Tawker 13:45, 15 February 2006 (UTC)

Looking for a bot to retrieve wikilinks

Hello, I'm wondering if anyone can point me to where I can get an answer. If I have a specific category, I would like to get all the red wikilinks in all the articles in that category. Do you know of any bot that can do this? Thanks for any answer --Manop - TH 15:57, 15 February 2006 (UTC)

I made a program a while ago, linked to on my userpage, that lists all the red links on a given article, that may help. Martin 16:07, 15 February 2006 (UTC)

Flag bot

I haven't had time to read all of this, but I was looking to run a bot that replaced flag images. Is there such a bot already? I could even help it if it exists... F e tofs ^Hello! 22:33, 18 February 2006 (UTC)

What do you mean by replaced flag images? Do you need a straight text-to-text replacement (imagex in lieu of imagey) or something more complex? If it's the former, I can set user:tawkerbot to do it as soon as I have a consensus to do it. Tawker 09:12, 19 February 2006 (UTC)

Actually, I wanted to get the bot myself, because there are a lot of flags in the .PNG format. But yes, it's a straight find and replace. I'll make some edits with it, I hope it clarifies things. Running in a one-week trial. When the flag things are done, I'll use it for general corrections. F e tofs ^Hello! 11:14, 19 February 2006 (UTC)

P.S: Can I use it to solve disambiguation links as well?

Does policy apply to tools, or to edit frequency?

A clarification: if someone is using their browser to make fast edits, either with keyboard shortcuts or perhaps using a JavaScript tool or something like Greasemonkey, does that exempt them from this policy? I have noted a few cases of late where editors are making 10 edits per minute or more for extended periods, and when queried, they say they are not running a bot, but rather using browser features. Is this policy scoped to the tool in use, or the frequency of edits, or to something else? --TreyHarris 10:31, 21 February 2006 (UTC)

If it looks like a bot, smells like a bot, acts like a bot, it's a bot. Talrias (t | e | c) 10:43, 21 February 2006 (UTC)

A similar question was asked a while back (here) and the answer concurred with Talrias.--Commander Keane 11:22, 21 February 2006 (UTC)

Actually the answer was "Use a bot flag to avoid clogging up recent changes" (paraphrase). Recent changes also has a "minor changes" flag and a "logged in user" flag, either of which can be used for that same purpose. The only distinction bots have is trusted status. Also things have changed since the early days: with 100+ changes a minute the rationale for a bot flag is slightly different. About 11% of changes are by bots (according to a recent poll of 500 changes), and they are running all the time. Rich Farmbrough. 14:50, 22 February 2006 (UTC)

The bot flag is just another filter on the RC feed; most users don't review changes by flagged bots because they assume they're approved. I see the main point of a bot flag as avoiding clogging up RC. I run my bot at 12 edits a minute during off-peak hours (especially when I'm doing an 8K job). I checked with the bot filter off and my bot was 1 in 5 edits, so it's mostly just a trust/filtering thing. Someone with AWB can do the same number of edits or more just by clicking next each time. Tawker 07:16, 26 February 2006 (UTC)

Is C++/CLI Ok??

I have begun coding a bot in C++/CLI, nothing concrete yet...probably not for 2 or 3 weeks. Just checking that the programming language is ok. P.S. I will put up a bot page, specify exact mission and all that other stuff once it is nearer to completion and is closer to testing.Eagle (talk) (desk) 17:50, 25 February 2006 (UTC)

I don't think we really care what people write their bots in as long as they write code that works properly and doesn't leave a mess behind for other editors to deal with. Plugwash 18:26, 25 February 2006 (UTC)

Ok Thanks, the only reason I was asking was because I did not see any other C++/CLI bots on wikipedia. Sorry about that!!Eagle (talk) (desk) 18:32, 25 February 2006 (UTC)

Most bots are written in Python or Perl because those languages already have frameworks for accessing Wikipedia, and both have ready support for regular expressions. --Carnildo 03:41, 26 February 2006 (UTC)

Ok Thanks, still going with the C++/CLI as I don't know the other languages.Eagle (talk) (desk) 06:08, 26 February 2006 (UTC)

First sentence

The first sentence of WP:B seems very misleading to me. Many bots are not just automated processes but rather have a large degree of human input. I have seen this first sentence cause some confusion. Would it not be better to say "Bots are automatic or semi-automatic tools that interact with Wikipedia over the World Wide Web"? Cheers, Sam Korn ^(smoddy) 14:47, 26 February 2006 (UTC)

Surely the definition of a bot is that it is fully automatic. Martin 16:19, 26 February 2006 (UTC)

Looking briefly at the first few bots on the list, CanisRufus and Commander Keane bot are obviously semi-automatic as they are doing disambiguation and CricketBot says that all edits are checked. That doesn't sound like all bots (on Wikipedia at least) are fully automated. Sam Korn ^(smoddy) 17:22, 26 February 2006 (UTC)

Oh I agree about that, I think the real answer would be to have a new name for accounts that are semi-bot. But I suppose that isn't going to happen. Maybe you are right. Martin 17:27, 26 February 2006 (UTC)

I am the owner of CricketBot, and I agree with Sam. It wasn't at all clear to me when I was writing CricketBot that it was a bot, but the advice here seemed to be that it should be regarded as one. But then AWB was invented, and says at the top of its page that it's not a bot, even though I don't really see the difference. So I'm still a bit confused about this. Stephen Turner (Talk) 18:42, 26 February 2006 (UTC)

AWB is not a bot in the sense that it is automatic, but people use it - as they have used other tools - under a "bot account" to separate out their user contributions. I think we need to differentiate between bots that are automatic and bots that are actually users using some kind of tool (interwiki, disambiguation, javascripts, AWB etc.) Martin 19:13, 26 February 2006 (UTC)

I have a direct interest in this issue too. I have always been a fast manual editor. People have frequently suggested incorrectly that my manual edits are automated. It was easy to say 'no I am not a bot'. After I started using AWB and scripts, I get the same suggestions but it is not so easy to know what to say in response. As I see it, there are two viewpoints:

1. The editor's viewpoint

The editor knows whether a page edit is manual, software assisted or automatic.

2. The onlooker's viewpoint

The onlooker can only see: edit frequency; edit count; edit content.

Some of the fears of automation relate to edit validation and editor responsibility. Perhaps we should describe editing in those terms e.g. fully automated editing of a page is not validated so perhaps it can be called 'unsupervised', 'unchecked' or some word like that. If a human checks each page, it could be called 'supervised', 'checked', or something like that. bobblewik 19:53, 26 February 2006 (UTC)

I've seen "cyborg" used (in gaming circles at least) to refer to someone who is operating with computer assistance, as opposed to a "bot" running unchecked. --Martin Rudat(T|@|C) 08:51, 28 February 2006 (UTC)

Gnome (Bot)

I am not requesting a trial run yet, I don't think, unless the prescribed activity below indicates otherwise. (I'm not sure). I plan to run the part of the code that pulls out the pages of wikipedia so the bot can edit. (basically a test of its ability to navigate wikipedia) NO EDITS WILL BE MADE WITH THE BOT. The only reason I am asking is that the C++/CLI code is running really fast, and loads pages very quickly. I promise that I will upload only in short bursts, with at least 30 min to an hour in between. Eagle (talk) (desk) 21:41, 27 February 2006 (UTC)

The bot's purpose is not yet fully formulated, but to me that does not mean I cannot test the code out to the fullest extent possible. Again, I will not conduct any edits Eagle (talk) (desk) 21:41, 27 February 2006 (UTC)

vandalbot

If I request permission to operate a vandalbot, will it be authorized? --DanielleCunio 01:03, 1 March 2006 (UTC)

No. Dragons flight 01:09, 1 March 2006 (UTC)

I'd need details on exactly what sort of vandalism it would do, what pages it would vandalize, and the expected load on the Wikipedia servers. A reasonably efficient vandalbot would take a lot of load off of the people who are presently vandalizing by hand. --Carnildo 04:05, 1 March 2006 (UTC)

Do I need to make a bot account?

I checked my contributions from when I was using AWB for disambiguation yesterday, and it looks like I averaged (pulls out calculator) 2.13 edits per minute during the time periods in which I was working. I don't think I'm likely to go much faster than this (DAB work requires a certain amount of thought), but I probably will be working at near this speed on a fairly regular basis. Do I need to request a bot to work at this speed, and, if so, would it be a bigger hassle to create a bot account and get it approved or to slow down to whatever would be an acceptable speed? Robth^Talk 05:18, 1 March 2006 (UTC)

2.13 edits per minute is borderline for a bot flag. The policy is not to run an unflagged bot faster than one edit every 30 seconds, so you're borderline. A bot flag isn't that hard to get: create the account, wait a week, get the flag. If that's easier, it might be better for you. Tawker 00:21, 3 March 2006 (UTC)

Sounds good. I'm having some trouble running AWB right now, but once I get that sorted out I'll make up some account with the obligatory pun on "Rob" and "robot" for a name and ask for a trial period. Robth^Talk 03:12, 3 March 2006 (UTC)
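For reference, the comparison in this thread can be worked through with a few lines of Python. This is only a sketch of the arithmetic: the function names are made up for illustration, and the 30-second figure is taken from Tawker's reply above rather than from the policy text itself.

```python
def seconds_between_edits(edits_per_minute):
    """Average gap between edits, in seconds, for a given edit rate."""
    return 60.0 / edits_per_minute

def exceeds_unflagged_limit(edits_per_minute, min_interval_s=30.0):
    """True if the average gap is shorter than the assumed minimum interval."""
    return seconds_between_edits(edits_per_minute) < min_interval_s

# Robth's measured rate from the thread above.
gap = seconds_between_edits(2.13)
over_limit = exceeds_unflagged_limit(2.13)
```

At 2.13 edits per minute the average gap comes out a little under 30 seconds, which is why the rate is described as borderline; a steady 2 edits per minute would sit exactly on the limit.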

Archive??

Uhh.. is it about time to archive some of these? This page is really getting long.Eagle (talk) (desk) 03:31, 3 March 2006 (UTC)

Mathbot Complaint

Mathbot is pasting messages in users' talk pages asking them to do more edit summaries. I find this very obnoxious. Wikipedia is supposed to improve by cooperation and peer review in the act of editing itself. The kind of eye-in-the-sky monitoring of this bot-function is IMHO quite contrary to that spirit. Also, I think there should be a very high burden of proof on bots who leave messages in users' talk pages. Bacchiad 20:20, 3 March 2006 (UTC)

The guideline at WP:ES recommends the use of edit summaries. I don't know mathbot's full scope, but considering it does the edit summary checks for RFA I would think that checking for edit summary use and letting people know is within its operating guidelines. I found it amusing that you didn't use an edit summary on either of your edits (both as the anon and as yourself) here, btw. --Syrthiss 20:37, 3 March 2006 (UTC)

Question on the policy

Sorry about the additional comment but I have a question

Do I need to go here to run a bot without a flag???? Time between edits at say 40 seconds?? The reason I am asking is that I have no need to run faster than that.Eagle (talk) (desk) 21:24, 5 March 2006 (UTC)

PS the policy on the project page is really convoluted. (I can't tell what red tape I will trip over)Eagle (talk) (desk)

Request for interpretation of policy

The policy text uses the terms "general consensus" and "any objections" but it is not clear about what happens when there *are* objections. An editor says "I think one legitimate oppose is all that should be required to deny a bot operating approval." I can see his reading of the text but he happens to be opposed to my application and I would like an independent view. It is a very simple and very powerful rule that could be stated clearly.

If that is the policy then I was wasting my time applying. And over 50 voters were wasting their time voting/debating.

I presume that all those taking part in the debate would like to know:

1. Did Bobblebot get a Rough consensus?
2. Did Bobblebot get approval or didn't it?

Can I have an opinion from somebody independent? Thanks. bobblewik 18:04, 9 March 2006 (UTC)

I was wondering this too. The current policy seems very unclear to me. It says "Get a rough consensus" and then in the very next sentence it says "see if there are any objections, and if there aren't, go ahead". Is it really the intention that one (well-intentioned) editor is all that's required to veto any bot? So running a bot needs unanimity, not just consensus? Stephen Turner (Talk) 18:15, 9 March 2006 (UTC)

I would say Bobblebot got a rough consensus. My counting isn't very good, but numbers look to be about 42 people for and about 15 against. Not as clear as would be desirable. But numbers are only part of consensus, and since there is a general presumption that we WP:AGF my suggestion below, is that we give BB the 1 week testing as suggested by policy page. However I'm not sure if I'm neutral. Rich Farmbrough 18:30 9 March 2006 (UTC).

From Wikipedia:Bots, the burden of proof is upon the bot owner to demonstrate the following:

The bot is harmless
The bot is useful
The bot is not a server hog
The bot has been approved

If people oppose, it's because they do not believe the bot owner has sufficiently demonstrated the previous four issues to be true for the bot. There is good reason for these policies for bots; a "rough" consensus is not good enough. Rather than trying to circumvent the bot policy, demonstrate the previous issues to be true.

I for one don't believe that the bot is harmless (given the numerous examples of mistakes reported by other editors on Bobblewik's talk page), and I don't believe it is useful - I for one find links around dates useful for browsing to other historical events occurring at a similar time. The bot has been "tested" enough already, when it was being run without permission, and unless Bobblewik explains how he's changed his bot to now be harmless and useful, I don't see how "testing" it for another week will be of any advantage to Wikipedia. Talrias (t | e | c) 19:18, 9 March 2006 (UTC)

Bobblebot part 2

As far as I can see, Bobblewik has done everything by the book.

He's reduced his editing speed as requested by other users.
He's suspended editing altogether for two long periods, to allow consensus to develop.
His edits are broadly in line with MoS.
He doesn't edit-war.
He remains civil.
He's submitted a bot request, although it is arguable he doesn't need to.

Now it looks as if those that object to his edits, rather than coming up with a good reason or trying to negotiate, are saying that any objection means he can't run under a bot flag. The project page is confused on this point "Get a rough consensus on the talk page that it is a good idea. Wait a week to see if there are any objections, and if there aren't, go ahead and run it for a short period so it can be monitored."

Can we please say at this point - "Bobblewik, run your one week test, and let's gather some feedback." Then we can, by consensus, make any reasonable stipulations, re-test, etc. until everyone's happy, or at least has no reason to complain. Rich Farmbrough 18:21 9 March 2006 (UTC).

Synchronicity! Rich Farmbrough 18:23 9 March 2006 (UTC).

I will make an additional offer to reduce ad hominem (against the person) objections. I am prepared to publish my regex to anyone prepared to share the task, including AWB users. My ideal would be if this were a shared task. bobblewik 18:48, 9 March 2006 (UTC)

He's been running this bot without permission for months, why does he need more testing time? Stop trying to ruleslawyer your way around it. People don't want this bot. Why can't you accept this? Talrias (t | e | c) 18:52, 9 March 2006 (UTC)

Seems clear enough really - there are significant objections, those objections remain, therefore the bot should not get a flag. -- sannse (talk) 18:57, 9 March 2006 (UTC)

Funny you think I'm trying to " ruleslawyer" when I'm suggesting we make a sensible step forward without being strangled by the rules. Ho hum. Rich Farmbrough 21:20 9 March 2006 (UTC).

Let's not re-invent the wheel, Wikipedia is consensus driven, there is no consensus here. Martin 19:00, 9 March 2006 (UTC)

Oh, no - please drop by Wikipedia:Administrators'_noticeboard#Revert_war_between_Bobblewik_and_Ambi. --M @th wiz 20 20 01:41, 11 March 2006 (UTC)

Bobblebot: withdrawal

Following the extensive discussion, I hereby withdraw my application for a bot flag. I would like to thank all those that have taken part in the debate.

I am now lifting my voluntary suspension of date edits in accordance with the MoS. I will continue to lobby for a reduction in the mismatch between articles and MoS. bobblewik 19:25, 9 March 2006 (UTC)

And continue to get it WRONG time and time again, drawing a host of complaints; for example, TWICE on the same article in the space of two weeks you deleted the date of death [1] and [2] Jooler 02:02, 10 March 2006 (UTC)

We are not likely to agree that delinking 1865 is right or wrong. But can we agree that delinking January is right? bobblewik 02:17, 10 March 2006 (UTC)

Again, in most but not all cases. The links in Januarius and Janus are appropriate. But, there are far fewer cases where isolated months are linked appropriately, and I would consider voting for a bot that de-linked them. The line between good/bad year dates is still too hazy for me to agree on a bot for them, even though it would be monitored. Since monitoring implies that the bot would run no quicker than the user could read and understand the potential importance of a year date, then bot-speed doesn't seem necessary. Neier 07:36, 10 March 2006 (UTC)

Spelling unifier

I'm not sure if one already exists, but I'd like to code a bot that scans articles for the most common words whose spellings differ between the US and England et al. The algorithm would go something like:

Search the article for words on list X whose spellings are known to differ between the two countries.
If both spellings are present in the article, count each instance of each word, and if one is Y percent more prevalent than the other, replace all to fit the majority.
If the spellings occur in comparable proportions, list it on one of the bot's subpages to look at personally later.
Stop changing spellings if a message is left on its talk page.

How does that sound? - ElAmericano (dímelo) 06:04, 12 March 2006 (UTC)
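The steps proposed above can be sketched in Python roughly as follows. This is a hedged illustration rather than ElAmericano's code: the word pairs stand in for "list X", the 0.75 threshold stands in for "Y percent" (read here as a share of all variant spellings found, aggregated across words), and the function only classifies an article without changing it.

```python
import re
from collections import Counter

# Hypothetical sample of "list X": (US spelling, British spelling) pairs.
SPELLING_PAIRS = [("color", "colour"), ("center", "centre"), ("organize", "organise")]

def classify_article(text, threshold=0.75):
    """Classify an article by which spelling variant dominates, if any."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    us_count = sum(words[us] for us, uk in SPELLING_PAIRS)
    uk_count = sum(words[uk] for us, uk in SPELLING_PAIRS)
    if us_count == 0 or uk_count == 0:
        return "consistent"      # only one variant (or neither) is present
    total = us_count + uk_count
    if us_count / total >= threshold:
        return "unify-us"        # US spelling is the clear majority
    if uk_count / total >= threshold:
        return "unify-uk"        # British spelling is the clear majority
    return "review"              # comparable proportions: list for a human

verdict = classify_article("The colour of the center is a color.")
```

Returning a label instead of editing keeps the sketch usable for a report-only bot that lists inconsistent pages for human review.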

Some ideas. If an article is about a British place then it generally gets British spelling - so a human would have to check for things like that on every page (for all Commonwealth countries etc), so the bot doesn't make a mistake. Also, spelling can be rather complicated. For example the word theatre. That is the British spelling, but Americans use it when talking about Broadway. So an article about an American actor may trick the bot. That's one exception, there could be more. Are there many articles suffering from this problem, are there any statistics?--Commander Keane 06:41, 12 March 2006 (UTC)

Thanks for the input. This proposal isn't based on statistics, but just on my experience browsing Wikipedia, where spelling 'style' often changes from paragraph to paragraph. As to the first dilemma, how about a piece of code that checks the first 100 words or so for the words "Britain", "British", "American", "United States", or even more specific terms, and if they point towards a certain spelling, go with that? (Maybe even human-assisted.) Again, if both were present, they would be sent to a page for human decision. I'm not sure what to do as far as Broadway, unless such exceptions are well documented somewhere. - ElAmericano (dímelo) 14:11, 12 March 2006 (UTC)

I'm very uncomfortable with the idea of a bot actually making these changes, I would prefer that the bot stored a list of pages with inconsistent spelling as some maintenance page, and left it up to humans to actually make the necessary changes. Remember, also, that Wikipedia's policy is to use the original language spelling, not the most common. Talrias (t | e | c) 14:56, 12 March 2006 (UTC)

Okay, I'll make a statistics-generating bot, then. I'm assuming that by "original language spelling," you mean the spelling used by the original author of the article? - ElAmericano (dímelo) 00:00, 14 March 2006 (UTC)

Problems with OrphanBot

Supposedly the notices added by Orphanbot to people's talk pages are useful, but it appears that occasionally it adds notices based on incorrect {{nosource}} tags. As the notice doesn't concede such a possibility, the bot's notices can be qualified as "misleading" at least. I suggest that the bot be stopped until this is fixed. -- User:Docu

If you've got a problem with people incorrectly applying {{nosource}} tags, take it up with them directly. Don't block OrphanBot: it's functioning exactly as advertised. --Carnildo 08:27, 12 March 2006 (UTC)

It's easy to fix this, just change the notice from :

Thanks for uploading [[:Image:..]]. The image has been identified as not specifying the source and creator of the image, which is required by Wikipedia's policy on images. If you don't indicate the source and creator of the image on the image's description page, it may be deleted some time in the next seven days.

to:

Thanks for uploading [[:Image:..]]. With {{nosource}}, the image has been tagged by an editor as not specifying the source and creator of the image, which is required by Wikipedia's policy on images. If you don't indicate the source and creator of the image on the image's description page, it may be deleted some time in the next seven days.

--User:Docu

Perhaps it would be appropriate and useful to identify (in OrphanBot's message) the user who added the tag. — Mar. 12, '06 [16:40] <freakofnur_xture|talk>

It's been discussed before, and there's no reliable way to do this. Among other things, many "no license" tags are applied by the uploader, since "some website" and "don't know" in the dropdown menu on the upload screen both do this. Also, several dozen license templates based on incorrect understandings of law have been redirected to {{no license}}, with more added all the time. Sometimes, the person tagging the image substs the template, and occasionally someone else does so at a later date. And then there's the slowdown from trawling through histories: OrphanBot can currently process about 200 images an hour, so handling the "no source" and "no license" categories takes between 12 and 24 hours right now. Adding in history checks would reduce this to around 150 images an hour for the "no license" category, and around 100 an hour for the "no source" category. --Carnildo 20:53, 12 March 2006 (UTC)

Ok, my misunderstanding then. I figured there would be time to do this between throttled edits. Stick with the standard "don't shoot the messenger" disclaimers then. — Mar. 13, '06 [18:57] <freakofnur_xture|talk>

New behavior for OrphanBot

I'm working on a new job for OrphanBot: identifying and tagging unsourced images. Once a day, OrphanBot will download the list of all images uploaded that day, and for images with a blank image description page, or an image description page that consists only of a copyright template requiring that the source be indicated ({{fairuse}}, most public-domain tags, any free-license tag requiring attribution, and other tags), OrphanBot will tag the image as {{no source}}, and will notify the uploader of the problem. --Carnildo 00:11, 13 March 2006 (UTC)
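A minimal Python sketch of the check described above might look like the following. It is an assumption-laden illustration, not OrphanBot's code: the template regular expression, the function name needs_source_tag, and the one-entry SOURCE_REQUIRED set are hypothetical, and a real list of copyright templates would be much longer.

```python
import re

# Assumed pattern for a simple, non-nested template such as {{fairuse}}.
TEMPLATE_RE = re.compile(r"\{\{\s*([^{}|]+?)\s*(?:\|[^{}]*)?\}\}")

# Hypothetical subset of templates that require a source to be stated.
SOURCE_REQUIRED = {"fairuse"}

def needs_source_tag(description, source_required=SOURCE_REQUIRED):
    """True if the page is blank or holds only source-requiring templates."""
    names = [m.group(1).strip().lower() for m in TEMPLATE_RE.finditer(description)]
    remaining = TEMPLATE_RE.sub("", description).strip()
    if remaining:
        return False  # some free text is present, possibly the source itself
    return all(name in source_required for name in names)

blank_page = needs_source_tag("")
```

Any free text on the page makes the function return False, which errs on the side of not tagging; deciding whether that text really names a source would still be left to a person.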

This page is a mess

This page is rather disorganised. Can we either archive some of the discussions here, and/or split out this page into two sections, those for bot request feedback, and those for discussing the project page (which is what talk pages are meant to be for)? Talrias (t | e | c) 17:06, 13 March 2006 (UTC)

I agree, I'll try to help with the archival of inactive sections. --light darkness^(talk) 17:16, 13 March 2006 (UTC)

I created BRFA, hope that the requests get redirected there. _→AzaToth 18:31, 13 March 2006 (UTC)

This page is pretty much what you want BRFA to be, it's just a recent request has become controversial. Most of the discussion can happen here, the old sections just need to be archived. --light darkness^(talk) 19:17, 13 March 2006 (UTC)

I'm trying to request to run a bot. Do I put it here or there? -- ShinmaWa^(talk) 21:22, 13 March 2006 (UTC)

I have now both archived this page and moved all active requests to WP:BRFA. You can see that there were a lot of non-requests mixed in with the requests here, making it hard to see if a bot was approved or not. _→AzaToth 22:31, 13 March 2006 (UTC)

Complaint

Where can I complain about User:OrphanBot? I disagree with having a bot remove images it feels are unsourced. The sourcing of images is a fairly complex process, and I have found this bot to be in error a few times. My complaints have fallen on deaf ears, as evidenced here. Indeed the bot shutoff has been disabled, and the complaints of others have been blown off as well. What should I do? Sam Spade 08:33, 16 March 2006 (UTC)

The tagging is actually done by people in Wikipedia:Untagged images then, Orphanbot simply delinks images that are already tagged as having no source or no license. It lists the articles it delinked from on the image page also, which is helpful for the admin who may be deleting if it was tagged in error to relink the image. So orphanbot really is just notifying uploaders and delinking the images (which notifies anyone who has the article on their watch list). The actual tagging and deletion are still done by people, and there is a 7 day window (if we keep up!) between those events. There was a recent problem with some people tagging a lot of images as having no source in error, which may be the real cause of this complaint, but that was unrelated to orphanbot really. - cohesion^t 09:48, 16 March 2006 (UTC)

Oh... I guess that clears things up somewhat. I still have the objection about which images are tagged, since there seems to be little effort to determine their actual source and status. Also, when a mistake is made, and an image is kept, who goes back and restores the image to obscure articles? Sam Spade 10:08, 16 March 2006 (UTC)

When the image is tagged as no source for example, it goes into a date-based category, then after 7 days an admin will go through and delete them, checking of course that they are tagged correctly etc. When they are tagged incorrectly orphanbot will still have delinked them, but listed the articles they used to be in, so personally I just go to those articles and revert orphanbot and tag the image correctly. I assume most people do this although I can't be certain. Yes, there isn't a whole lot of effort put into finding the source, but with the amount of images and the amount of people we really rely more on the uploader for that. Often if the uploader just writes the source in text we can find the correct tag for them and tag it, but when they don't provide anything at all in most cases we will simply say no source and move on. This doesn't explain the little hiccup in the system last week when lots of images were indiscriminately tagged as no source... :( - cohesion^t 10:55, 16 March 2006 (UTC)

I believe that the hiccup on March 3 was caused by Urshyam discovering the image tagging project and sticking a "nosource" tag on every image he came across. I've had to deal with a lot of complaints as a result of OrphanBot giving people "no source" messages when the image was actually "no license", and OrphanBot's been taking around 18 hours to deal with Category:Images with unknown source, rather than the normal 12 hours. --Carnildo 21:08, 16 March 2006 (UTC)

Bot shutoff has been disabled because it's almost never been used properly.
It has been triggered ten times by users who disagree with the removal or deletion of no-source and no-license images: [3] [4] [5] [6] [7] [8] [9] [10] [11] [12]

It has been triggered five times by users just jerking around: [13] [14] [15] [16] [17]

It has been triggered three times by users who believe in shooting the messenger: [18] [19] [20]

It has been triggered twenty times by users asking about copyright tags: [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34] [35] [36] [37] [38] [39] [40]

It has been triggered four times by users who don't understand what the bot does: [41] [42] [43] [44]

It has been triggered five times by users who are concerned about how the bot is operating:[45] [46] [47] [48] [49]

And it has been triggered exactly once after a malfunction: by me, after I discovered the bot hadn't realized it was done for the day.
User_talk:Carnildo#OrphanBot_removing_PD-tagged_images.3F is really irrelevant to your point: that particular post was in response to OrphanBot removing every image uploaded by a user, after it was discovered the user was tagging images essentially at random. Removing the images was discussed and approved on the Administrators' Noticeboard.
Your complaints have been ignored largely because they're based on incorrect assumptions and disagreement with Wikipedia policy. Roughly 97.5% of all images OrphanBot has removed from pages have subsequently been deleted. It is Wikipedia policy to delete images that have no source or license information, and, while no policy requires removing deleted images from pages that use those images, it's generally a good idea. It's also a hell of a lot easier to remove those images before they're deleted. --Carnildo 21:08, 16 March 2006 (UTC)

A little addition on Carnildo's last point: By yanking them from the articles it makes the pending deletion visible to the users most interested and able to try to fix/replace the image... not the uploader, not the users cleaning up image tags, but the editors of the article where the image was in use. Yanking the no source images is pretty much the most friendly thing we can do... unless you suggest the bot spam the talk page of every user who has edited an article containing the image. :) --Gmaxwell 03:56, 17 March 2006 (UTC)

Traversing Wikipedia

Okay, I'm working on my spelling stats-generating bot (no edits made in article namespace, only on its own userpage), and I don't really know how to traverse Wikipedia most efficiently. What is common bot practice? - ElAmericano (dímelo) 18:47, 17 March 2006 (UTC)

You're not allowed to traverse or spider Wikipedia. Instead, download a database dump and do your statistical analysis offline. Gdr 19:10, 17 March 2006 (UTC)

In that case you might find a tool I made to scan the datadump useful. Martin 19:22, 17 March 2006 (UTC)
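For anyone taking the offline route suggested above, a dump can be streamed page by page without loading it all into memory. The Python sketch below assumes the dump is XML with page, title and text elements (possibly namespaced); the function names and the tiny SAMPLE document are made up for illustration and this is not Martin's tool.

```python
import io
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}page'."""
    return tag.rsplit("}", 1)[-1]

def iter_pages(stream):
    """Yield (title, text) pairs from a dump-like XML stream."""
    title, text = None, ""
    for event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # free the memory held by the finished page

# Tiny stand-in for a real dump file (assumed structure, made-up namespace).
SAMPLE = (b'<mediawiki xmlns="http://example.org/export">'
          b'<page><title>A</title><revision><text>colour</text></revision></page>'
          b'<page><title>B</title><revision><text/></revision></page>'
          b'</mediawiki>')

pages = list(iter_pages(io.BytesIO(SAMPLE)))
```

With a real dump, the BytesIO object would be replaced by an open file, and the statistics would be computed inside the loop instead of collecting every page in a list.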

New pages patrol bot

I would like to run a bot that reads the recent changes IRC output (in a similar way to CryptoDerk's Vandal Fighter). The bot will tag very obvious vandalism for deletion (although most of this is already caught), very short articles as stubs and articles with no links as needing wikification. The bot will only read the articles when they are over a certain age (i.e. when they are no longer on the first page of the newpages list) to avoid any edit conflicts.

My motivation for this is the increasing volume of substandard articles that we receive that just end up floating around; anyone who has scanned the database will know what I'm talking about. I would like to test it for a week, and then assess how useful it is. I have written most of the software already (in C#), and providing there are no major objections I will probably be ready to test in a week or so. Martin 22:46, 17 March 2006 (UTC)
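The two simpler checks in this proposal (very short articles and articles with no links) plus the age rule could be sketched as below. Martin's software is written in C#; Python is used here only for brevity, and the 500-character stub limit, the 30-minute minimum age, and all names are hypothetical values chosen for illustration.

```python
import re
from datetime import datetime, timedelta

# Matches a simple internal link such as [[Article]] or [[Article|label]].
LINK_RE = re.compile(r"\[\[[^\[\]]+\]\]")

def is_old_enough(created, now, min_age=timedelta(minutes=30)):
    """Only look at pages older than an assumed minimum age."""
    return now - created >= min_age

def suggest_tags(wikitext, stub_max_chars=500):
    """Return the maintenance tags the checks would suggest for a page."""
    tags = []
    if len(wikitext.strip()) < stub_max_chars:
        tags.append("stub")      # very short article
    if not LINK_RE.search(wikitext):
        tags.append("wikify")    # no internal links at all
    return tags

suggested = suggest_tags("A short unlinked sentence.")
```

Detecting "very obvious vandalism" is deliberately left out, since the proposal does not say how that would be recognised.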