
Wikipedia:Bot requests/Archive 55


In relation to my earlier idea for an archiveurl bot, what about this: a bot that looks for dead link-tagged references, looks in the wayback machine, and if it finds the link stored there, it adds the archiveurl/archivedate stuff into the article, and detags the ref, else, if it doesn't, it posts some kind of comment about the ref not being accessible. Lukeno94 (tell Luke off here) 20:27, 24 May 2013 (UTC)

Hmm..., how does the bot know which version actually contains the cited material? Should it assume a recent version will do? If so, Coding... (If this works out, good, and if not, I'll brush up on my programming.) Thegreatgrabber (talk)contribs 22:08, 24 May 2013 (UTC)
Websites can change, so if a website changes after being cited and Wayback Machine caches it, this might actually result in a working but incorrect citation being added to the article. -- Toshio Yamaguchi 22:15, 24 May 2013 (UTC)
I would propose adding the most recent version before the accessdate parameter. If none is given, use the time of the edit instead. Thegreatgrabber (talk)contribs 22:20, 24 May 2013 (UTC)
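A minimal sketch of the lookup step being discussed, assuming the Wayback Machine's public availability API and an accessdate already normalised to YYYYMMDD (the function and variable names are illustrative, not an existing bot):

    import requests

    def closest_snapshot(url, timestamp=None):
        """Return (archiveurl, archivedate) for the snapshot closest to
        `timestamp` (YYYYMMDD), or None if the Wayback Machine has nothing."""
        params = {"url": url}
        if timestamp:
            params["timestamp"] = timestamp  # e.g. the citation's accessdate
        data = requests.get("https://archive.org/wayback/available",
                            params=params).json()
        snap = data.get("archived_snapshots", {}).get("closest")
        if not snap or not snap.get("available"):
            return None  # nothing archived: leave the dead-link tag and report it
        ts = snap["timestamp"]  # 14-digit YYYYMMDDhhmmss
        return snap["url"], "%s-%s-%s" % (ts[0:4], ts[4:6], ts[6:8])

    # Example: look for a capture near the cited accessdate
    print(closest_snapshot("http://www.example.com/page", "20130101"))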

Seems like this is Impossible due to the vast number of transclusions involved. Thegreatgrabber (talk)contribs 23:20, 24 May 2013 (UTC)

Impossible? How so? There's nothing stopping you from pulling one transclusion at a time... Theopolisme (talk) 02:43, 25 May 2013 (UTC)
Extremely tedious, anyway, working on it with a generator. Thegreatgrabber (talk)contribs 05:05, 25 May 2013 (UTC)
  • There's no question it is tedious, and that is the entire reason I suggested that a bot do this - there's no question that deadlinks are a major problem, and it will take forever to sort them out, even with a bot. As to which version you choose, well, the recent version would probably be fine, although I suppose one could argue that a middle revision would be the theoretical optimum. That said, Thegreatgrabber's idea is also very good. Lukeno94 (tell Luke off here) 08:47, 25 May 2013 (UTC)

@Thegreatgrabber: are you working on this? Theopolisme (talk) 18:01, 28 May 2013 (UTC)

Not really, I'm too busy. I'd be glad if you could do this. Thegreatgrabber (talk)contribs 05:03, 29 May 2013 (UTC)

Currently this list must be manually updated, which is a tedious time-waster; it seems like the perfect task for a purpose-built little bot. I can't see it being very difficult to create one that measures the data and updates it. Can we have one for the page? — Preceding unsigned comment added by Doc9871 (talkcontribs)

If you ask me, this is quite useless and an unnecessary strain on servers, as it would have to scan all users, from what I know. So I'm going to say Possible, but Needs wider discussion.—cyberpower ChatOnline 17:37, 28 May 2013 (UTC)
By "all users" do you mean all the users on the list? There's really not that many; but I know zero about programming bots. Doc talk 01:49, 29 May 2013 (UTC)
That would make the list even more useless. He means listing all top users, which would require a scan of every user.—cyberpower ChatOffline 03:36, 29 May 2013 (UTC)
I'm asking you, Cyberpower, specifically. What "he" means is... you? If you're speaking in the third person, I understand. The uselessness of the list itself is not what I'm looking to discuss. Is it feasible to set up a small bot designed only for this page or not? If not, we'll just have to keep updating it manually. Doc talk 03:44, 29 May 2013 (UTC)
I have no qualms with this list. I have no qualms with a bot to update it, either. It's one query, for goodness' sakes. Theopolisme (talk) 03:52, 29 May 2013 (UTC)
@Theopolisme: What queries would you use to generate this list?—cyberpower ChatOffline 04:18, 29 May 2013 (UTC)
From the watchers extension, it looks like
                        /* FROM   */ 'watchlist',
	                /* SELECT */ 'count(wl_user) AS num',
is supposed to do the trick, although I may be mistaken. Theopolisme (talk) 04:34, 29 May 2013 (UTC)
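For concreteness, a rough sketch of how that count could be run per listed user against a database replica; the hostname, credentials file, and example usernames are placeholders, and it assumes the standard watchlist table with wl_user/wl_namespace/wl_title columns:

    import os
    import pymysql  # assumes a replica connection is available (e.g. on Labs)

    conn = pymysql.connect(host="enwiki.labsdb", db="enwiki_p",
                           read_default_file=os.path.expanduser("~/.my.cnf"))

    def watcher_count(username):
        """Count how many accounts watch User:<username>."""
        title = username.replace(" ", "_")
        with conn.cursor() as cur:
            cur.execute(
                "SELECT COUNT(wl_user) AS num FROM watchlist "
                "WHERE wl_namespace = 2 AND wl_title = %s",  # namespace 2 = User:
                (title,))
            return cur.fetchone()[0]

    for user in ["Example", "Example2"]:  # in practice, read from the list page
        print(user, watcher_count(user))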
Ah database query. But still you are making a lot of queries to the database for every user.—cyberpower ChatOnline 12:30, 29 May 2013 (UTC)
Yeah, although if you hurry you can get to Labs and they'll all be lightning fast </end sarcasm--even though it's true, for now> ...don't get me started on one of my Labs/Toolserver rants... Theopolisme (talk) 14:28, 29 May 2013 (UTC)
True. I've got adminstats ready to run over there but I need access to a restricted table to continue.—cyberpower ChatOffline 14:33, 29 May 2013 (UTC)
There is also no mechanism in place that I'm aware of that automatically includes editors on this list. It's voluntary. Couldn't a bot be programmed to check only the users on the list? Again: this is not my area, which is why I'm here on this page. Doc talk 03:56, 29 May 2013 (UTC)
Like I said, it's Possible to create.—cyberpower ChatOffline 04:18, 29 May 2013 (UTC)
If it's not going to drag down the servers, and if it's not going to interfere with any user not listed on the page: why not? It's a bot. I was doing the same thing manually. It cannot be that hard to make a bot for this. Call it "Centijimbo1.0" or whatever you want. Doc talk

Adding a template to the top of a list of talk pages

See this discussion. In a nutshell, we at WikiProject Medicine want to add a simple template to the top of all the talk pages that transclude certain infoboxes. It should run periodically to check for new articles; so, obviously, it should only add the template if it doesn't already exist. (Alternatively, it could keep a list of pages that it's already updated in a static store somewhere.)

I haven't written a bot before, but looking at Perl's Mediawiki::Bot modules, I think I could probably write this in a couple of hours. But, it strikes me this must be a very common pattern, and I'm wondering if there isn't a generic bot that can be used, or if there is an example someone could point me to that I could just copy-and-hack.

Thanks! Klortho (talk) 01:23, 6 June 2013 (UTC)
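For what it's worth, the pattern looks roughly like this with Pywikibot rather than Perl; the infobox and banner names below are placeholders, not the actual WikiProject Medicine templates:

    import pywikibot

    site = pywikibot.Site("en", "wikipedia")
    infobox = pywikibot.Page(site, "Template:Infobox example")  # placeholder
    banner = "{{Example project banner}}"                       # placeholder

    for article in infobox.embeddedin(namespaces=[0]):  # articles transcluding it
        talk = article.toggleTalkPage()
        text = talk.text if talk.exists() else ""
        if banner.lower() in text.lower():
            continue  # already tagged, so skip (lets the task run periodically)
        talk.text = banner + "\n" + text
        talk.save(summary="Tagging talk page for WikiProject (bot)")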

Hey Klortho, check out AnomieBOT. There are instructions at the top ("Before requesting a WikiProjectTagger run, please read the following"). Cheers, Theopolisme (talk) 01:36, 6 June 2013 (UTC)

Update article class

Many project pages list articles with their "class" ratings – such as Vital articles. It would be nice to have a bot keep these up-to-date.

Some points:

  • Many pages might not want their lists updated. Maybe a parameter should be added to {{icon}} with |bot=yes.
  • Some articles have two symbols. This might need a special {{bot-icon}} template of its own.
  • Different WikiProjects might assign different ratings, so a method of determining which one to use is needed.

Actually, I'd be willing to create this myself, but I don't know the first thing about making bots... :-( Ypnypn (talk) 03:41, 5 June 2013 (UTC)

@Ypnypn: Sounds like a great idea. Do you mean you're requesting a bot to update WP:Vital articles? What other lists are you talking about? Theopolisme (talk) 01:44, 6 June 2013 (UTC)
There are other lists of articles that might want this; see Featured topic questions, WikiProject Jewish history/Articles by quality, etc. Ypnypn (talk) 15:43, 6 June 2013 (UTC)
Ah, okay. Actually, Wikipedia:Bots/Requests for approval/TAP Bot 3 is currently open for this very task! You might want to ask there if it could be extended to the other pages you mention. Theopolisme (talk) 16:51, 6 June 2013 (UTC)

GFDL relicensing

User:B left the following message at WP:VPT, but nobody's responded to it.

Can we get a bot to go through and substitute the "migration" flag of images with the {{GFDL}} template? Right now, today, if someone uploads a {{GFDL}} image, it shows the {{License migration}} template. Now that it is four years after the license migration, I think it makes sense to change the default to be not eligible for migration. But in order to change the default, we would need a bot to do a one-time addition of something like migration=not-reviewed to all existing uses of the GFDL template. So if you have {{GFDL}} or {{self|GFDL}}, the template would add "migration=not-reviewed" to the template. After that is done, we can make the default "not-eligible" instead of the default being pending review. --B (talk) 00:16, 2 June 2013 (UTC)
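The text change itself is mechanical; a rough regex sketch of the substitution (it ignores nested templates and unusual spellings, so a real bot run would need more care):

    import re

    GFDL = re.compile(r"\{\{\s*(GFDL|self\|[^{}]*GFDL[^{}]*)\s*(\|[^{}]*)?\}\}",
                      re.IGNORECASE)

    def add_migration_flag(wikitext):
        """Append |migration=not-reviewed to {{GFDL}}-style tags lacking one."""
        def repl(match):
            if "migration" in match.group(0).lower():
                return match.group(0)  # already has a migration flag
            return match.group(0)[:-2] + "|migration=not-reviewed}}"
        return GFDL.sub(repl, wikitext)

    print(add_migration_flag("{{GFDL}} and {{self|GFDL|cc-by-sa-3.0}}"))
    # -> {{GFDL|migration=not-reviewed}} and {{self|GFDL|cc-by-sa-3.0|migration=not-reviewed}}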

I thoroughly agree with this idea; the template needs to be modified so that it defaults to migration=not-eligible, and we need to mark existing examples as needing review. This wouldn't be the first bot written to modify tons of license templates; someone's addition of "subject to disclaimers" to {{GFDL}} (several years ago) forced Commons to maintain two different GFDL templates, and they had to replace many thousands of {{GFDL}} transclusions. Nyttend (talk) 03:55, 4 June 2013 (UTC)

I'm willing to do so, but I'm not sure if this needs wider discussion, per WP:NOCONSENSUS. Maybe you should leave a note at WP:VPR? Also, for clarification, it would be adding |migration=not-reviewed to every transclusion of {{GFDL}}, provided that there was no value set for |migration=?  Hazard-SJ  ✈  05:01, 5 June 2013 (UTC)
Requested at WP:VPR. I quoted your "for clarification" sentence and said basically "Here's what's being requested and the proposed solution; are both okay?" Thanks! Nyttend (talk) 02:14, 7 June 2013 (UTC)

I'm removing superfluous redlinks from country outlines.

To remove the redlinks of "Political scandals of foo" (where foo is a country name) from each country outline, I take a list of country outlines (on a user subpage) and replace "Outline of" with "Political scandals of". That list now shows which links are red.

Then I use AWB's listmaker to make a list of all the redlinks on that page, and I paste them back in to the same page and linkify them. Now there are only redlinks on that page.

Next I change "Political scandals of" back to "Outline of". The list is now all the country outlines that contain the "Political scandals of" redlink in them.

I make a list in AWB from that, and then do a regex search/replace with AWB to get rid of the entry in each of those outlines.

Unfortunately, I have to do this whole procedure over again for each redlink I wish to nuke (so as not to nuke the blue links too), and this approach only works with sets of links that share the "of foo" nomenclature, because AWB has no way to search/replace strings based on redlink status (as far as I know).

If you know of an easier/faster way to do this, please let me know. The Transhumanist 06:18, 19 May 2013 (UTC)

  • Just use rapid #ifexist for those redlinks or show all links as blue: Although the parser limits the use of #ifexist to 500 conditions, it can be used to choose to show non-wikilinked text at redlinks:
  • {{#ifexist:boguspage|[[boguspage]]|boguspage}} → boguspage
Then, when someone eventually writes the missing page, the wikilink "magically" returns as blue-link text. However, another method is to force the link-display text to appear blue, so redlinks look blue but still click to create the new page. Example:
Use a regex editor to update all, or the most-likely wikilinks, to use those methods. Either way, the page will have wikilinks to those pages, once they are created, or to create them when someone clicks a blue-toned redlink. BTW those nation/country outlines are awesome pages. -Wikid77 (talk) 14:16, 19 May 2013 (UTC)
I wouldn't be averse to making a dedicated template as well (i.e., {{outlinelink}}), if only to make it clearer to readers/editors. Theopolisme (talk) 14:22, 19 May 2013 (UTC)
Thank you. I hope to clean up the country outlines to make them even better. Your ideas look very powerful. I will definitely start testing this approach for improving outlines.

With respect to building new outlines, I was thinking of using a template with every possible link under the sun in it to create city outlines. So the template would include links that would have nothing to do with many cities, like Port of Foo (which only applies to cities that have a port authority). This approach would be impractical without a good redlink stripper. But if I had a fast way to identify and strip out redlinks, I could use substitution and AWB to autobuild (in draft space) a bunch of city outlines (which because of the comprehensive shotgun approach of the template would mostly contain redlinks to irrelevant topics). Then I'd strip out the redlinks using the miracle tool, touch up the outlines, and then move them to article space. My questions are: What would the miracle tool be? And how would it work?

I look forward to your replies. The Transhumanist 01:02, 20 May 2013 (UTC)
  • Using #ifexist on bullet lines makes redlink lines disappear from list: The magic tool which you seek is, indeed, #ifexist, which can check whether the linked page exists and otherwise omit the link, causing the bullet-line to be omitted from a list. Example for Houston (on a ship channel) and El Paso (Texas), far away from any sea:
  • {{#ifexist:Port of Houston|[[Port of Houston]]}}   Result: Port of Houston
  • {{#ifexist:Port of El Paso|[[Port of El Paso]]}}      Result:
Note, the line for "Port of El Paso" totally disappears from a list, because an asterisk-bullet with no text is omitted; however, if someone created a water-theme-park, or famous seafood restaurant, named "Port of El Paso" then the wikilink would re-appear, and the bullet-line would enter the list.
Because #ifexist runs at rates over 500 per second, putting #ifexist into a list of 200 article titles runs within one-fifth (1/5) of a second, which is truly a miracle tool: to automatically change the contents of a list, while checking for 200 live article titles, and format that list, all within one-fifth of a second. The limit is only 500 #ifexist per page (though there is no obvious logical reason for that limit), so outlines-of-countries with 150-200 links will still run ok. -Wikid77 (talk) 06:50, 20 May 2013 (UTC)
I can see how this will work in many cases, especially for hatnotes and {{Main}} links, and straight lists. But what about the branching in structured lists? A redlink nuker script could selectively nuke or delink a redlink depending on whether or not it branched. Keep in mind that outlines (and many lists) have levels. (Some outlines are 10 levels deep, maybe more.) If a parent node disappeared, that would bump its children up the list to be offspring of the entry that was directly above their parent - giving them the wrong parent! And since outlines grow by editors adding children all over the place, how do you protect those children from getting bumped up a list due to parent disappearance?
#ifexist looks very powerful, and perhaps solutions to the above reservations can be found, so I'm starting a test of this magic tool right away. I'm in the process of building {{Outline City+}}.
I look forward to your answer to my question. The Transhumanist 07:36, 20 May 2013 (UTC)


I added a magic entry to {{Outline City+}}, and it turned invisible. Not a big problem, but once the whole template is converted, it will make it look rather sparse upon first inspection.   :)   The Transhumanist 07:55, 20 May 2013 (UTC)

Some of the items have section links. Will #ifexist check the existence of sections? The Transhumanist 08:08, 20 May 2013 (UTC)

No, it's not clever enough to do that. {{#ifexist:Main page#Nonsense section name|yes|no}} ⇒ yes -- John of Reading (talk) 06:24, 21 May 2013 (UTC)

Manually stripping redlinks from outlines is a royal pain in the...

See Outline of Guam, for example.

I desperately need help from someone with a bot (or help building one) that can do all of the following:

Remove each bullet list entry that has no offspring and that consists entirely of a redlink (like "Flora of Guam", below)

but only delink (remove the double square brackets from) those that have offspring (like Wildlife of Guam and Fauna of Guam, above). (By "offspring", I mean one or more non-red subitems.) So the above structure should end up looking like this:


If a redlink entry has an annotation or descriptive prose after it, it should be delinked rather than deleted. Here are some examples from Outline of Guam:

These should end up looking like this:


Also, "main" article entries that are red must be removed. Ones that look like this:

But, if they have any bluelinks in them, only the redlinks should be removed. So...

Main article: History of Guam, Timeline of the history of Guam, and Current events of Guam

...should be made to look like this:

Main article: History of Guam


If a section is empty or is made empty due to all of its material being deleted, its heading must be deleted as well. (See Outline of Guam#Local government in Guam. This heading will have no content once its red "main" entry is removed, and so this heading needs to be removed too.)


Many outlines have had redlinks sitting in them for years. They need to be cleaned up. There are so many outlines, this is infeasible to do manually.

Are there any bots that can do this?

If not, how would a script check a link on a page it was processing for whether or not the link was a redlink?

I look forward to your replies. The Transhumanist 00:37, 20 May 2013 (UTC)

A bot can do this and I wouldn't be opposed to writing one that takes care of red links on Wikipedia.—cyberpower ChatOffline 03:00, 20 May 2013 (UTC)
What you're looking for would require more work to be done to take care of this, redlinks aside. Should still be doable though. Why do they need to be removed?—cyberpower ChatOffline 03:04, 20 May 2013 (UTC)
Two main reasons:
  1. Concerning existing outlines, to clean them up. Especially the outlines that were generated via template (countries, mostly). Compare Outline of French Polynesia with Outline of Australia. The former is choked with redlinks that are unlikely to ever turn blue. The latter is more neat and refined. AWB can handle link placement in sets of outlines easily, but it doesn't differentiate between placing blue or red links. Thus, in an AWB pass through many outlines, you often add more redlinks than blue. A redlink stripping tool would make up for this. So by using the two tools in combination, you could keep entire sets of outlines updated without accumulating undue redlinks. Also, several outlines were deleted because they had too many redlinks, and I wish to prevent that from happening again. You can see their redlinks in Portal:Contents/Outlines#Geography and places.
  2. To support rapid outline construction. Toward this end, I would like to build outlines via substitution using templates that include every conceivable topic. Such as using a template for creating city outlines that includes all titles that might pop up in a city. For example, Parks in Foo, Islands of Foo, Taxicabs in Foo, Subway of Foo, Trade Board of Foo, Castles of Foo, etc. There are hundreds of potential links. Why include them? To catch as many blue links as possible. Most of the topics that don't turn blue won't have anything to do with the city (like Port of Foo in a land-locked city) and would need to be removed. But because they would be red, these could be stripped away with the tool we've been talking about. The outline drafts would remain in project space until they were ready to be moved to article space.
What did you mean by "What you're looking for would require more work to be done to take care of this"? The Transhumanist 05:07, 20 May 2013 (UTC)
Sorry for the delayed response. Building a tool to look for redlinks alone and delink them is pretty straightforward. Building a tool to process outlines would require a lot more work because now you need to pick the article apart and analyze it. Are they just outlines like in the example above, or are there different formats?—cyberpower ChatLimited Access 17:26, 22 May 2013 (UTC)
There are as many differences in formatting as there are editors working on them. The one above is roughly followed by country outlines. Continent outlines differ. City outlines differ. Academic field outlines differ. Etc.
I can do the picking apart and analyzing. Problem is, I don't have a clue how to look for redlinks. Other than scraping the html. What ways were you thinking of? The Transhumanist 19:27, 23 May 2013 (UTC)
Pywikipedia can tell you very easily if the page is a redlink; otherwise, an API call: [2] for a page that does exist vs. [3] for one that doesn't...note the -1 as a key in the dictionary for the page that doesn't exist. Theopolisme (talk) 21:47, 23 May 2013 (UTC)
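For reference, the same check without Pywikipedia, straight against the API (a sketch; up to 50 titles can be batched into one request, which also keeps server load down):

    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def missing_titles(titles):
        """Return the subset of `titles` that do not exist (i.e. would be redlinks)."""
        resp = requests.get(API, params={
            "action": "query",
            "titles": "|".join(titles),  # batch of up to 50 titles
            "format": "json",
        }).json()
        return {page["title"]
                for page in resp["query"]["pages"].values()
                if "missing" in page or "invalid" in page}

    print(missing_titles(["Port of Houston", "Port of El Paso"]))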
Basically I was going to say the same as Theopolisme. I don't do Python, though. I write in PHP. For the record, I have already been working on the redlink delinker script. As for outlines, I would have to say it would be impossible to have a bot do this if the formatting of outlines is so dynamic.—cyberpower ChatOnline 23:47, 24 May 2013 (UTC)
Nice. I can't wait to read the script. By the way, which method will pose the least strain on Wikipedia's servers? The Transhumanist 12:09, 26 May 2013 (UTC)
So I'm going to leave this discussion as:
Cleanup outline: Impossible
Redlink delinker: Coding...cyberpower ChatOnline 17:42, 28 May 2013 (UTC)
Every little bit helps. Thank you. By the way, you didn't mention which method of redlink checking would pose the least strain on WP's servers. I look forward to your reply. The Transhumanist 08:09, 1 June 2013 (UTC)
Because there is no easy way to determine a redlink, every method used will be an incredible strain on the servers, which is why I am being very careful with how I develop the script.—cyberpower ChatOnline 21:54, 1 June 2013 (UTC)

@The Transhumanist and Theopolisme: According to BAG, this Needs wider discussion.—cyberpower ChatOnline 18:55, 7 June 2013 (UTC)

FUR addition bot

The request is for an automated bot that scans through Category:Non-free images for NFUR review and attempts to automatically add a preformatted NFUR rationale when one is not present.

This bot would not apply to all Non-free content and would be limited initially to {{Non-free album cover}}, {{Non-free book cover}}, {{Non-free video cover}} and {{Non-free logo}} tagged media where the image is used in an Infobox. Essentially, this bot would do automatically what I've been doing extensively in a manual fashion with FURME.

In adding the NFUR, the bot would (having added a rationale) also add the |image has rationale=yes parameter, as well as leaving an appropriate note that the rationale was autogenerated.

By utilising a bot to add the types of rationale concerned automatically, contributor and admin time can be released to deal with more complex FUR claims, which do not have easy pre-formatted rationales or which require a more complex explanation.

Sfan00 IMG (talk) 14:28, 4 June 2013 (UTC)

BAD IDEA. Rationales should have human review. Otherwise you get articles with 10 different covers, bot approved. Werieth (talk) 15:09, 4 June 2013 (UTC)
I'm usually against templated FURs, but the narrow conditions being discussed here (e.g. cover art or logos in infoboxes, adding a FUR where one is absolutely not present) seem reasonable. I would ask that some parameter be added to the page that makes it clear a bot added the rationale and that thus a human review has not affirmed it, as well as language on the page that this action has been performed by the bot and that if the editor can improve on it, they should. I agree on the general idea that this moves a number of trivially-fixed cases out of the human chain of NFC review to allow focus on more complex/non-standard cases, though per Werieth's concern this shouldn't be seen as the "okay, the image passes NFCC" rubber stamping that could be implied from this. --MASEM (t) 15:18, 4 June 2013 (UTC)
I've no objections to the bot adding tags categorising auto-generated rationales, so that they can still be reviewed by a human. Limiting this to infobox use only would be appropriate, as 'appropriateness' of use elsewhere cannot be determined automatically. Masem, did you have a specific wording in mind? Sfan00 IMG (talk) 15:31, 4 June 2013 (UTC)
Would something like {{Non-free_autogen}} be acceptable?Sfan00 IMG (talk) 16:31, 4 June 2013 (UTC)
Something like that but I would expand it more to say the bot's name (and task if necessary), that the image is believed to be tagged as it meets NFCI#1 or #2 (covers vs logos) but that this does not assure that NFCC is met (not a free pass) and encourage editors to expand the rationale. It should also place the image in a maintenance category related to the bot -tagging - that won't be a cleanup category though users would be free to go through, review rationales, and strip out the template if they can confirm the template rational is fine. --MASEM (t) 16:47, 4 June 2013 (UTC)
Feel free to expand it then, this is a wiki :) Sfan00 IMG (talk) 16:51, 4 June 2013 (UTC)
That seems a reasonable initial limitation, given this is intended to be for Non-free content in infoboxes. Sfan00 IMG (talk) 17:24, 4 June 2013 (UTC)
Also, the media concerned must be used in no more than 1 article, mainly because generating auto rationales for multi-page uses gets more complex. Sfan00 IMG (talk) 17:31, 4 June 2013 (UTC)

Here's my general outline of what it looks like the bot will need to do:

For all files in Category:Non-free images for NFUR review:
	If image meets the following constraints:
		- tagged with {{Non-free album cover}}{{Non-free book cover}}{{Non-free video cover}}{{Non-free logo}}
		- only used in one article
		- file must be the only non-free file in the article
	Then:
		- on the image page:
			- add some fairuse rationales to {{Non-free use rationale}} or {{Non-free use rationale 2}}
				- *** I will need rationales to insert ***
			- add "|image has rationale=yes" to {{Non-free album cover}}{{Non-free book cover}}{{Non-free video cover}}{{Non-free logo}}
			- add a new parameter "bot=Theo's Little Bot", to {{Non-free album cover}}{{Non-free book cover}}{{Non-free video cover}}{{Non-free logo}}
				- this might need additional discussion as far as implementation/categorization

As you can see, there are still some questions -- #1, are there rationales prewritten that I can use (Wikipedia:Use_rationale_examples, possibly...)? Secondly, as far as clarifying that it was reviewed by a bot, I think adding |bot= parameters to {{non-free album cover}} and such would be easy enough, although if you have other suggestions I'm all ears. Theopolisme (talk) 06:53, 7 June 2013 (UTC)

Per the above thread,
  1. There is an additional criterion that 'the file should be used in the infobox'. This is because the code for adding this

is a straight translation (and partial extension) of what FURME does, substituting the {{Non-free use rationale}} types for the {{ <blah> fur}} types it uses currently. FURME itself needs an overhaul and re-integration into TWINKLE, but that would be outside the scope of a bot request.

  1. the pre written rationales are {{Non-free use rationale album cover}} {{Non-free use rationale book cover}} {{Non-free use rationale video cover}} {{Non-free use rationale logo}} which are the standard templated forms.
  2. I'd been using {{Non-free autogen}} as a means of marking semi-automated additions of rationales I'd made. The wording

still needs to be tweaked, but in essence it uses the |bot= and |reviewed= style as opposed to modification of the license template. Note this also means it's easier to remove the tag once it's been human reviewed. Sfan00 IMG (talk) 11:10, 7 June 2013 (UTC)

Great, thanks for the speedy reply (and the clarification). I'll get on to coding this now. 15:11, 7 June 2013 (UTC)
More details might come out as you start running limited tests on it in terms of process, but I think the general approach is fine. One point: the "file must be the only non-free file in the article" requirement, I don't think, is necessary unless you cannot determine that the image is in the infobox (all standard infobox templates). I agree that if you can't tell via program that the image is used in the infobox, the single use is a good starting point, but if you can, then you can "broaden" the requirement to being an image strictly used in the infobox. This might catch false positives in cases where editors use infoboxes later in the article (often movie soundtracks for movies), but that at least puts a rationale there, and human intervention is still needed to judge if those are right or wrong. Another point is that as templated FURs are not required, there may be prose-based FURs, which at minimum need to name the article (or a redirect to the article) that the image is used in (and this doesn't have to be linked). So you may need, when checking for the absence of a FUR, to see if this case works too. --MASEM (t) 15:44, 7 June 2013 (UTC)
Okay, I'll just add a check to make sure the image is used in an infobox. That makes things a lot easier on my end, trust me! :) Theopolisme (talk) 15:57, 7 June 2013 (UTC)
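For what it's worth, the infobox check can be sketched with mwparserfromhell; treating any template whose name starts with "Infobox" as an infobox is an assumption (some articles wrap the image in other templates), so this is illustrative only:

    import mwparserfromhell

    def file_used_in_infobox(article_wikitext, filename):
        """True if `filename` (without the File: prefix) appears as a parameter
        value of a template whose name begins with 'Infobox'."""
        code = mwparserfromhell.parse(article_wikitext)
        for template in code.filter_templates():
            if not str(template.name).strip().lower().startswith("infobox"):
                continue
            for param in template.params:
                if filename.lower() in str(param.value).lower():
                    return True
        return False

    sample = "{{Infobox album| name = Foo | cover = Foo album cover.jpg }}"
    print(file_used_in_infobox(sample, "Foo album cover.jpg"))  # True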

Bot to update Adopt-a-user list

Hi! I am looking for a bot that could update Wikipedia:Adopt-a-user/Adoptee's Area/Adopters's "currently accepting adoptees" to "not currently available" if they haven't made any edits after a certain period of time. This is because of edits like this, where a new user asks for adopters, but because the adopter has gone from Wikipedia, they never get a response and just leave Wikipedia. Thanks, and if you need more clarification just ask! jcc (tea and biscuits) 17:14, 6 June 2013 (UTC)

I would take up this task, but my hands are somewhat tied with other tasks at the moment.—cyberpower ChatOnline 22:06, 6 June 2013 (UTC)
Coding... -- jcc, after what period of time should users be marked as "not currently available"? 30 days? 60 days? Theopolisme (talk) 02:20, 7 June 2013 (UTC)
Erm, up to you really, but maybe 1 month(?), seeing as the bot will probably recheck every so often, so there's no need to do something like 3+ months in fear of the problem that some adopters might come and go. Up to you really, jcc (tea and biscuits) 16:14, 7 June 2013 (UTC)
Sorry for forgetting to let you know here, but this bot has been approved and will run weekly. Again, sorry for not keeping you in the loop! Cheers, Theopolisme (talk) 16:59, 9 June 2013 (UTC)

Archive bots

Heh, it's me again, with more "archive" bot requests. Here are another simple two: 1: If the |url= part of a reference has the web.archive.org string in it, remove it and strip it back to the proper URL link. 2: If a reference has an |archiveurl tag, but is lacking an |archivedate tag, grab the archiving date from the relevant part of the archive url, e.g. http://web.archive.org/web/20071031094153/http://www.chaptersofdublin.com/books/Wright/wright10.htm would have "20071031" grabbed and formatted to "2007/10/31". [4] shows where I did this sort of thing manually. Lukeno94 (tell Luke off here) 20:49, 6 June 2013 (UTC)
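The date-grabbing in request 2 is essentially one regex; a sketch (the output format below is ISO-style, but it could just as easily be formatted "2007/10/31" or to match the citation's date style):

    import re

    ARCHIVE_TS = re.compile(r"web\.archive\.org/web/(\d{4})(\d{2})(\d{2})\d*/")

    def archive_date(archiveurl):
        """Pull a '2007-10-31'-style date out of a web.archive.org snapshot URL."""
        m = ARCHIVE_TS.search(archiveurl)
        return "-".join(m.groups()) if m else None

    print(archive_date(
        "http://web.archive.org/web/20071031094153/http://www.chaptersofdublin.com/books/Wright/wright10.htm"))
    # -> 2007-10-31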

Hello, no need. That is already coded in for Wikipedia:Bots/Requests for approval/Hazard-Bot 21, (well ... I used the dash format, though), and I'm going to leave an update there. Also, another bot is doing this as well.  Hazard-SJ  ✈  00:20, 7 June 2013 (UTC)
@Hazard-SJ: I don't see this (at least #1) in your script's source code...Luke is talking about using regex to take a url parameter that is a link to web.archive.org and get the original url from that, then move the archive url to archiveurl, and replace url with the actual (dead, presumably) url. Theopolisme (talk) 00:28, 7 June 2013 (UTC)
@Theopolisme: I haven't added it to GitHub as yet, but I intend to soon, along with requesting another trial.  Hazard-SJ  ✈  00:45, 7 June 2013 (UTC)
Oh, fabulous. My apologies. Theopolisme (talk) 00:46, 7 June 2013 (UTC)
No problem ... I'm currently fine-tuning the code to ensure it works before I do so :)  Hazard-SJ  ✈  00:50, 7 June 2013 (UTC)

Renewed request – Most missed articles

The Wikipedia:Most missed articles -- often searched for, nonexistent articles -- has not been updated since a batch run in 2008. The German Wikipedia person, Melancholie (de:Benutzer:Melancholie) who did the batch run has not been active since 2009. Where would be a good place to ask for someone with expertise to do another run? It does not seem to fit the requirements of Wikipedia:Village pump (technical) since it is not a technical issue about Wikipedia. It is not a new proposal, and not a new idea. It is not about help using Wikipedia, and it is not a factual WP:Reference Desk question. I didn't find a WikiProject that looked promising. So I am asking for direction here. --Bejnar (talk) 19:52, 11 June 2013 (UTC)

It's harder than it looks. :) Have you tried emailing Melancholie? Theopolisme (talk) 20:36, 11 June 2013 (UTC)
Melancholie is long gone. No response. How is it done? Can I learn? --Bejnar (talk) 03:43, 12 June 2013 (UTC)
Looks like the squid stats are in a very poor way. The more I think about it, the harder it seems. Stuartyeates (talk) 04:02, 12 June 2013 (UTC)
The best reliable method that I know of just counts red links. Werieth (talk) 12:14, 12 June 2013 (UTC)
As I understand it, this had nothing to do with redlinks. Over a period of several months, the process captured the searched-for text from the Wikipedia search box, dropped out those which had hits, and stored each unsuccessful search term/phrase alphabetically with a counter that incremented for each use of that search term/phrase. At the end of the time period, the resultant database was sorted by frequency and the low-volume terms were dropped; it was then run through a scat-remover to delete common obscenities and the like, and put out for editorial consumption at Wikipedia:Most missed articles. I generated a number of articles suggested by that database that have reasonable hit rates. In some ways the programming may/might resemble that of Wikipedia article traffic statistics. --Bejnar (talk) 05:09, 13 June 2013 (UTC)

Template:Hampton, Virginia

Add {{Hampton, Virginia}} to every page in Category:Neighborhoods in Hampton, Virginia. Emmette Hernandez Coleman (talk) 23:09, 17 June 2013 (UTC)

Oooh, an easy one. :-)—cyberpower ChatOnline 00:06, 18 June 2013 (UTC)
No bot needed, this is only 11 pages.  Hazard-SJ  ✈  05:09, 18 June 2013 (UTC)

Unreliable source bot

This bot's job would be to stick {{unreliable source}} next to references that are links to websites of sketchy reliability as third-party sources (e.g., blog hosting sites, tabloids, and extremely biased news sites). The list of these sites would be a page in the MediaWiki namespace, like MediaWiki:Spam-blacklist. To allow for false positives (for instance, when the site is being cited as a primary source), the bot would add a hidden category (maybe Category:Unreliable source tags added by a bot) after the template to identify it as being added by the bot, and enable editors to check the tags. The hidden category would be removed by the editor if it was an accurate tagging. If not, the editor would comment out the {{unreliable source}} tag, which would mean that it would be skipped over by the bot in the future. ❤ Yutsi Talk/ Contributions ( 偉特 ) 14:16, 13 June 2013 (UTC)
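One way a bot could find candidate pages is the API's external-link search rather than scanning wikitext with regexes; this sketch assumes a plain list of domains (not the regex fragments used by the spam blacklist), and the example domain is taken from the discussion below:

    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def pages_linking_to(domain, limit=50):
        """Yield (title, url) pairs for article-space pages linking to `domain`,
        via list=exturlusage."""
        resp = requests.get(API, params={
            "action": "query",
            "list": "exturlusage",
            "euquery": domain,     # search string without the protocol
            "eunamespace": 0,      # article space only
            "eulimit": limit,
            "format": "json",
        }).json()
        for hit in resp["query"]["exturlusage"]:
            yield hit["title"], hit["url"]

    for title, url in pages_linking_to("examiner.com"):
        print(title, url)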

This task might be Impossible to do. I haven't checked the databases yet though, so I'll get back to this later today.—cyberpower ChatOffline 16:17, 13 June 2013 (UTC)
Are you proposing a new MedaWiki page? Theopolisme (talk) 16:34, 13 June 2013 (UTC)
This actually isn't a bad idea, it just needs some more thought. It's trivial for a bot to run around tagging all "domain.com" links as {{unreliable source|bot=AwesomeBot}}; however, how you construct your url blacklist is probably most important. For example, examiner.com is on the blacklist because it's just overall not a reliable source. However there are many links to it, and those have all probably been individually whitelisted[citation needed]. So tagging those wouldn't be useful. Did you have a few example domains in mind? That would help in evaluating your request. Legoktm (talk) 16:54, 13 June 2013 (UTC)
I think it's a good idea for a bot too. The problem is, the blacklist entries are regex fragments and AFAIK, you can't use regex to search for external links. I could be wrong though. I'm pretty sure the database on Labs can help me out with this though, but before I know for sure, I'm just going to assume the worst answer.—cyberpower ChatOffline 17:07, 13 June 2013 (UTC)
mw:Manual:externallinks table.... Legoktm (talk) 17:29, 13 June 2013 (UTC)
I figured as much. I was talking about the API. I just didn't want to generate false hope before I knew for certain that it was doable. Anyways, I wouldn't mind taking up this task.—cyberpower ChatOffline 17:32, 13 June 2013 (UTC)
Is the bot going to be able to work around the spam filter? Or maybe should the bot just remove citations that were added before the spam filter entry was created? Thanks! GoingBatty (talk) 03:31, 15 June 2013 (UTC)
We'll see. :-)—cyberpower ChatOnline 18:01, 16 June 2013 (UTC)

I'm a lawyer. One feature I really like is the ability to enter a case's legal citation, like "388 U.S. 1" (the legal citation for Loving v. Virginia), as the "title" of a Wikipedia article, and have that entry automatically redirect to the correct page. However, this only exists for Supreme Court cases up to somewhere in the 540th volume of the U.S. reporter, and (as far as I know), not for any other cases.

It would be great if a bot could automatically create redirects between a legal citation and a page about that case. — Preceding unsigned comment added by Jmheller (talkcontribs) 03:42, 16 June 2013 (UTC)
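Assuming a citation-to-case-name list can be supplied, the redirect creation itself is only a few lines with Pywikibot (a sketch; the single entry below is illustrative, and existing or ambiguous titles are simply skipped):

    import pywikibot

    site = pywikibot.Site("en", "wikipedia")

    # citation -> existing article title; in practice this would come from a list
    cases = {
        "388 U.S. 1": "Loving v. Virginia",
    }

    for citation, case in cases.items():
        redirect = pywikibot.Page(site, citation)
        if redirect.exists():
            continue  # don't overwrite existing pages (including ambiguous citations)
        redirect.text = "#REDIRECT [[%s]]" % case
        redirect.save(summary="Creating redirect from U.S. Reports citation (bot)")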

Would you happen to have a list? Someone could automate this using AWB. Thegreatgrabber (talk)contribs 06:00, 16 June 2013 (UTC)
[5]? For example, [6] = Loving v. Virginia = 388 U.S. 1. Theopolisme (talk) 16:10, 16 June 2013 (UTC)
I suspect there'd be some support for this at the SCOTUS WikiProject as well (Wikipedia talk:WikiProject U.S. Supreme Court cases); it might be worth leaving a note, as someone there might have ideas for where the data could be most easily grabbed. One small thing worth noting: I seem to recall that in very early volumes these identifiers didn't turn out to always be unique. They are for nearly all cases of interest, though, so you could simply omit the ambiguous ones, or, for extra credit, create disambiguation pages. --j⚛e deckertalk 16:32, 16 June 2013 (UTC)
Doing... Thegreatgrabber (talk)contribs 23:48, 17 June 2013 (UTC)
Still doing... Thegreatgrabber (talk)contribs 16:04, 19 June 2013 (UTC)

@Jmheller: What should be done for cases like [7] where cases are only listed by docket number? Thegreatgrabber (talk)contribs 02:21, 21 June 2013 (UTC)

I just found out that the links to the GeoWhen database are dead. However, through a web search I discovered that the pages are still accessible – just add bak/ after the domain http://www.stratigraphy.org and before the geowhen/ part. Some links have been fixed already, but not all. Note: An additional mirror is found under http://engineering.purdue.edu/Stratigraphy/resources/geowhen/. --Florian Blaschke (talk) 16:03, 14 June 2013 (UTC)

A simple template would do the job, wouldn't it? --Ricordisamoa 17:47, 14 June 2013 (UTC)
I suppose, but I've never made a template and I consider this advanced wiki-magic. --Florian Blaschke (talk) 09:02, 19 June 2013 (UTC)
OK. I've actually created a template now simply by copying and modifying Template:Iranica and it seems to work. What now? --Florian Blaschke (talk) 09:24, 19 June 2013 (UTC)
@Florian Blaschke: Are these what should be replaced? If so, this would probably be better manually, since it's not much.  Hazard-SJ  ✈  03:21, 20 June 2013 (UTC)
Well, I'd prefer not to replace them manually, and if we go for the template, which makes sense given that the current mirror might soon move again, there are even more links to replace. --Florian Blaschke (talk) 10:20, 20 June 2013 (UTC)
I could try something... like this for {{Find a Grave}} on itwiki. Please let me know. --Ricordisamoa 10:35, 20 June 2013 (UTC)
Please do try, that would be the neatest-looking solution. --Florian Blaschke (talk) 14:18, 21 June 2013 (UTC)
Many of them are in {{cite web}}, what to do with them? --Ricordisamoa 13:13, 22 June 2013 (UTC)
It would be preferable to have them all use the template, just in case the address moves again. That's the cleanest long-term solution, as far as I can see. --Florian Blaschke (talk) 14:02, 22 June 2013 (UTC)
So, should we remove {{cite web}}? Maybe using it within {{GeoWhen}}? --Ricordisamoa 14:14, 22 June 2013 (UTC)
I don't think it is necessary to use {{cite web}} at all. Comparable templates don't use it, either, do they? --Florian Blaschke (talk) 17:25, 22 June 2013 (UTC)

Null edits to update categories

Wikipedia:Categories for discussion/Working/Manual#Templates removed or updated - deletion pending automatic emptying of category has a very large backlog of hidden categories that have been renamed. Due to the way some or all of these categories are generated by the template, the job queue alone doesn't seem able to process them, and individual articles require null edits to get the updated category. Is it possible for a bot to have a crack at these? Timrollpickering (talk) 16:10, 21 June 2013 (UTC)
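For reference, a null-edit pass over one of those categories is short with Pywikibot; this is a sketch, the category name is a placeholder, and it assumes Page.touch() performs the null edit:

    import pywikibot

    site = pywikibot.Site("en", "wikipedia")
    cat = pywikibot.Category(site, "Category:Example renamed hidden category")  # placeholder

    for article in cat.articles():
        try:
            article.touch()  # null edit: re-saves unchanged text so categories refresh
        except pywikibot.Error as err:
            pywikibot.output("Skipping %s: %s" % (article.title(), err))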

Wikipedia:Bots/Requests for approval/Hazard-Bot 23 was already filed and included this.  Hazard-SJ  ✈  20:28, 22 June 2013 (UTC)

This one should be straightforward:

  1. Search through all articles tagged as having bare references.
  2. Run a Reflinks-style script to convert the bare url into a cite web reference on each article.
  3. Tag any dead links.
  4. If there are no dead links, and all references have been converted, remove the "bare references" tag.

Et voila. :) Lukeno94 (tell Luke off here) 14:03, 22 June 2013 (UTC)

Not something that can be automated; Reflinks requires human review. Werieth (talk) 14:06, 22 June 2013 (UTC)
 Request denied This has come up many times before and a couple of actual requests for approval have been denied. See here, here, here, here, here, and here; all of which have a consensus of NO. Sir Rcsprinter, Bt (gab) @ 19:58, 22 June 2013 (UTC)

Requ. to move WikiProject

Wikipedia:Categories_for_discussion/Log/2013_June_22#WikiProject_Skepticism

This is a request for assistance in moving the assessment categories for WikiProject Rational Skepticism. Greg Bard (talk) 20:02, 23 June 2013 (UTC)

WP:CFD/W will take care of that; there are bots that use that page. Werieth (talk) 20:03, 23 June 2013 (UTC)
WP:CFD/W is a protected page that I cannot edit. So I'm getting the run-around at this point. Greg Bard (talk) 20:59, 23 June 2013 (UTC)
Won't happen. Just get an admin to list the cats. Werieth (talk) 21:15, 23 June 2013 (UTC)
I can do it. -- Magioladitis (talk) 21:20, 23 June 2013 (UTC)

Fixed all categories manually, updating all pages with my bot. -- Magioladitis (talk) 21:45, 23 June 2013 (UTC)

I deleted all old categories, I fixed/normalised all banners and user wikiproject tags. -- Magioladitis (talk) 23:25, 23 June 2013 (UTC)

Unreferenced

Is there any way that a bot can go through and find instances where an article has the {{unreferenced}} template and {{references}}/<Reflist>/any other coding pertaining to references? I've seen a lot of instances of {{unreferenced}} being used on articles that do have references. This seems like it should be an easy fix. Ten Pound Hammer(What did I screw up now?) 03:35, 20 June 2013 (UTC)

@TenPoundHammer: Yes, that's possible, and in that case, Coding...  Hazard-SJ  ✈  03:37, 20 June 2013 (UTC)
I was doing this for a while with an AWB bot which would change {{unreferenced}} to {{refimprove}}. However, I found too many instances of {{unreferenced}} being incorrectly used under a section header (instead of {{unreferenced section}}), so I stopped running the bot. I haven't dedicated the mindspace to figure out how to fix this. GoingBatty (talk) 03:59, 21 June 2013 (UTC)
Also note that often articles will have a {{reflist}} and have zero references; those should still be tagged as unreferenced. My thought would be to convert unreferenced => more refs if <ref> is contained in non-commented-out wikicode. Werieth (talk) 16:16, 21 June 2013 (UTC)
Since <ref>...</ref> tags can contain notes instead of references, you might want to limit your search to citation templates, such as {{cite web}} and {{cite news}}. GoingBatty (talk) 02:52, 22 June 2013 (UTC)
Example of an unreferenced article using <ref>...</ref> tags to contain a note: Battle of Breitenfeld (1642). GoingBatty (talk) 12:22, 22 June 2013 (UTC)
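A sketch of the check being discussed, limited to a few common citation templates per GoingBatty's suggestion, so that notes-only <ref> usage (as in the example above) does not count as referenced; it does not yet skip commented-out wikicode:

    import re

    CITE_TEMPLATE = re.compile(r"\{\{\s*cite\s+(web|news|book|journal)", re.IGNORECASE)
    UNREFERENCED = re.compile(r"\{\{\s*unreferenced\b", re.IGNORECASE)

    def suggested_action(wikitext):
        """Suggest a retagging for a page carrying {{unreferenced}}."""
        if not UNREFERENCED.search(wikitext):
            return "not tagged"
        if CITE_TEMPLATE.search(wikitext):
            return "convert {{unreferenced}} to {{refimprove}} (citation templates found)"
        return "leave {{unreferenced}} in place"

    print(suggested_action("{{Unreferenced|date=June 2013}} Text. {{cite web|url=http://example.com}}"))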
 On hold, unless someone else wants to do this.  Hazard-SJ  ✈  02:34, 25 June 2013 (UTC)
Using my bot to preparse the list and find examples of {{unreferenced}} being incorrectly used under a section header - stay tuned. GoingBatty (talk) 04:19, 25 June 2013 (UTC)

Change "can refer to" to "may refer to" in DABpages

Do like I did here in disambiguation pages per WP:DABSTYLE. -- Magioladitis (talk) 15:39, 23 June 2013 (UTC)

Should be pretty easy to do with AWB, I think. MOS:DABINT probably backs up the change more. It is maybe too small a change to bother doing unless the page is already being edited for some other reason, though. --Jamesmcmahon0 (talk) 15:15, 27 June 2013 (UTC)

Bot to add articles to the Sorani Kurdish Wikipedia (CKB) about Iraqi cities using census data

Hi! Is anyone interested in writing a bot to add articles about Iraqi cities, based on census data, to the Sorani Kurdish Wikipedia (CKB)?

I found Iraqi census data at http://cosit.gov.iq/pdf/2011/pop_no_2008.pdf (Archive) and the idea is something like User:Rambot creating U.S. cities with census data

Thanks WhisperToMe (talk) 00:11, 26 June 2013 (UTC)

We could import these data into WP:WD directly. --Ricordisamoa 23:47, 26 June 2013 (UTC)
Cool! How do we do that? What do I need to do on my end? WhisperToMe (talk) 00:52, 28 June 2013 (UTC)

Exoplanets table

I was wondering if anyone would be able to create a bot that would be able to copy the information about planets detected by the Kepler spacecraft from the Extrasolar Planets Encyclopaedia (link) to our list of planets discovered using the Kepler spacecraft. Rather than merely going to the list, it would be ideal if the bot could follow the link for each Kepler planet and get the full information from there, rather than merely looking at the catalog. The information in the EPE about Kepler planets is in turn copied from the Kepler discoveries catalog, which is in the public domain but is unfortunately offline at the moment (requiring us to use the copyrighted EPE). In addition to the basic information, I would like it if our bot were able to calculate surface gravity where possible based on mass/(r^2). Thanks, Wer900talk 18:08, 27 June 2013 (UTC)
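On the surface-gravity point, the formula is g = GM/r² rather than mass/r² alone; a sketch in Jupiter-relative units, with the usual published values as approximate constants:

    G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
    M_JUPITER = 1.898e27   # Jupiter mass in kg (approximate)
    R_JUPITER = 7.1492e7   # Jupiter equatorial radius in m (approximate)

    def surface_gravity(mass_mjup, radius_rjup):
        """Surface gravity in m/s^2 from mass (Jupiter masses) and radius (Jupiter radii)."""
        mass = mass_mjup * M_JUPITER
        radius = radius_rjup * R_JUPITER
        return G * mass / radius ** 2

    # A planet of 1 Jupiter mass and 1 Jupiter radius: roughly 24.8 m/s^2
    print(round(surface_gravity(1.0, 1.0), 1))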

We could import these data into WP:WD directly: please refer to the Space task force. --Ricordisamoa 18:34, 27 June 2013 (UTC)
I commented at Astronomy task force because it seems most appropriate. Wer900talk 20:56, 27 June 2013 (UTC)

Bot to assist in identifying articles most in need of cleanup.

Hi. I haven't historically been a big editor on Wikipedia, though I use it from time to time. I realize that there are probably a number of bots at work using various methods to target entries for improvement. However, I just wanted to add my two cents on a method which may or may not be in use.

First, however, some quick background. I am currently taking a Data Science class and for one of my assignments I developed a script which selects a random Wikipedia article and does the following:

1) Counts total words and total sources (word count does not include 'organizational' sections such as References, External Links, etc.)

2) Uses all words contributing to the word count to assess the overall sentiment of the text. For this, I used the AFINN dictionary and the word count to get an average sentiment score per word.

3) For each section and sub-section (h2 and h3) in the page which is not organizational in nature (see above definition), counts the number of words and citations and, as with item 2, gets a sentiment score for the section/sub-section

So my thought on using this script is as follows:

If it was used to score a large number of Wikipedia pages, we could come up with some parameters on which a page and its sections and subsections could be scored.

1) For all articles: word count, source count and sentiment score.

2) For all sections and sub-sections: word count, citation count and sentiment score.

3) For pages with sources: a sources-per-word score.

4) For sections with citations: a words-per-citation score.

For all of these parameters, the scores from the sample could be used to determine what sort of statistical distribution they follow. A bot could then scan the full set of wikipedia articles and flag those which are beyond some sort of tolerance limit.

Additionally, data could be collected for sections which commonly occur (Early Life, Private Life, Criticisms etc.) to establish expected distributions for those specific section types. For example, we might expect the sections labeled Criticisms would, on average, have a more negative sentiment than other sections.

I hope this all makes sense and perhaps some or all of it is being done. I look forward to hearing from some more experienced Wikipedians on the idea. — Preceding unsigned comment added by 64.134.190.157 (talk) 18:46, 21 June 2013 (UTC)
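For anyone curious, the per-word sentiment scoring described in step 2 amounts to something like the following (assuming an AFINN file of tab-separated word/score pairs; the filename is a placeholder):

    import re

    def load_afinn(path="AFINN-111.txt"):
        """Load word -> score pairs from a tab-separated AFINN file."""
        scores = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                word, score = line.rsplit("\t", 1)
                scores[word] = int(score)
        return scores

    def average_sentiment(text, afinn):
        """Mean AFINN score per word over all words in `text` (0.0 if empty)."""
        words = re.findall(r"[a-z']+", text.lower())
        if not words:
            return 0.0
        return sum(afinn.get(w, 0) for w in words) / len(words)

    afinn = load_afinn()
    print(average_sentiment("The article was widely praised as a great success.", afinn))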

It's an interesting thought, but we have already tagged over a million articles with unresolved issues - see Wikipedia:Backlog. GoingBatty (talk) 02:57, 22 June 2013 (UTC)
Thanks for the reply. Let me ask it this way: given that there already exists a backlog of that size, would a bot such as I've described be useful in terms of prioritizing them for clean-up? For example, is an unsourced entry about a living person where a bot detects indications of bias more important to clean up than a seemingly neutral unsourced entry giving soccer results from the 1970s? I'm not asking this to be sarcastic. I honestly don't know if there is any sense that clean up of one should take priority over the other. — Preceding unsigned comment added by 72.244.42.10 (talk) 14:07, 28 June 2013 (UTC)
I think we already do this - the former has {{BLP unsourced}}, while the latter has {{unreferenced}}. GoingBatty (talk) 16:47, 29 June 2013 (UTC)

Template:South Alexandria

Put {{South Alexandria}} on every article listed on it, and put the listed articles in a category called Category:South Alexandria, Virginia. Emmette Hernandez Coleman (talk) 09:20, 29 June 2013 (UTC)

 Doing... manually. Also created the category, and added it to the template. GoingBatty (talk) 16:34, 29 June 2013 (UTC)
 Done. GoingBatty (talk) 16:42, 29 June 2013 (UTC)

NFUR images with NFUR but no license

Any chance someone could write a bot to license tag these quickly?

[6=1&templates_no=db-i3&sortby=uploaddate&ext_image_data=1&file_usage_data=1]

Most seem to be reasonably straightforward. Sfan00 IMG (talk) 16:20, 18 June 2013 (UTC)

@Sfan00 IMG: Just to be clear, what exactly should be added?  Hazard-SJ  ✈  03:12, 20 June 2013 (UTC)
If the first template listed below is found on an image but the second is NOT found, add the second

etc... There may be some others, but this will start to clear out the 5000 or so I've found on the catscan query notes over on WP:AWB. Sfan00 IMG (talk) 10:14, 23 June 2013 (UTC)

BRFA filed  Hazard-SJ  ✈  21:32, 26 June 2013 (UTC)
It seems Theopolisme already plans to do this, so I withdrew the BRFA.  Hazard-SJ  ✈  01:48, 28 June 2013 (UTC)
That's generating the NFURs themselves; it's not adding the license tags for 'existing' media with rationales. Sfan00 IMG (talk) 07:01, 28 June 2013 (UTC)
I "unwithdrew".  Hazard-SJ  ✈  21:31, 1 July 2013 (UTC)

COI Template

Back in March there was consensus in Proposals to test out Template:COI editnotice on the Talk page of articles about extant organizations, to see if it increases the use of {{Request edit}} and reduces COI editing. A bot was approved in April to apply the template to Category:Companies based in Idaho. The bot request said it would affect 1,000+ articles, which would be enough of a sample to test, but it looks like it was only applied to about 40 articles? I am unsure if the bot was never run on the entire category, or if we need a larger category. The original bot-runner is now retired. Any help would be appreciated. CorporateM (Talk) 14:12, 30 June 2013 (UTC)

Hey CorporateM, I'm taking over the tasks of the original (now retired) bot operator, and this is included. Sorry for the delay, Theopolisme (talk) 20:47, 30 June 2013 (UTC)
No problem. CorporateM (Talk) 21:33, 30 June 2013 (UTC)

A bot to tag articles without images

I know that previously PhotoCatBot has tagged articles fitting these criteria, but is there any way that we could get a new bot to help finish up where this bot left off almost three years ago? Thanks! Kevin Rutherford (talk) 02:16, 1 July 2013 (UTC)

For those interested, the Python source code can be found here. --Ricordisamoa 02:42, 1 July 2013 (UTC)
This request is similar to this one at itwiki (discussion). I created this script (guide). --Ricordisamoa 02:42, 1 July 2013 (UTC)
By the looks of the documentation of the task and code, it seems that bot only changed such tags (adding parameters), and did not add them. Is that what you want?  Hazard-SJ  ✈  21:30, 1 July 2013 (UTC)
Possibly, although it would be good to have a bot that tags ones without images as well, so that we can create more up-to-date maps for articles without images, and make it easier to gather photos at the end of the day. Kevin Rutherford (talk) 17:05, 8 July 2013 (UTC)

Template:Arlington County, Virginia

Make sure all articles listed on {{Arlington County, Virginia}} have the template, and are in Category:Neighborhoods in Arlington County, Virginia.

Create redirects to these articles in the format "X, Virginia" and "X, Arlington, Virginia" and "X". For example, all of the following should redirect to Columbia Forest Historic District: Columbia Forest, Virginia; Columbia Forest, Arlington, Virginia; Columbia Forest; Columbia Forest Historic District, Virginia; and Columbia Forest Historic District, Arlington, Virginia. Emmette Hernandez Coleman (talk) 10:41, 3 July 2013 (UTC)

Coding... Theopolisme (talk) 14:35, 3 July 2013 (UTC)
I've done the second part of your task on my own account, since there were only 35 edits in total (of course requiring human intervention before saving each edit). Going to tackle the first part now. Theopolisme (talk) 23:39, 3 July 2013 (UTC)
Status update: harder than it looks. I'm actually going all out and working on a full Python script for inserting and removing text in specific places (for example, navigational templates or categories). So, in progress -- source so far Theopolisme (talk) 04:11, 7 July 2013 (UTC)

Invitation Bot?

Howdy, I haven't had the need for a bot before, but I'm organizing a meetup and would like help in posting invites to the folks on this list. I can come up with a short message, is that all I need to provide? Or is there anymore info needed? Thanks, Olegkagan (talk) 00:44, 27 June 2013 (UTC)

I am willing to do this, but since I haven't been approved for this task before, I'll file a BRFA now.  Hazard-SJ  ✈  00:50, 27 June 2013 (UTC)
BRFA filed  Hazard-SJ  ✈  00:55, 27 June 2013 (UTC)
Umm, User:EdwardsBot/Instructions... Legoktm (talk) 01:51, 27 June 2013 (UTC)
He doesn't have access.  Hazard-SJ  ✈  02:01, 27 June 2013 (UTC)
He can ask ;) Legoktm (talk) 02:02, 27 June 2013 (UTC)
Is it my turn to do something now? Olegkagan (talk) 17:56, 27 June 2013 (UTC)
Well, you could use the link that Legoktm provided to use a bot that is already approved for such tasks (you would either have to ask an admin to add you to the access list, or to add himself and submit your request, or ask someone who is already on it to submit the job for you), or I could continue getting approval for my bot to run this task and do it for you. I'm leaving that decision with you.  Hazard-SJ  ✈  23:57, 2 July 2013 (UTC)
Seems to me that since you volunteered, are ready and willing to help, it makes sense for you to continue getting approval for your bot to carry out the task. Olegkagan (talk) 01:39, 3 July 2013 (UTC)

Okay, and it would be nice to have the message to be added in case a trial is requested of me, thanks.  Hazard-SJ  ✈  02:19, 3 July 2013 (UTC)

@Olegkagan: A 50-edit trial was requested of me 4 days ago, could I please get the message? Thanks.  Hazard-SJ  ✈  03:56, 7 July 2013 (UTC)
Pardon the delay. Here is the message: "You are invited to "Come Edit Wikipedia!" at the West Hollywood Library on Saturday, July 27th, 2013. There will be coffee, cookies, and good times! -- Olegkagan (talk)"
Thanks.  Hazard-SJ  ✈  04:29, 11 July 2013 (UTC)

Infobox Unternehmen

Please could somebody "Subst:" all 84 article-space transclusions of the German-language {{Infobox Unternehmen}}, which is now a wrapper for the English-language {{Infobox company}}, as in this edit? That may be a job for AWB, which unfortunately I can't use on my small-screen netbook. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:08, 10 July 2013 (UTC)

 Doing... Theopolisme (talk) 16:50, 10 July 2013 (UTC)
Thank you. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:16, 10 July 2013 (UTC)
 Done, and I wrote a little Python script in case anyone needs to do something like this in the future [8] Theopolisme (talk) 21:28, 10 July 2013 (UTC)

Template:Peconic County, New York

Remove all articles in {{Peconic County, New York}} that are NOT in the following categories: Category:Sag Harbor, New York, Category:Riverhead (town), New York, Category:Shelter Island (town), New York, Category:Southampton (town), New York, Category:Southold, New York. Emmette Hernandez Coleman (talk) 04:13, 12 July 2013 (UTC)

Or alternatively, remove all articles that ARE in the following categories: Category:Babylon (town), New York, Category:Brookhaven, New York, Category:Huntington, New York, Category:Islip (town), New York, Category:Smithtown, New York. Emmette Hernandez Coleman (talk) 04:21, 12 July 2013 (UTC)

Never mind. The template might be deleted, so there's no point in putting that effort into it until we know it will be kept. Emmette Hernandez Coleman (talk) 09:08, 13 July 2013 (UTC)

Book report bot

Given that NoomBot has been down since April, maybe another bot could be made to create/update the ever-useful Book Reports. igordebraga 01:31, 13 July 2013 (UTC)

Template:Orleans Parish, Louisiana

Add {{Orleans Parish, Louisiana}} to every page it lists. Emmette Hernandez Coleman (talk) 22:33, 13 July 2013 (UTC)

@Theopolisme: would you be interested in doing this after #Template:Arlington County, Virginia?  Hazard-SJ  ✈  03:08, 14 July 2013 (UTC)
Yep! Basically, the script I'm writing will just need to scan every template on the page and define each one based on its contents/its documentation's contents -- i.e., "navbox", "persondata", "stub tag", etc., then assemble a "map" of the page's templates based on type...then the bot can insert elements at the end of the closest section (i.e., if inserting authority control, and the article has no geographical coordinates, insert after the navigation templates), per WP:ORDER. I'll hack on this tomorrow. Theopolisme (talk) 03:21, 14 July 2013 (UTC)
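For readers following along, here is a minimal sketch of the template-mapping idea described above, assuming mwparserfromhell is available; classify() is only a placeholder for whatever documentation or category lookup the actual script uses, and the ORDER list is an illustrative reading of WP:ORDER rather than the bot's real configuration.

import mwparserfromhell

# Illustrative ordering loosely following WP:ORDER; the real script may differ.
ORDER = ["infobox", "body", "authority control", "navbox", "stub"]

def classify(name):
    # Placeholder heuristic: the real bot would inspect the template's
    # contents or its documentation page instead.
    name = name.strip().lower()
    if name.startswith("infobox"):
        return "infobox"
    if name.endswith("stub"):
        return "stub"
    if name == "authority control":
        return "authority control"
    return "body"

def order_index(kind):
    # Position in ORDER decides where a new element should be inserted.
    return ORDER.index(kind) if kind in ORDER else ORDER.index("body")

def template_map(wikitext):
    # Build an ordered list of (template, kind) pairs for the page.
    code = mwparserfromhell.parse(wikitext)
    return [(tpl, classify(str(tpl.name))) for tpl in code.filter_templates()]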

Also {{Neighborhoods of Denver}}. At the rate I'm going I'll probably create a few more navboxes in the next few days, so it would be easier to do a bunch of navboxes together; don't bother with either of these yet. Emmette Hernandez Coleman (talk) 08:20, 14 July 2013 (UTC)

Poem dates

I've just added an hCalendar microformat to {{Infobox poem}}, so a Bot is now required, to apply {{Start date}} to the |publication_date= parameter.

The logic should be:

  1. If the value is empty, do nothing
  2. Else if the value is four digits, YYYY, change to {{Start date|YYYY}} (example 1)
  3. Else if the value is a month and year, change to {{Start date|YYYY|MM}}, where "MM" is the number of the month in the year (07 for July, etc).
  4. Else if the value is a day, month and year, DD Month YYYY, change to {{Start date|YYYY|MM|DD|df=y}}
  5. Else if the value is a month, day and year, Month DD, YYYY, change to {{Start date|YYYY|MM|DD}}
  6. Else add a hidden tracking category, for human investigation.

This is related to a larger request with clear community consensus, which has not yet been actioned; I'm hoping that this smaller, more manageable task, will attract a response, which can later be applied elsewhere. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:12, 14 July 2013 (UTC)

Gotcha. I'll implement this using my favorite python date parsin' module, dateutil (@Hazard-SJ: as an aside, it's really quite powerful, if you haven't used it before). Theopolisme (talk) 14:30, 14 July 2013 (UTC)
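For illustration only (this is not the bot's actual code), a sketch of the conversion rules above using dateutil; the regular expressions that decide which format applies, and the df=y handling, are assumptions to be checked against real infobox values.

import re
from dateutil import parser as dateparser

def to_start_date(value):
    value = value.strip()
    if not value:
        return None                                 # rule 1: empty value, do nothing
    if re.fullmatch(r"\d{4}", value):
        return "{{Start date|%s}}" % value          # rule 2: bare year
    try:
        dt = dateparser.parse(value)
    except (ValueError, OverflowError):
        return None                                 # rule 6: leave for human investigation
    if re.fullmatch(r"[A-Za-z]+ \d{4}", value):     # rule 3: month and year
        return "{{Start date|%d|%02d}}" % (dt.year, dt.month)
    # rules 4 and 5: keep df=y only when the original value was day-first
    df = "|df=y" if re.match(r"\d{1,2} [A-Za-z]+ \d{4}", value) else ""
    return "{{Start date|%d|%02d|%02d%s}}" % (dt.year, dt.month, dt.day, df)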
Source code is written (not exactly following your logic, but accomplishes basically the same thing); Andy, is the df=y especially important? At the moment, I haven't implemented it, since according to my tests it would slow down the script a fair bit. Theopolisme (talk) 17:09, 14 July 2013 (UTC)
Yes. People get very upset if you change 14 July 2013 to July 14, 2013, or vice versa. And thank you. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:30, 14 July 2013 (UTC)
Very true. I thought the template would be magical enough to detect that on its own, but I guess that's too much to hope for ;) Coding that functionality now. Theopolisme (talk) 18:14, 14 July 2013 (UTC)
Implemented; BRFA to follow. Theopolisme (talk) 18:50, 14 July 2013 (UTC)
Wikipedia:Bots/Requests_for_approval/Theo's_Little_Bot_24 Theopolisme (talk) 19:23, 14 July 2013 (UTC)

New adminbot

I would like to suggest a bot that disables autoblock on softblocks - if an admin accidentally enables autoblock on a softblock, my suggested bot will automatically DISable autoblock and enable account creation, e-mail, and talk page access if any, some or all of them are disabled. The bot will then lift the autoblock on the IP address affected by the soft-blocked user. 76.226.117.87 (talk) 03:09, 15 July 2013 (UTC)

How is the bot supposed to know when a block is a softblock?—cyberpower ChatOnline 12:04, 15 July 2013 (UTC)
(edit conflict) As written, I can't make sense of your request. According to WP:SOFTBLOCK, a "softblock" is a block on an IP address, and the "autoblock" option is not available when blocking IP addresses. Anomie 12:10, 15 July 2013 (UTC)
I think he means accounts that are blocked where the autoblocks should be disabled. I'm going to go ahead and mark this as Impossible.—cyberpower ChatOnline 12:19, 15 July 2013 (UTC)

Fix-up of new bot

I already said that I wanted a bot to change block settings (see the declined request made by 76.226.117.87), but now I have fixed it:

The process goes like this:

  1. An administrator ENables autoblock when blocking an account that should have autoblock DISabled.
  2. The bot I'm suggesting will change the block settings for the affected account so that account creation, e-mail, and talk page access are all allowed, if any of them are disabled. The bot will also disable autoblock on the affected account, and lift the autoblock on the IP address recently used by the blocked account.
  3. The resulting log entry should look something like this (timestamps will vary, and some extra comments are added that are not normally present in block log entries):

  • 13:13, 13 April 2011 User 1 (talk | contribs) blocked User 2 (talk | contribs) (account creation blocked, email disabled) (sample settings) with an expiry time of indefinite ({{uw-softerblock}}, and {{softerblock}}, although may be {{uw-causeblock}}, {{causeblock}}, etc. )
  • 13:14, 13 April 2011 (New bot's username) (talk | contribs) changed block settings for User 2 (talk | contribs) with an expiry time of indefinite (autoblock disabled) (The reason for the block)
  • 13:14, 13 April 2011 (New bot's username) (talk | contribs) unblocked #xxxxxxx (autoblock number will vary) (Blocks like these should not have autoblock enabled. ) 76.226.76.230 (talk) 20:54, 15 July 2013 (UTC)
Still Impossiblecyberpower ChatOnline 22:04, 15 July 2013 (UTC)
No, it's not completely impossible. But it requires an assumption that blocks with reasons referencing certain templates such as "{{uw-softerblock}}" must never be hardblocks, consensus for which should probably be established on WP:AN (and advertised to WT:BLOCK and those templates' talk pages) first.
On the other hand, the part of the request to automatically unblock IPs blocked by the mistaken hardblock is impossible for privacy reasons, as watching the bot's block/unblock log would effectively reveal that the originally-blocked user had used the unblocked IP address. Anomie 22:16, 15 July 2013 (UTC)
I just see too much room for error. And the second half of your statement is exactly why I considered it impossible to do.—cyberpower ChatOnline 23:20, 15 July 2013 (UTC)

SpellingBot?

I thought of a bot that corrects simple spelling errors, such as beacuse→because and teh→the. buffbills7701 22:17, 15 July 2013 (UTC)

See WP:Bots/Frequently denied bots#Fully automatic spell-checking bots. Anomie 22:20, 15 July 2013 (UTC)
N Not donecyberpower ChatOnline 23:22, 15 July 2013 (UTC)
However, you could use tools such as AutoWikiBrowser to search for and fix spelling errors such as these, as long as you check each edit for incorrect fixes before saving. GoingBatty (talk) 03:18, 16 July 2013 (UTC)

Bot tagging medical articles that lack a PMID or DOI

Context: Citing references is important for medical articles (under WP:MED or related WikiProjects), though it is important for other articles as well. Just as books have an ISBN, almost all the renowned medical journals have a Pubmed listing and their articles bear a PMID. Pubmed serves as a global quality regulatory and enlisting body for medical journal articles, and if an article does not have a PMID, chances are that the journal is not a popular one and therefore there is a possibility that it does not maintain the quality standards that are to be adhered to. Other journal articles have a Digital object identifier or DOI (with or without a PMID), which serves to provide a permanent link redirecting to the article's present URL. Some Pubmed articles are freely accessible and also have a PMC (alongside the PMID), which is therefore an optional parameter. Thus, if a <ref></ref> has neither a PMID nor a DOI, chances are that 1. the article has a PMID (most cases) but the <ref></ref> tag lacks its mention, or 2. the article lacks a PMID or DOI and it's not a problem with the <ref></ref> placed.

I feel there is a requirement for two different bots.

  1. A bot automatically crawling the pages tagged under WP:MED, WP:Anatomy, WP:Neuroscience, WP:MCB, WP:Pharmacology, WP:Psychology and certain other related WikiProjects, for articles that have references with neither a PMID nor a DOI, and adding a tag within the ref tag such that it adds the Wikipedia article to a browsable list. An easier option for the bot would be to check for the journal (name may be full or abbreviated) in the Pubmed directory; if it is in the list but the <ref></ref> has neither a PMID nor a DOI, the tag is to be placed to denote the possibility that the article has a PMID that has not been added to the <ref></ref> tag. If the bot cannot locate the journal in the Pubmed database, it would place a tag something like 'journal not found in Pubmed db'. There should be a modifiable parameter which can be manually checked on or off by some person (user) to affirm or negate the bot.
  2. A bot automatically tagging pages (criteria above) that do not use a {{cite journal}} template for the <ref></ref> tag.

Utility: These bots would enable the editors of medical articles to make the mentioned references more reliable and verifiable and would encourage users to use this template while placing references. DiptanshuTalk 16:15, 10 July 2013 (UTC)

You can proceed with the discussion about utility of such bots at Wikipedia talk:WikiProject Medicine#Bot tagging medical articles that lack a PMID or DOI DiptanshuTalk 16:25, 10 July 2013 (UTC)
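Not a commitment to any particular implementation, but a sketch of the core check for the first proposed bot (mwparserfromhell assumed); the PubMed journal lookup, the WikiProject filtering, and the actual tagging described above would still have to be added.

import mwparserfromhell

def journal_refs_missing_ids(wikitext):
    # Return the {{cite journal}} citations that carry neither pmid nor doi.
    code = mwparserfromhell.parse(wikitext)
    flagged = []
    for tpl in code.filter_templates():
        if tpl.name.matches("cite journal"):
            if not (tpl.has("pmid") or tpl.has("doi")):
                flagged.append(str(tpl))
    return flagged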

Small request for bot/script to replace "language=ru" with "language=Russian"

The folks at the Village Pump suggested that I post this request here.

I have been correcting errors in citations, and I have noticed that pretty much every Russia-related article I edit contains an incorrectly formatted parameter, "language=ru", in its citation parameters. (The "language" parameter takes the full language name, not the two-letter code.)

You can see an example of one of these citations here. Note that the rendered reference says "(in ru)" instead of the correct "(in Russian)".

It occurred to me that someone clever with a script or bot or similar creature might be able to do a semi-automated find and replace of "language=ru" in citations with "language=Russian".

Is this a good place to request such a thing? I do not have the time or skills to take on such a project myself. Thanks. Jonesey95 (talk) 14:06, 16 July 2013 (UTC)

I spotted this with some other languages when doing some capitalisation of language names, so it may be worth expanding the scope to other two-character language identifiers. Keith D (talk) 14:16, 16 July 2013 (UTC)

In the pump thread, using Lua to automatically parse the language codes was suggested -- I think that would definitely be preferable if possible, rather than having a bot make a boatload of fairly trivial edits. Theopolisme (talk) 14:27, 16 July 2013 (UTC)

Yes, a change to Lua would be preferable, but it would probably not fix the problem described above. I think a human or bot would still need to go through and make this boatload of changes. Until Lua changes, I'm fixing these errors by hand; a bot would do a much quicker job and leave me to make changes that require a human brain. We already have bots and AWB users fixing things that are not visible to readers; this change would actually improve the reader's article-reading experience, making it considerably less trivial than those AWB changes. Jonesey95 (talk) 19:41, 16 July 2013 (UTC)
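A rough sketch of the semi-automated replacement being discussed; the code-to-name table below is a tiny illustrative subset, not a complete mapping, and a real run would need human review of each change.

import re

LANG_NAMES = {"ru": "Russian", "de": "German", "fr": "French", "es": "Spanish"}

def fix_language_params(wikitext):
    # Replace |language=ru (and other known two-letter codes) with the full name.
    def repl(match):
        name = LANG_NAMES.get(match.group(2).lower())
        return match.group(1) + name if name else match.group(0)
    return re.sub(r"(\|\s*language\s*=\s*)([A-Za-z]{2})\s*(?=[|}])", repl, wikitext)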
Original discussion is at WP:VPT#Small request for bot/script to replace "language=ru" with "language=Russian". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:16, 16 July 2013 (UTC)
I may be a bit slow, but is there a reason the template couldn't be changed so that lang=ru produces (in Russian)? Or is that what the bit above re: Lua is about? Ignatzmice (talk)
Yes; that's partly what's meant by "using Lua to automatically parse the language codes", above. Such codes could also be used to put (non visible) language attributes into the emitted HTML, to improve compliance with web standards and accessibility guidelines, and improve searchability. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 06:41, 19 July 2013 (UTC)

Bot to tag references behind a paywall

Wikipedia is too important and too useful a resource to have citations behind paywalls if there is another possible reference. In order to draw attention to references that need improving/substitution, it would be nice if there was a bot that would tag references that are behind a paywall. I realize that some newspapers slowly roll articles behind a paywall as time passes. However, other newspapers have all their content behind a paywall. A good example is The Sunday Times. You can click on any link on http://www.thesundaytimes.co.uk and you will be presented with "your preview of the sunday times." For Wikipedians like myself who enjoy contributing by verifying citations, it is frustrating that we can't verify Sunday Times citations for free. When I see a paywall-tagged citation I often try to find another citation and substitute it. A bot would be helpful for this. DouglasCalvert (talk) 23:51, 16 July 2013 (UTC)

I'm willing to do this, but having a list of other such links would be useful. Could you come up with them?  Hazard-SJ  ✈  00:07, 17 July 2013 (UTC)
I think this would send the wrong message. First, we want high quality sources, even if they are not free. Only if a free source is of equal or better quality to a non-free source should the free source be preferred. By having a bot going around tagging thousands of paywall sources, it will reinforce the misconception that paywall sources are not acceptable.
Also, why is it fair to tag paywall sources but not paper books, magazines, and newspapers? Jc3s5h (talk) 00:17, 17 July 2013 (UTC)
How is this sending the wrong message? If an article is behind a paywall, that is a fact. How can that send the wrong message? I am not saying I think they should be removed or that lower quality citations should be substituted. That's a nice strawman, but I am not going to engage. I think the reference should indicate that the link is behind a paywall for two reasons. Most importantly, it lets other editors know that the article could be improved if an equivalent reference were found from a site that is not hiding behind a paywall. Secondly, there is no point sending readers off to another site if they will not be able to read the reference.
As far as "fairness" goes, I do not even know what it means to be fair to a book or to be fair to a magazine. If you have a problem with the paywall tag, it seems like there must be another avenue to voice your concerns. I can go to a library and verify the citation for free. I cannot go to the library and verify the Sunday Times citation. DouglasCalvert (talk) 00:26, 17 July 2013 (UTC)
It would make the FUTON bias problem worse. I think this Needs wider discussion.; post about it at WP:VPR and see if you can get consensus first. Anomie 01:22, 17 July 2013 (UTC)
The {{subscription}} template is a useful and neutral tag to apply after citations that contain subscription-only URLs. Jonesey95 (talk) 03:01, 17 July 2013 (UTC)
Yes, the {{Subscription required}} template seems very fitted to be added by a bot. - Jamesmcmahon0 (talk) 00:21, 22 July 2013 (UTC)

VisualEditor FixBot?

WMF has turned Visual Editor on for IP accounts now, and the results are as expected: Filter 550 shows that a significant volume of articles are getting mutilated with stray wikitext. It has been proposed to set the filter to block the edits that include nowikis that indicate that the combination of VE and an unknowing editor has caused a problem.

I'd personally rather see this sort of mess cleaned up by a bot, and a message left on a user talk page that asks the editor either to stop using wiki markup or to stop using VE. I think it's possible to detect strings generated by VE (basically, it surrounds wikitext with nowiki tags), and figure out what the fix is (in some [most?] cases, just remove the nowiki tags), similar to how User:BracketBot figures out that an edit has broken syntax. Given that the problem is on the order of magnitude of 300 erroneous edits per day, is it possible to move with all deliberate pace to field such a bot?

(Background: see WP:VPR#Filter 550 should disallow.) -- John Broughton (♫♫) 03:31, 16 July 2013 (UTC)

Sounds interesting. I'll look into solutions.—cyberpower ChatOffline 03:36, 16 July 2013 (UTC)
(edit conflict) Hmm, I like this idea too...looking into... Theopolisme (talk) 03:39, 16 July 2013 (UTC)
I started a script, but I won't be able to finish it up now as I have to go offline.  Hazard-SJ  ✈  07:51, 16 July 2013 (UTC)
For a bot rather than script-assisted human editing, watch out for the issues raised in WP:CONTEXTBOT. Anomie 10:55, 16 July 2013 (UTC)
Good point; I'll also post at a couple of automated tool pages (AWB, TW). But I really think that false positives on articlespace pages are very unlikely for cases like this:
<nowiki>[[whatever]]</nowiki>
And simply removing nowiki tags is a change that seems to me to have so little potential for damage that I think a bot - with the standard human occasional review - can be trusted to do that. -- John Broughton (♫♫) 15:24, 16 July 2013 (UTC)
I can't see how any false positives would result - I can't think of any occasions where nowiki should appear in articles..... Mdann52 (talk) 12:54, 17 July 2013 (UTC)
From the filter results, I've seen some experienced users use nowiki as a method of escaping wikitext characters in places where it is intended to be used as plain text. For example, someone might write:
{{ template | title = Pipe Land <nowiki> | </nowiki> Pipes }}
That's not the typical way of dealing with things like that, but it does work. Dragons flight (talk) 13:00, 17 July 2013 (UTC)
For people designing scripts, one needs to be aware that when VE adds nowiki it tends to be maximally expansive rather than minimally so. For example if you type:
I like [[butterflies]] and bright red trains.
In VE the result is:
<nowiki>I like [[butterflies]] and bright red trains.</nowiki>
Rather than just escaping the link, it will escape all of the added plain text on either side as well. Dragons flight (talk) 13:05, 17 July 2013 (UTC)
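A sketch of the simplest automated fix under the assumption discussed above: a VE-added nowiki span that wraps link or template markup is simply unwrapped, while deliberate escapes such as the bare-pipe example (which contain no wiki markup) are left alone. This is an illustration of the idea only, not WPCleaner's or any approved bot's code, and it would still need the WP:CONTEXTBOT-style review mentioned earlier.

import re

NOWIKI = re.compile(r"<nowiki>(.*?)</nowiki>", re.DOTALL | re.IGNORECASE)

def strip_ve_nowiki(wikitext):
    def repl(match):
        inner = match.group(1)
        # Unwrap only when the escaped text contains wiki markup, which is the
        # signature of the VE mistake; a lone "|" escape is left untouched.
        if "[[" in inner or "{{" in inner:
            return inner
        return match.group(0)
    return NOWIKI.sub(repl, wikitext)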
I will try to do something in WPCleaner about this, as requested by John on my talk page, but not sure I will have the time today or tomorrow. I see the following (won't be automatic) features:
  • Retrieving the list of articles which triggered filter 550
  • Detect <nowiki>...</nowiki> tags in main namespace articles, and suggest a fix for each of them. Basically suggesting to just remove the tags, except for specific cases. I've only one in mind for now: the nowiki at the beginning of a line with whitespace characters after it; the whitespace characters should be removed too.
--NicoV (Talk on frwiki) 14:26, 17 July 2013 (UTC)
WPCleaner can now detect <nowiki>...</nowiki> in main namespace and suggest a fix. To activate this detection, edit Special:MyPage/WikiCleanerConfiguration and add the following contents (with the <source>...</source> tags):
# Configuration for error 518: nowiki tags
error_518_bot_enwiki=true END
After that, in WPCleaner the Abuse filters button lets you choose which Abuse filter you are interested in (choose 550) and gives you the list of pages having triggered that filter. When you analyze a page, <nowiki>...</nowiki> tags are found and suggestions are given to fix them. It's quite basic, so if you think of any enhancement, tell me. --NicoV (Talk on frwiki) 22:52, 17 July 2013 (UTC)
Great; thanks Nico! Theopolisme (talk) 00:36, 18 July 2013 (UTC)
Yes, that should help out. Doing it automatically, there's not much that can be done to avoid false positives, so it'd need great consensus, per Anomie's link above.  Hazard-SJ  ✈  00:59, 18 July 2013 (UTC)
If I have time tonight, I will sort the list of pages where an edit triggered Filter 550 from newest to oldest instead of alphabetically. --NicoV (Talk on frwiki) 07:14, 18 July 2013 (UTC)

I came here from the WP:Village Pump (proposals) page, and I suggest instead of an autofix bot, maybe a bot much like User:DPL bot? They could notify everyone who accidentally triggered the filter and each person could go back and fix it. Unless that would create lots of spam? Just a thought. kikichugirl inquire 22:21, 20 July 2013 (UTC)

@Kikichugirl: Presuming that VE is aimed at new editors, I'm not sure they would understand a talk page message that tried to explain why VE messed up their edit. Plus, IP editors who get a new IP address every time they edit may never see the messages. GoingBatty (talk) 00:47, 22 July 2013 (UTC)
@GoingBatty: That's actually a good point. Besides, I've heard User:DPL bot is meant for at least slightly more experienced editors anyway. Most of the nowiki problems I've seen are coming from users with redlinked userpages (likely to be newer editors; more experienced users not choosing to have a userpage would just redirect their user page to their talk page). kikichugirl inquire 06:26, 22 July 2013 (UTC)
I see..... Now, just because they won't probably doesn't mean we can't. I imagine there would be a way to hook into the save of VisualEditor and run a collection of JavaScript rules / tidy-up scripts over the page before actually saving? :> Or is my mind beginning to wander? ·addshore· talk to me! 07:37, 22 July 2013 (UTC)
Hmm, that's not a bad idea :) Theopolisme (talk) 16:39, 22 July 2013 (UTC)

It seems to me that VE is leading to a rise in external links [9] in article text, as refs. Can an autobot move the link to refs or EL with an edit note for a human to follow-up? Thanks. Alanscottwalker (talk) 10:48, 23 July 2013 (UTC)

I think it would be very difficult for a bot to determine, consistently and accurately, where a link belongs, to be honest. Theopolisme (talk) 16:55, 23 July 2013 (UTC)

Flag unreferenced BLPs for WikiProjects

DASHBot (talk · contribs) used to create, and/or periodically update, a list of unreferenced biographies of living persons for a given Wikiproject (see User:DASHBot/Wikiprojects). However, that bot has been blocked since March. I'm wondering if another one can accomplish this same task. I'm asking on behalf of WP:JAZZ (whose list is at Wikipedia:WikiProject Jazz/Unreferenced BLPs) but there were a lot of other WikiProjects on that list, as well (I'd already removed WP:JAZZ, though). -- Gyrofrog (talk) 21:55, 23 July 2013 (UTC)

I'll be happy to do this, but first I emailed DASHBot's operator, on the off chance that he'd be able to send me the source code for the task in question. If I don't receive a response in the next week or two, I'll re-code it myself. Theopolisme (talk) 12:26, 26 July 2013 (UTC)

I have noticed that some links contain Google Analytics tracking parameters. It seems that Wikipedia links should not help companies track user clicks. I have even noticed that some advert-like entries about companies contain links with custom GA campaign parameters to target clicks from Wikipedia to the company page/blog/etc. Removing these GA tracking parameters seems like a great task for a bot. All Google Analytics parameters begin with utm. The basic parameters are:

  • Campaign Source (utm_source)
  • Campaign Medium (utm_medium)
  • Campaign Term (utm_term)
  • Campaign Content (utm_content)
  • Campaign Name (utm_campaign)

Does this sound doable?

DouglasCalvert (talk) 03:56, 13 July 2013 (UTC)
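A minimal, standard-library sketch of the utm_ stripping itself (the canonical-URL handling discussed further down in this thread would be a separate step, and the real bot's behaviour may differ):

from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_utm(url):
    # Drop every query parameter whose name starts with "utm_".
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not k.lower().startswith("utm_")]
    return urlunsplit(parts._replace(query=urlencode(kept)))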

Hmm, it's definitely doable, and seems (to me) like quite a good idea. I'd be happy to code this. However, would there be any reason not to do so? (Maybe reach out to a relevant WikiProject -- WikiProject Spam?) Theopolisme (talk) 04:09, 13 July 2013 (UTC)
Thank you for your interest. I cannot think of any reason why wikipedia should assist in tracking user behavior on the internet but I am a little biased:) I have never submitted a bot request before. Is there anything else I need to do at this stage? DouglasCalvert (talk) 04:19, 13 July 2013 (UTC)
Douglas, I definitely share your sentiments. :) I will begin writing the code for the bot in the next few days and get back to you if I have any questions. Feel free to contact me if you can think of anything else that I might find useful, Theopolisme (talk) 04:22, 13 July 2013 (UTC)
Is there a place where I can search all links on Wikipedia to collect data to include with this request and/or to help with the bot request for approval? I would like to search all external links for "utm". I could go through the links by hand and pull out examples. I thought I saw a link for something like this on toolserver but I am having trouble finding the link now. The search external links tool seems to require a domain search term, which is the opposite of what I want to search for. DouglasCalvert (talk) 04:25, 13 July 2013 (UTC)
@DouglasCalvert and Theopolisme: I jumped in with User:Hazard-Bot/Google Analytics report, if it helps.  Hazard-SJ  ✈  05:27, 13 July 2013 (UTC)
Thanks, Hazard! I was thinking about this last night -- would this bot technically be a WP:COSMETICBOT, since there would be no visible change for the reader? Theopolisme (talk) 13:58, 13 July 2013 (UTC)
Not quite, since it does change the target of links visible to the reader (if they click them). For many cases this would probably be too trivial to bother with, but the tracking parameters may impact privacy so I think it could be justified. Anomie 18:06, 13 July 2013 (UTC)
How does it change the target? The destination page is the same with or without the GA parameters. Many of the pages in the report hazard created (thanks hazard) contain canonical URLs. For instance:
Link with GA parameters: http://www.avclub.com/articles/the-most-serene-republic,30287/
link rel="canonical" href="http://www.avclub.com/articles/the-most-serene-republic-and-the-ever-expanding-un,30287/"
Any wiki reader that followed the link with the GA parameters would arrive at the same page as someone who visits the canonical URL. The only difference is that wikipedia is not feeding Google's analytics DB if the article provides the canonical URL in the reference section. DouglasCalvert (talk) 18:19, 13 July 2013 (UTC)
We should double check that. It is certainly possible, in principle, that a website could vary their landing page or other details based on the requested "campaign" or "source". I don't know if many websites do that, but I wouldn't rule it out either. It might be good to look at a few dozen of these with and without the GA data and make sure they aren't doing anything unexpected. Dragons flight (talk) 18:31, 13 July 2013 (UTC)
This brings up a related question: Should wikipedia be referencing sources that change content based on presence of user tracking parameters? My answer is obviously "NO." DouglasCalvert (talk) 18:55, 13 July 2013 (UTC)
For bland external links, I would tend to agree with you. For references, I would be inclined to leave the link in place until a replacement is found. However, I'm rather hoping that this is entirely academic and that nearly all links will return the same content with or without the UTM bit. I don't know if that is true, hence the desire to test a bit, but I'm rather hoping that almost no one differentiates their actual content based on UTM context. (There are possibly many sites that differentiate their advertising based on UTM context, but we can safely ignore that.) Dragons flight (talk) 19:44, 13 July 2013 (UTC)
Excellent idea. It should be possible to have an "Exclude bots" template to stop conversions where the technology is being discussed or demonstrated. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:18, 13 July 2013 (UTC)
Thanks to Anomie and Dragons flight for your clarifications thus far, and to Andy for your suggestion (which would definitely be implemented). Coding... because I have some time on my hands, but don't let that stop the discussion. I'll examine the links with and without advertising parameters as well. Theopolisme (talk) 21:06, 13 July 2013 (UTC)
It's safe to remove any utm_ parameter, correct? Theopolisme (talk) 21:56, 13 July 2013 (UTC)
Yes all the utm parameters are google analytics. If you do a google search for google analytics cheat sheet you should find a couple resources that list tons of them.
If the page lists a canonical URL you don't need to worry about content changing. You can just completely normalize the URL to what is listed as the page's canonical URL. Quoting Google: "Must the content on a set of pages be similar to the content on the canonical version? Answer: Yes." https://support.google.com/webmasters/answer/139394?hl=en
The above google link is an intro to canonical URL's. Google Webmaster's Blog describes common use of canonical URLs here: http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html
The nice thing about websites that use Google Analytics is that they are going to be as Google-friendly as possible. Google wants people to use the canonical URLs. I would definitely use the canonical URL if available, and if not, then strip the utm parameters.
Please ignore this if it was old news to you. Just trying to be helpful. Once again thanks for stepping up.DouglasCalvert (talk) 22:13, 13 July 2013 (UTC)
No, it was very helpful, canonical urls are fabulous. I'm looking into parsing and utilizing them now. Theopolisme (talk) 22:38, 13 July 2013 (UTC)
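A sketch of pulling a page's declared canonical URL; this assumes the requests library and uses a deliberately simple regex (it expects rel before href), so a real bot would probably want a proper HTML parser instead.

import re
import requests

def canonical_url(url):
    try:
        html = requests.get(url, timeout=10).text
    except requests.RequestException:
        return None
    match = re.search(
        r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)["\']',
        html, re.IGNORECASE)
    return match.group(1) if match else None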

I've written the basic program, and it successfully made this edit (incorporating the canonical url) -- if the canonical url isn't available, it'll just use regular expression to strip the utm_ parameters. Douglas, is this what you were looking for? Theopolisme (talk) 01:33, 14 July 2013 (UTC)

That is awesome! Thanks so much. If I knew any of the templates for the "you are a star/barnstar/botgod/etc" I would litter your talk page with them. I have no idea how the rest of the bot approval process works. I take it that you do? Is there a way I can help babysit your bot during testing and review the edits it makes? If that would be easier for you please tell me how I can help you. DouglasCalvert (talk) 17:06, 14 July 2013 (UTC)
Happy to do it, and you're very welcome. Yep, I'll start the request for approval process now and give you a link to the request once I've initialized it for you to follow along (Theo's Little Bot loves babysitters, you'll be welcome to do that as well). Theopolisme (talk) 17:15, 14 July 2013 (UTC)
Wikipedia:Bots/Requests_for_approval/Theo's_Little_Bot_23 Theopolisme (talk) 19:22, 14 July 2013 (UTC)

@Hazard-SJ: could I steal the source code that you used to generate that list of urls (I assume it involved a db replica somewhere or other, they're all basically the same *wink*)? That way I won't have to manually crawl every...single...page... Theopolisme (talk) 01:36, 14 July 2013 (UTC)

Certainly, here it is! :D  Hazard-SJ  ✈  03:11, 14 July 2013 (UTC)
Merci. Saved me ten minutes :) Theopolisme (talk) 03:22, 14 July 2013 (UTC)
De rien, saved your bot hours/days ;)  Hazard-SJ  ✈  03:36, 14 July 2013 (UTC)

Could the same task be useful in trimming cruft from Google Books links, like this edit? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:30, 17 July 2013 (UTC)

Anyone? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:12, 21 July 2013 (UTC)

Hey Andy: Sorry, I didn't see your post. That's definitely doable. What parameters would need to remain, though, and which would be okay to be stripped? page, for example, definitely needs to stay -- are there others? I wouldn't want to remove something that would alter the appearance of the page for the person who clicks on the link (besides, say, removing extraneous search terms and such). Theopolisme (talk) 00:35, 22 July 2013 (UTC)
NP. I'm, not sure what other parameters exist, perhaps someone at Help talk:Citation Style 1 can help? I'll ping them. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:58, 27 July 2013 (UTC)
If we're talking about the url parameters, I usually remove everything except "id" and "pg". So far, that's always had the desired effect of displaying the Google Books page exactly the same, just without the highlighted search terms. DoctorKubla (talk) 16:25, 27 July 2013 (UTC)
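Following DoctorKubla's note, a sketch that keeps only the id and pg parameters of a Google Books URL; whether those are really the only parameters worth keeping is an assumption to confirm before any bot run.

from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

KEEP = {"id", "pg"}

def trim_google_books(url):
    # Keep only the parameters needed to reach the same Google Books page.
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in KEEP]
    return urlunsplit(parts._replace(query=urlencode(kept)))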

Vandalism Tracker Robot

It seems like it would be of great use to create a bot that finds the most vandalised pages on the wiki and maintains a regularly updated list article or essay, in order to alert all Wikipedians to which pages require the most monitoring and reverting. Superwikiwalrus (talk) 14:12, 27 July 2013 (UTC)

That job should go to User:Cobicyberpower ChatOnline 14:51, 27 July 2013 (UTC)

GA Assessment BOT

Hi! This is my first bot request so bear with me. The bot I am requesting will do/perform the following functions:

  1. When a nomination is added to WP:GAN, it will download/scan/read/whatever the source code, checking for spelling errors
  2. It will check the source code for any maintenance tags, {{citation needed}} {{refimprove}} and so on
  3. It will check all images in the article and check fair use status
  4. It will then report all this somewhere on wiki, preferably with a page per nom, something like User:Retrolord/(Nomination name here report)

Does anyone here have any thoughts on the feasibility of this? KING RETROLORD 09:10, 22 July 2013 (UTC)

One more request, could the bot check there is at a minimum 1 citation per paragraph. Thanks, KING RETROLORD 09:59, 22 July 2013 (UTC)

Checking for spelling errors is generally not a good idea. However, the rest of this could be feasible... Theopolisme (talk) 16:38, 22 July 2013 (UTC)
Coding... a bot to create a report for each new nomination (which will be posted at User:Theo's Little Bot/GAN/**article**). Theopolisme (talk) 16:43, 22 July 2013 (UTC)
@Retrolord: Please take a look at User:Theo's Little Bot/GAN for what I've got so far (source code). Thoughts? Modifications? What other information would be useful to have? Theopolisme (talk) 02:07, 23 July 2013 (UTC)
That seems to have covered it. Would you be able to provide the list of tags it checks for?KING RETROLORD 02:20, 23 July 2013 (UTC)
Right now it checks for all tags in Category:Cleanup templates, although this can be changed if you'd like. Really, no other information you're dying to see? ;) Theopolisme (talk) 02:30, 23 July 2013 (UTC)
(edit conflict) It uses all templates from Category:Cleanup templates.  Hazard-SJ  ✈  02:30, 23 July 2013 (UTC)
Sir, hath thou forgotten thy pipe trick? Theopolisme (talk) 02:32, 23 July 2013 (UTC)
Nothing else I'm dying to see in the bot, but when it's finished, will all the reports be on separate sub-pages? KING RETROLORD 02:35, 23 July 2013 (UTC)
@Theopolisme: - I agree that automatically fixing spelling errors is not a good idea. However, I'm curious what your concerns would be when it comes to making a list of potential spelling errors in an article and posting that list elsewhere for human review. GoingBatty (talk) 02:44, 23 July 2013 (UTC)
Ah, yes, that's a different story. I'm not opposed to that, theoretically... perhaps just a section, "Alerts", saying that it detected "x misspelled words, including x,y,z", and advising the reviewer to run the page through a proper spellchecker? I wouldn't see the point in just blatantly listing all the misspelled words, though. :/ Theopolisme (talk) 02:55, 23 July 2013 (UTC)
A few more things actually. Could we have the bot check for 1 citation per paragraph? And secondly, on the report pages, could all non-free images be written in red font, so they stand out more? Thanks, KING RETROLORD 03:07, 23 July 2013 (UTC)
Yes and yes to those two, implementing now. Theopolisme (talk) 03:13, 23 July 2013 (UTC)
 Done, I think. See User:Theo's Little Bot/GAN. For the alerts, I a) ignore the lead (since it doesn't have to be referenced) and b) ignore paragraphs that match the following regular expression: (==|Category:|\[\[File:|\[\[Image:|{{Authority control|{{.*}}|^<.*?>|^;|^\|). It's still not completely foolproof, though, and still gets some false positives. Theopolisme (talk) 04:10, 23 July 2013 (UTC)
This is coming along quite nicely. A few final questions, is the bot going to do a report on all current noms? And where are the reports going to end up? (At the moment I can only see the 10 or so on that example page?) Thanks, KING RETROLORD 04:57, 23 July 2013 (UTC)

Yes, when running "for real" the bot will report on all current nominations. As far as where the reports end up... I would like to just use sections on User:Theo's Little Bot/GAN, since that prevents having to creating a ton of new pages—unless you have a reason why multiple pages would be beneficial. Theopolisme (talk) 05:09, 23 July 2013 (UTC)

There are two reasons why I thought separate pages would be good. Firstly, will reports on 400+ noms make the page unwieldy to load? And secondly, if there are separate pages for each nom, then they could be linked at the WP:GAN page next to each nom. Thoughts? KING RETROLORD 05:34, 23 July 2013 (UTC)
Links could be done using User:Theo's Little Bot/GAN#articlename, so that's not a big deal. Load-wise: perhaps User:Theo's Little Bot/GAN/A, User:Theo's Little Bot/GAN/B, etc? Theopolisme (talk) 05:42, 23 July 2013 (UTC)
Sounds good to me. I don't really mind, as long as you don't think there will be problems loading the page it shouldn't matter. Thanks, KING RETROLORD 05:44, 23 July 2013 (UTC)
I've implemented the /A, /B, etc system. {{User:Theo's Little Bot/GAN/link}} can be used to automatically link to a specific article's listing. Theopolisme (talk) 06:15, 23 July 2013 (UTC)

@GoingBatty: I've implemented basic spell checking using Wikipedia:Lists of common misspellings/For machines (commit). I initially tried using a larger corpus (courtesy of NLTK+Project Gutenberg), but it was taking way too long to process each article (5-8 minutes), so I settled for Wikipedia:Lists of common misspellings/For machines instead. It's not as complete, but should still catch "common misspellings." ;) Your thoughts? Is this adequate? Theopolisme (talk) 19:17, 23 July 2013 (UTC)
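A sketch of the lookup idea, assuming the "misspelling->correction" line format of Wikipedia:Lists of common misspellings/For machines; this is not the bot's committed code, just an illustration of how such a list can drive the check.

import re

def load_typos(raw_list_text):
    # Parse lines such as "beacuse->because" into a lookup table.
    typos = {}
    for line in raw_list_text.splitlines():
        if "->" in line:
            wrong, right = line.split("->", 1)
            typos[wrong.strip().lower()] = right.strip()
    return typos

def find_typos(article_text, typos):
    # Report each word that appears in the misspelling table, with its suggestion.
    hits = []
    for word in re.findall(r"[A-Za-z']+", article_text):
        if word.lower() in typos:
            hits.append((word, typos[word.lower()]))
    return hits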

@Theopolisme: - Would you be able to leverage WP:AWB/T ? GoingBatty (talk) 23:01, 23 July 2013 (UTC)
Ah, good idea. Will look into tonight. Theopolisme (talk) 23:55, 23 July 2013 (UTC)
Once again, great idea! Code for integration with WP:AWB/T is written; running some tests now. Theopolisme (talk) 04:40, 24 July 2013 (UTC)
@GoingBatty: Take a look at User:Theo's Little Bot/GAN -- right now, I'm just printing the typo + line #, but would it make sense to also print the suggested correction? Theopolisme (talk) 04:46, 24 July 2013 (UTC)
@Theopolisme: - Thanks for adding the typo info! I suggest you change the description from "Common" to "Possible", and agree that the suggested correction be added. How would you suggest the editor use the line # info? Thanks! GoingBatty (talk) 02:02, 25 July 2013 (UTC)
2 AM and 2 hours of frustration later, I think I've got the "suggested correction" stuff working (note to self *growls*). Take a look at User:Theo's Little Bot/GAN. Good point about the line #s...you're right, they are fairly useless. ;) Perhaps printing a snippet of the surrounding text? Theopolisme (talk) 06:46, 25 July 2013 (UTC)
I've made it print the surrounding 40 characters instead. Thoughts? Theopolisme (talk) 07:05, 25 July 2013 (UTC)
Request: Could the sub headings for each article at User:Theo's Little Bot/GAN link to the articles? So the section headers would become bluelinks? Thanks, King∽~Retrolord 05:30, 24 July 2013 (UTC)
Done in source code here. Theopolisme (talk) 06:19, 24 July 2013 (UTC)

@Theopolisme: Can this bot be used to scan current Good Articles? The bot might be able to select some articles that might no longer meet the GA criteria for examination by human users. If the user decides it no longer meets the criteria, he can open a GAR.--FutureTrillionaire (talk) 01:35, 25 July 2013 (UTC)

The bot might be able to select some articles that might no longer meet the GA criteria for examination by human users. Hmm, what do you think should trigger the bot to select a particular article for reexamination? Theopolisme (talk) 01:56, 25 July 2013 (UTC)
Regarding the above idea, wouldn't running it through the same procedure that nominations get work? It might not pick up every article needing re-assessment but it will fairly easily spot all the ones with maintenance tags, non free images and lack of citations? King•Retrolord 06:58, 25 July 2013 (UTC)
Well, yes, but it is a machine, so exact constraints would need to be specified (for example, how many maintenance tags == re-review?). Also, keep in mind that the bot's reports include a fair number of false positives. I think it's just a matter of determining numbers that wouldn't overload the system (i.e., we don't want 700 articles appearing for checking), while still providing a benefit. Theopolisme (talk) 07:08, 25 July 2013 (UTC)
I don't know what's getting unrealistic so stop me if it is, but would it be possible for the bot to just scan all GAs, then perhaps list the 10% of "worst offenders" for re-assessment? Looking at some of the GA re-assessment drives, it seems to me that at least 10% of articles get delisted after being checked, though I may be wrong on that. King•Retrolord 07:36, 25 July 2013 (UTC)
Since there are over 18,000 Good Articles, the criteria for selection should be strict. Otherwise, the bot will select too many articles for review, many of which probably don't need it. Any GA with serious issues should be selected. The criteria for selection might be at least one orange tag, or maybe at least 3 citation needed tags, etc.--FutureTrillionaire (talk) 13:31, 25 July 2013 (UTC)
Yes, exactly what FutureTrillionaire says. Sure, I could definitely do it using maintenance tags as a metric (and maybe just make a list, User:Theo's Little Bot/GAs with maintenance tags, sorted by number of tags)? Would that be sufficient (or at least a good start)? Theopolisme (talk) 16:14, 25 July 2013 (UTC)
Sounds like a good idea to me. We can test this out. I'm willing to volunteer examining the articles the bot selects.--FutureTrillionaire (talk) 22:24, 25 July 2013 (UTC)
Alright, I'm generating that report now (might take a while). Theopolisme (talk) 03:51, 26 July 2013 (UTC)
Update: More like a day or so, given the sheer magnitude of articles to parse. Theopolisme (talk) 12:17, 26 July 2013 (UTC)
Cool. Do you know how often will the bot be able to update the list (if some of the GAs listed were to be delisted or if new orange tags are added to GAs)?--FutureTrillionaire (talk) 14:03, 26 July 2013 (UTC)
Does a weekly update sound good? Theopolisme (talk) 20:33, 26 July 2013 (UTC)
Sure. I am thinking about transcluding that list to a new section at WP:GAR, so that users can take a look at the selected articles.--FutureTrillionaire (talk) 21:11, 26 July 2013 (UTC)

Looks like the bot is done. However, there are issues. I checked about 20 of the articles listed, and it appears that the reason the vast majority of these articles were selected is that they have at least one dead link tag in the article. However, this is not very useful because dead links do not violate any of the GA criteria. I saw only one article that contained an orange tag, and a few articles containing only citation needed tags or disambiguation needed tags. Is it possible for the bot to ignore dead link tags and other less serious tags? I was hoping to just see articles with orange tags displayed at the top, or something like that.--FutureTrillionaire (talk) 01:50, 27 July 2013 (UTC)

It's very difficult (if not impossible) for the bot to determine the seriousness of a certain tag. However, we could create a blacklist for tags that should be ignored. We could also just display templates that transclude {{Ambox}} (that's "the orange" you were talking about). Thoughts? Thanks for bearing with me on this. (Another note: for some reason, the bot listed articles from least->most tags...fixed.) Theopolisme (talk) 02:33, 27 July 2013 (UTC)
I think we should try out the Ambox option. However, some non-serious tags that use the Ambox template should be blacklisted. Examples that I can think of are {{Current}} and {{Split}}.--FutureTrillionaire (talk) 02:59, 27 July 2013 (UTC)
{{Current}} and {{Split}} wouldn't be included, since they aren't also in Category:Cleanup templates. Here's a page to enter templates for the whitelist, though, should you stumble upon anything. Theopolisme (talk) 03:39, 27 July 2013 (UTC)
Can you run the bot again, limiting the search to only Ambox and cleanup templates? --FutureTrillionaire (talk) 13:09, 28 July 2013 (UTC)
To clarify: you mean "only ambox-based orange cleanup templates", correct? Theopolisme (talk) 16:29, 28 July 2013 (UTC)
Yes, you're right. I think this should reduce the list significantly.--FutureTrillionaire (talk) 16:54, 28 July 2013 (UTC)
Running now... Theopolisme (talk) 17:53, 28 July 2013 (UTC)
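For anyone curious, a rough pywikibot sketch of the filter being discussed (select only templates that are both in Category:Cleanup templates and built on {{Ambox}}); this is an assumption-laden illustration rather than the bot's actual code, and the report generation itself is omitted.

import pywikibot

def ambox_cleanup_templates():
    # Build the set of cleanup templates that transclude {{Ambox}}.
    site = pywikibot.Site("en", "wikipedia")
    cleanup = pywikibot.Category(site, "Category:Cleanup templates")
    selected = set()
    for tpl in cleanup.members(namespaces=10):  # Template: namespace only
        if any(t.title() == "Template:Ambox" for t in tpl.itertemplates()):
            selected.add(tpl.title())
    return selected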