Wikipedia talk:Bots/Archive 8
This is an archive of past discussions about Wikipedia:Bots. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Footnote bot
This bot would assist with the correction of pages that use footnotes. No username has been proposed yet, and the code is still unwritten.
I am proposing that the bot perform like this (all pages refer to subpages of the bot's user space or a Wikipedia project page):
- Every hour, check for articles listed by users at a dated /To Do/ subpage... either under its user page or a Wikipedia project page
- Fix template usage of the footnotes on that page and re-arrange them in order on the given page
- Move each handled article from the /To Do/ page to the dated /Completed/ subpage.
The initial suggestion was to actually browse all pages using the template. I think that's a bad idea: we're at half a million articles, and the number of pages the bot needs to work on is really limited. Personally, I think a dog-like bot that you tell, "Fix, bot, fix" is a better implementation. That way, 1) it doesn't need to bog Wikipedia down searching for articles it needs to fix, and 2) articles can be fixed when they need to be. Users would simply leave the footnotes out of order and the bot would come around to correct the ordering. -- AllyUnion (talk) 06:53, 15 Mar 2005 (UTC)
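A minimal sketch of the proposed on-demand loop, assuming modern pywikibot; the page names and the renumber_footnotes helper are hypothetical, and the actual reordering logic is left as a stub:

 import re
 import pywikibot

 site = pywikibot.Site('en', 'wikipedia')
 BASE = 'Wikipedia:Footnote bot'  # hypothetical project page

 def renumber_footnotes(wikitext):
     # Placeholder: the real bot would re-order {{ref}}/{{note}} pairs here.
     return wikitext

 def run_once():
     todo = pywikibot.Page(site, BASE + '/To Do')
     titles = re.findall(r'\[\[([^\]|]+)\]\]', todo.text)  # one request per wikilink
     for title in titles:
         article = pywikibot.Page(site, title)
         article.text = renumber_footnotes(article.text)
         article.save(summary='Footnote bot: re-ordering footnotes')
     done = pywikibot.Page(site, BASE + '/Completed')
     done.text += '\n' + '\n'.join('* [[%s]]' % t for t in titles)
     done.save(summary='Footnote bot: logging completed requests')
     todo.text = ''
     todo.save(summary='Footnote bot: clearing handled requests')

Run hourly (e.g. from cron), this touches only the pages users have actually listed, matching the "Fix, bot, fix" design described above.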
- There have been several different footnote proposals and corresponding templates. Would this theoretical bot be working with Wikipedia:Footnote3? Certainly crawling all articles that use footnotes would put undue burden on the site. But much of the point of the bot would be to catch footnotes which accidentally get put out of order. A good compromise would be to load the "todo" page with the results of a Wikipedia:Database download analysis. It would also be nice to convert all pages to use the same footnoting system, but that might require broader community consensus. (Unless there are only a few non-standard pages, in which case they can be re-done by hand.) -- Beland 01:08, 15 September 2005 (UTC)
- I don't know about the theoretical bot, but you can look at the SEWilcoBot contributions and see what it does to articles when I use my References bot. It's under development and I run it as a Footnote3 helper on specific articles. There is a style discussion at Wikipedia_talk:Footnote3#Footnotes_vs._inline_web_references. (SEWilco 02:40, 15 September 2005 (UTC))
Underscore replacement bot
This could optionally be added to Grammar bot as an extension... anyway...
Basic idea: change text in {{text}} and [[text]] to remove all underscores. The only exception is anything within <nowiki></nowiki> tags, which it will not change. Are there any other considerations that this bot would need to make? -- AllyUnion (talk) 08:56, 15 Mar 2005 (UTC)
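A minimal sketch of the replacement, assuming Python; exceptions such as [[_NSAKEY]] (raised below) would need a whitelist on top of this:

 import re

 NOWIKI = re.compile(r'<nowiki>.*?</nowiki>', re.DOTALL)
 TARGET = re.compile(r'(\[\[[^\]|]+)|(\{\{[^}|]+)')  # link/template name up to "|" or close

 def deunderscore(wikitext):
     holes = []
     def shelve(m):  # protect <nowiki> spans from the substitution
         holes.append(m.group(0))
         return '\x00%d\x00' % (len(holes) - 1)
     text = NOWIKI.sub(shelve, wikitext)
     text = TARGET.sub(lambda m: m.group(0).replace('_', ' '), text)
     return re.sub(r'\x00(\d+)\x00', lambda m: holes[int(m.group(1))], text)

 # deunderscore('See [[Main_Page]], not <nowiki>[[Main_Page]]</nowiki>')
 # -> 'See [[Main Page]], not <nowiki>[[Main_Page]]</nowiki>'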
- I can program that. I will run off the basic query, but it will have many, many false positives. (No worries, it won't make edits that are false positives.) I have the 09-03-2005 dump. r3m0t talk 17:20, Mar 15, 2005 (UTC)
- There will be a few articles that have "_" as part of their proper name; the only one I can think of off-hand is _NSAKEY. Links to such articles shouldn't have underscores removed. — Matt Crypto 18:06, 15 Mar 2005 (UTC)
- I found the following articles with an underscore (space) at the beginning: [[_]] (yes, that's right, it does exist), _Hugh_Sykes_Davies, _Swimming_at_the_2004_Summer_Olympics_-_Men's_400_metre_Freestyle, and about 40 not in the main namespace. I found pages with underscores inside the {{wrongtitle}} tag: FILE_ID.DIZ, linear_b, mod_parrot, mod_perl, mod_python, _NSAKEY, Shift_JIS, Shift_JIS art, strongbad_email.exe (here showing the ideal names, not the actual ones). Any which do not have a wrongtitle tag deserve the changes they get. Hmmph. r3m0t talk 20:55, Mar 15, 2005 (UTC)
- Is this bot still planned? There's a small discussion of the underscore habit at the VP. — Matt Crypto 16:10, 28 Mar 2005 (UTC)
This sounds like a solution in search of a problem, to me. I'd be obliged if it could avoid any templates or tables in its processing, because I use the underscore separator in order to keep multi-word links on the same line, rather than having a wrap at the internal space. I changed the section name as well, to reflect that it's not a correction, but a replacement. Noisy | Talk 16:43, Mar 28, 2005 (UTC)
- There is certainly a problem. All right, it's no more than an aesthetic annoyance, but I've seen no end of links like Main_Page in the main text for absolutely no reason.
- The VP discussion also says something about using non-breaking spaces in links, but that doesn't seem to work. Nickptar 14:57, 30 Mar 2005 (UTC)
pending deletions
Can someone write a bot to move all articles in category:pending deletions out of the main namespace, e.g. to talk:foo/pending or Wikipedia:pending deletions/foo and then delete the resultant redirect? Is that a good idea? 131.251.0.7 12:30, 16 Mar 2005 (UTC)
- It's possible. I think it's a good idea. Maybe I'll program that, and it could be made by the weekend (I think). You could ask at Wikipedia:Bot requests for somebody to program it. r3m0t talk 13:57, Mar 16, 2005 (UTC) PS you ought to register.
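A minimal sketch of the move, assuming modern pywikibot and the Wikipedia:Pending deletions/foo naming; note that suppressing the leftover redirect needs admin rights, otherwise it would have to be deleted separately:

 import pywikibot

 site = pywikibot.Site('en', 'wikipedia')

 def sweep_pending_deletions():
     cat = pywikibot.Category(site, 'Category:Pending deletions')
     for page in cat.articles(namespaces=0):  # main namespace only
         target = 'Wikipedia:Pending deletions/' + page.title()
         page.move(target,
                   reason='Moving pending deletion out of the main namespace',
                   noredirect=True)  # requires admin rights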
Numbers and commas - possible additional use for Grammar bot?
Just an idea... maybe convert all those large numbers like 100000000 to something with commas like 100,000,000. Some false positives to consider are article links, years, stuff in mathematical articles, and stuff in formatting. -- AllyUnion (talk) 08:18, 19 Mar 2005 (UTC)
- Years are easy: ignore everything under 10000. I will consider it. r3m0t talk 10:10, Mar 19, 2005 (UTC)
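A minimal sketch of that rule, assuming Python; a real run would also need to skip wikilinks, templates, <math> spans, and other formatting, per the false positives listed above:

 import re

 BIG_NUMBER = re.compile(r'\b\d{5,}\b')  # five or more digits, so years (< 10000) are ignored

 def add_separators(wikitext):
     return BIG_NUMBER.sub(lambda m: '{:,}'.format(int(m.group(0))), wikitext)

 # add_separators('The Sun is about 149600000 km away.')
 # -> 'The Sun is about 149,600,000 km away.'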
- Please consider that comma use is not universal: in large parts of Europe, it is common to interchange the decimal point and the comma. We have a decimal comma and write 100.000.000. I would rather see a form like 10^8. Siebren 16:48, 15 October 2005 (UTC)
- Yeah, but this is the English-language Wikipedia, and as far as I know most/all English-speaking countries use a comma. Martin 20:15, 28 October 2005 (UTC)
- Not all. South Africa uses the comma as a decimal separator, and a space as the thousands separator. As far as I know, this is the SI standard, since in mathematics a dot means multiply (in grade 7 my maths teacher told us that this is why SA uses the decimal comma) --Taejo | Talk 12:19, 9 November 2005 (UTC)
- Also note that even in English, scientific writing tends not to use commas. I think the SI standard separator is a thin space, when separators are used at all. --Bob Mellish 21:41, 28 October 2005 (UTC)
- There was discussion of this at some length recently on Wikipedia talk:Manual of Style (dates and numbers)#number notation. Some people proposed making the SI style the standard for all of Wikipedia and banishing all use of commas as number separators; others objected. It might be a good idea to read this before starting any bot of this type. DES (talk) 21:52, 28 October 2005 (UTC)
You can work out how to separate it, but I think this is a great idea and should be implemented in one form or another. HereToHelp (talk) 02:20, 2 November 2005 (UTC)
Bot to update Dutch municipalities info
I'd like permission to use a bot to update the pages on Dutch municipalities. Things the bot wants to do: update population etc. to 2005; add coordinates to infobox; add articles to the proper category. After that, I may use it as well for adding infoboxes to the articles on Belgian municipalities, and perhaps those of other countries. Eugene van der Pijll 21:45, 27 Mar 2005 (UTC)
Psychology and mental health content only bot
I'd like permission to maintain, through a bot, a small read-only collection of psychology and mental health related material. This bot would prevent search engine indexing and limit accesses via its interface. Given the tiny percentage of material I'm interested in, downloading and parsing the enormous database dumps is not an option with my limited server resources. Thank you. -- Docjohn (talk) 12:31, 31 Mar 2005 (UTC)
RCBot
I'm requesting permission for a bot still in development: User:RCBot. Its purpose is to help with issues on the Wikimedia Commons. There is only one task that the bot is supposed to do at the moment: help rename media files on the Commons. Suppose there were two identical files (e.g. images) on the Commons that might even have been uploaded from different language Wikipedias. Media files on the Commons can be used in any language Wikipedia, so all languages need to be checked and – if necessary – the reference to the first file replaced by one to the other. This is what the bot is supposed to do, and it therefore also needs permission on the English Wikipedia. — Richie 18:24, 4 Apr 2005 (UTC)
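A minimal sketch of the per-wiki replacement step, assuming modern pywikibot; the replace_file name is hypothetical, a full run would repeat this over every language's site, and a real bot would also normalize space/underscore variants of the file name:

 import re
 import pywikibot

 def replace_file(site, old_name, new_name):
     # Rewrite every page on this wiki that embeds the duplicate file.
     old = pywikibot.FilePage(site, 'File:' + old_name)
     pattern = re.compile(re.escape(old_name))
     for page in old.using_pages():
         page.text = pattern.sub(new_name, page.text)
         page.save(summary='Replacing duplicate Commons file %s with %s'
                           % (old_name, new_name))

 # for code in ('en', 'de', 'fr'):  # ...and so on for each language
 #     replace_file(pywikibot.Site(code, 'wikipedia'), 'Old.jpg', 'New.jpg')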
McBot
I have received a copy of the software Kevin Rector uses for KevinBot. It is used to transwiki pages marked with {{move to Wiktionary}} to Wiktionary. I just registered the username McBot for it and plan to use that for any botting. Requesting permission... --Dmcdevit 05:51, 9 Apr 2005 (UTC)
- I'd like to verify that I did give him a copy of the bot software for transwikification so that more than one person could patrol this. I've also trained him in how to use it and would support his bot account being flagged. Kevin Rector 14:50, Apr 9, 2005 (UTC)
JdforresterBot ("James F. Bot")
Heya.
I've created this account to do some boring and exhausting little jobs, like correcting the 700 inbound links each to a set of 40 or so pages to be moved.
May I please have bot status?
James F. (talk) 22:50, 12 Apr 2005 (UTC)
Bot status request
I would like to request bot status for User:Diderobot. It is semi-automated and I plan to use it to fix grammar, spelling, punctuation, as well as wiki syntax and double redirects. Sam Hocevar 10:17, 13 Apr 2005 (UTC)
- Can you be a little more specific? -- AllyUnion (talk) 10:33, 23 Apr 2005 (UTC)
- By semi-automatic you mean it's manually assisted? And how does it fix grammar, spelling, punctuation? Based on what type of dictionary? -- AllyUnion (talk) 06:51, 24 Apr 2005 (UTC)
- By semi-automatic I mean all changes are validated by hand. It first runs offline on a database dump and displays a list of the modifications it is going to apply. I accept or refuse each of them manually. Then the bot runs online, downloading, modifying and saving articles according to the validated changeset.
- As for the dictionary, it uses a wordlist and a set of regexp generators matching common mistakes. For a very simple example, the function
gen_check_with_suffix('..*', 'iev', 'eiv', 'e|ed|er|ers|es|ing|ings')
will generate the following regexp:
 \b([nN]|[sS]|[aA](?:ch|ggr)|[bB](?:el)|[dD](?:isbel)|[gG](?:enev|r)|[fF](?:latus-rel)|[hH](?:andkerch)|[kK](?:erch)|[mM](?:ake-bel|isbel)|[oO](?:verach)|[nN](?:eckerch|onach|onbel)|[rR](?:el|epr|etr)|[uU](?:nbel|nderach|nrel)|[tT](?:h))eiv(e|ed|er|ers|es|ing|ings)\b
which matches spelling errors such as "theives", "acheive", "disbeleiving", etc. Sam Hocevar 08:15, 24 Apr 2005 (UTC)
- Dear lords, that is a scary regex... Clever though, and a good idea. You get a cookie for it. :p --Veratien 18:33, 7 August 2005 (UTC)
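A minimal sketch of how such a generator could work, assuming Python; the wordlist here is a stand-in, the sketch handles lowercase only, and the real Diderobot code (not shown) compacts the prefixes into the nested groups seen above:

 import re

 WORDLIST = {'believe', 'achieve', 'thieves', 'receive'}  # stand-in dictionary

 def gen_check_with_suffix(stem, good, bad, suffixes):
     # Find every dictionary word <prefix><good><suffix>, then build a
     # regex matching the same words with <bad> in place of <good>.
     prefixes = set()
     for word in WORDLIST:
         for s in suffixes.split('|'):
             if word.endswith(good + s):
                 prefix = word[:-len(good + s)]
                 if re.fullmatch(stem, prefix):
                     prefixes.add(prefix)
     alt = '|'.join(sorted(re.escape(p) for p in prefixes))
     return re.compile(r'\b(%s)%s(%s)\b' % (alt, bad, suffixes))

 rx = gen_check_with_suffix('..*', 'iev', 'eiv', 'e|ed|er|ers|es|ing|ings')
 # rx.search('theives') matches; 'receive' (correctly spelt) does not.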
Wikipedia:Introduction, Sandbot header enforcement
Adding Wikipedia:Introduction header enforcement for the top two lines. -- AllyUnion (talk) 09:59, 14 Apr 2005 (UTC)
LupinBot
This bot has been uploading maps for a little while now, blissfully unaware of the existence of this page. It's a little wrapper around upload.py
Apparently there's a bot flag I need, so I'm listing it here. Lupin 00:29, 20 Apr 2005 (UTC)
Request bot permission for mathbot
I use a bot called mathbot, primarily to do housekeeping in the math articles. So far, it has been concerned with removing extra empty lines and switching to some templates. More uses might show up. I make sure I never use it for more than 60 edits in one day (there is no rush :) Could I register it as a bot? Thanks. Oleg Alexandrov 17:59, 20 Apr 2005 (UTC)
- Specific about what? Oleg Alexandrov 14:06, 23 Apr 2005 (UTC)
- I wrote above that I used it for removing extra empty lines and switching to some templates. I also did lots of semi-automated spelling, but preferred to use my own account for that, as this was trickier and I wanted the resulting pages on my watchlist.
- So, I have used my bot for nothing other than what I wrote above; that's why I can't be more specific. About the future, I don't know what will show up. Oleg Alexandrov 15:04, 24 Apr 2005 (UTC)
- To do more general work, I would need, in most cases, to have a local copy of Wikipedia to do queries. I can't afford to download the whole Wikipedia and the mysql database. If, at some point, jobs show up for which I do not need to have the local download, I can take care of them. Oleg Alexandrov 14:06, 23 Apr 2005 (UTC)
Tagging
I just started developing a bot which, at the moment, bears the name Tagbot. It would help with the tedious work that is image tagging. Its task would be simple: it would find untagged images and tag them with {{No source}}. Then it would find the user who first uploaded each one and leave a message on his/her talk page saying something like "Hi! Image XXX that you uploaded is untagged. This is bad because..... You tag an image by putting......etc.". The purpose of this would be:
- Many images would be automatically tagged by the user
- It would tag all images with {{unverified}}, and that's better than nothing.
- It would make actual image tagging much easier to organize, as you simply would have to look at Category:Images with unknown source.
I was thinking it might be best to hear comments and/or get permission before I got too deep into the process. By the way, it might also be a good idea to change MediaWiki to automatically add an unverified tag to each new image, to ensure no untagged images. So, what do y'all think? Gkhan 01:23, Apr 30, 2005 (UTC)
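A minimal sketch of the tag-and-notify step, assuming modern pywikibot; the tag_and_notify name is hypothetical, and finding the untagged images in the first place (e.g. from a dump scan) is the harder part and is left out:

 import pywikibot

 site = pywikibot.Site('en', 'wikipedia')
 NOTICE = ('\n== Untagged image ==\n'
           'Hi! [[:%s]], which you uploaded, has no copyright tag. '
           'Please add one. ~~~~')

 def tag_and_notify(image):
     # image: a pywikibot.FilePage already known to lack a license template
     image.text += '\n{{No source}}'
     image.save(summary='Tagging image with no source information')
     uploader = image.oldest_file_info.user   # original uploader's username
     talk = pywikibot.Page(site, 'User talk:' + uploader)
     talk.text += NOTICE % image.title()
     talk.save(summary='Notifying uploader of untagged image')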
- One problem with this: a number of images have copyright information included, but it's in a pre-tagging form. Would it be possible to make this bot into a semi-automated process? What I'm envisioning is
- The bot finds an untagged image
- It shows the image, on its Wikipedia page, to the user monitoring the process
- The user either indicates the tag that should be added to the page, or indicates that the bot should tag it {{unverified}} and post on the uploader's user page.
- If the whole system is made simple enough, copies of the bot software could be distributed to everyone who's participating in the image tagging process. How does this sound? --Carnildo 01:44, 30 Apr 2005 (UTC)
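A minimal sketch of that semi-automated loop, again assuming pywikibot; the tag_and_notify fallback is the hypothetical helper sketched above:

 def review_loop(untagged_images):
     # untagged_images: iterable of pywikibot.FilePage objects
     for image in untagged_images:
         print(image.full_url())   # the human opens and inspects the page
         tag = input('Tag to apply (blank = {{unverified}} + notify): ').strip()
         if tag:
             image.text += '\n{{%s}}' % tag
             image.save(summary='Tagging image copyright status')
         else:
             tag_and_notify(image)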
- It does sound good, I like it a lot. One thing though: the way I was imagining the bot, I would download the database and look up the images to be tagged, because that's the only way I can think of that would work short of making database requests. I suppose I could do two versions: one that tags all images with something like {{to-be-tagged}}, and then the tagging "client" that is distributed could read off that list (the category list, that is). I am not qualified to assess the server hog this would create, but it sounds good (great!) to me. Gkhan 01:57, Apr 30, 2005 (UTC)
- My understanding is that, with the current server software, categories with large numbers of pages should be avoided if at all possible. Viewing such a category puts almost as much load on the server as viewing each individual article in the category. --Carnildo 02:10, 30 Apr 2005 (UTC)
- That can't possibly be true! When you add or remove a category from an article, doesn't some sort of list get edited that the category page reads from (at least that's what I, an amateur, incompetent and inexperienced programmer, would do; I'm not really that bad, only as a comparison to MediaWiki developers)? Viewing a category page should only be as hard as reading from that list, right? Although when you look at an image category, it displays all the images; that most certainly is a server hog (it would be nice to have some developer input though). Gkhan 02:41, Apr 30, 2005 (UTC)
WouterBot
I would like to use pywikipedia's solve_disambiguation.py to facilitate my disambiguation work. I created an account User:WouterBot for this and now I would like your permission to use it. WouterVH 14:27, 4 May 2005 (UTC)
- Feel free to do so. Running that script requires human intervention to complete a disambiguation. It's what I originally setup my bot account to do. RedWolf 04:47, May 17, 2005 (UTC)
Pending deletion script
Shouldn't User:Pending deletion script be listed on Wikipedia:Bots? Gdr 20:03, 2005 May 12 (UTC)
- No, because it's a dev's bot, and should be watched anyway. -- AllyUnion (talk) 00:52, 14 May 2005 (UTC)
- That and it seems to be done. -- AllyUnion (talk) 00:53, 14 May 2005 (UTC)
tickerbot
I've finished a bot that can upload quotes as well as other financial data (such as market cap, dividends, etc.) on publicly traded companies daily, from the machine-readable interface provided by Yahoo Finance. It downloads a list of stock symbols and the format template from wiki pages, then sticks the data for each symbol in its own page in the Template namespace (something like Template:Stock:AAPL, for example). Is this something that people are interested in? I guess it boils down to a somewhat philosophical question as to whether this kind of timely information should be considered "encyclopedic".
- Does this violate Yahoo Finance's Terms of Service, or raise any other legal concerns? Most stock ticker apps that I know of use Yahoo Finance's interface for data, but it may be different for something of this scale.
Questions:
- Exactly how many symbols are we talking about? I think it is a good idea, but it seems a bit excessive. I find that "linking" the stock symbol would be better. -- AllyUnion (talk) 08:07, 25 May 2005 (UTC)
- It could work for as many symbols as desired. It would download the list from a wiki page. Taak 18:40, 31 May 2005 (UTC)
- That's a valid concern, but how many edits/day would it take to be noticeable? If it was updating daily it could certainly run during only off-peak hours. If we could get some numbers here we could figure it out. Taak 02:22, 7 Jun 2005 (UTC)
- The question would be more of how many pages and how many symbols we are talking about, and which markets you are referring to. Furthermore, I'd very much prefer to keep the bot's edits out of the template space; Template:Stock:AAPL seems a bit excessive and rather unusual. Let me see here... the NYSE opens at around 9 AM ET and closes at 5 PM ET, so you'd have about 16 hours outside trading to update NYSE stocks. Updating a page every 30 seconds, which is the typical recommended editing speed, you'd only be able to update 1920 stocks out of the 2800 stocks traded on the NYSE. That only gives you 68.6% of all the stocks on the NYSE. Assuming we allow you to "push" the limit with edits 10 seconds apart, 2800 edits take 28000 seconds, which is 466 minutes and 40 seconds, which is 7 hours, 46 minutes, and 40 seconds. So you'd have to have your bot edit at least 15 seconds apart each day to update stock information on all NYSE-traded stocks. Do you still think it is a bit excessive? -- AllyUnion (talk) 07:11, 10 Jun 2005 (UTC)
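A minimal sketch of a throttled daily update, assuming Python and pywikibot; the CSV endpoint and format flags shown reflect the old Yahoo Finance interface the poster describes and should be treated as an assumption, not a live API:

 import time
 import urllib.request
 import pywikibot

 site = pywikibot.Site('en', 'wikipedia')
 # s = symbol, l1 = last trade price, j1 = market cap (historical interface)
 QUOTE_URL = 'http://finance.yahoo.com/d/quotes.csv?s=%s&f=sl1j1'

 def update_quotes(symbols, delay=15):
     # 2800 symbols * 15 s = 42000 s, well inside the 16-hour overnight window
     for symbol in symbols:
         with urllib.request.urlopen(QUOTE_URL % symbol) as f:
             sym, price, cap = f.read().decode().strip().split(',')
         page = pywikibot.Page(site, 'Template:Stock:' + symbol)
         page.text = '%s (market cap %s)' % (price, cap)
         page.save(summary='tickerbot: daily quote update')
         time.sleep(delay)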
- I've been thinking of something similar for currencies. How about this: only do full updates once a week (but stagger it through the week) except for particularly volatile stocks (say, stocks that have gone up/down by more than a certain amount since the previous day). But anyway: surely not every NYSE company is notable enough to have a wikipedia article? (I checked - 192 articles in Category:Companies traded on the New York Stock Exchange) --Taejo | Talk 12:48, 9 November 2005 (UTC)
Requesting permissions for running bots (clarification)
I feel that the policy needs a bit of clarification. I believe that bots should be permitted to run before they receive their bot flag, but only once their purpose has been described here clearly and a sysop has reviewed what the bot does. To meet the burden of proof, we must see a week's worth of edits from the bot while it is running; otherwise, we have no way to judge it without a sample. The bot flag is given to bots once they have met that burden of proof, not as permission to run the bot. Furthermore, in its edit summaries, a bot should declare what it is doing... as all good bots should. -- AllyUnion (talk) 06:52, 27 May 2005 (UTC)
- I think your proposal is a little draconian. I do think it's fair to ask bot authors to announce their bot here, to describe what their bot is doing, to run under a bot account, and to limit the bot's rate of activity. And it's fair to have a policy of block-on-sight for rogue bots. But I don't think it's fair to require administrator approval to run a bot at all. What if no sysop is available or interested enough to comment? And what gives administrators the insight and knowledge to judge whether a bot deserves to be given a chance to prove itself? Administrators are supposed to be janitorial assistants, not technical experts. Gdr 22:21, 2005 May 29 (UTC)
- Ditto; There's more than enough red tape for bots already, please let's not make it any worse than it already is by applying pseudo-legal concepts to wiki bots. Perhaps ask people to go slowly in the first week the bot is run, thus allowing time for feedback before a full rollout; but let's not discourage contributions (or encourage illicit bots) by making the bar so excessively high that people either say "stuff that", or start operating without first requesting approval (which are precisely the outcomes that excessive regulation will encourage). Furthermore, in its comment, it should declare what it is doing - this I do agree with, but as a guideline or recommendation. -- Nickj (t) 02:50, 30 May 2005 (UTC)
- Well, maybe not necessarily an administrator. I think someone should review what a bot does, because I think the whole "if no one says anything" policy should be slightly reconsidered. What I'm trying to say is that bots should always be given a trial period of one week, then stopped and reviewed by someone. I suggested sysops because we don't need anyone technical to review whether a bot's edits are harmful; usually a smart person can determine that, and sysops have the power to block bots anyway. If a bot's edits are harmful, then it should be blocked; however, it should be here that a bot's owner can make the case for unblocking their bot for another trial run. -- AllyUnion (talk) 05:20, 31 May 2005 (UTC)
Policy: spelling
There should be no (unattended) spelling fixing bots; it is quite simply not technically possible to create such a bot that will not make incorrect changes; if the big office utility companies can't make a perfect automated spellchecker, you most likely can't either
It seems like this is too harsh. The comparison with word-processor spellcheckers isn't a fair one, because the needs are different: we don't need a bot that would be a "perfect" spellchecker, just one that would (1) make many useful corrections, but (2) when in doubt leave articles alone. It wouldn't be impossible to make a spelling bot with a "certainty metric" for how sure it is that a correction is needed, making the change only when it is extremely sure.
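A minimal sketch of such a conservative checker, assuming Python; here the "certainty metric" is reduced to its most conservative form, a hand-vetted typo table, and per the discussion below it would still need context checks for quotations, [sic], and articles about misspellings:

 import re

 DICTIONARY = {'the', 'receive', 'achieve', 'believe'}       # stand-in wordlist
 KNOWN_FIXES = {'teh': 'the', 'recieve': 'receive',
                'acheive': 'achieve', 'beleive': 'believe'}  # unambiguous, hand-vetted

 def conservative_corrections(text):
     # Correct a word only when it is in the vetted typo table, is not
     # itself a dictionary word, and is all lowercase (so proper nouns
     # are left alone). Return proposals; the caller reviews them.
     fixes = []
     for m in re.finditer(r'\b[a-z]+\b', text):
         word = m.group(0)
         if word in KNOWN_FIXES and word not in DICTIONARY:
             fixes.append((m.start(), word, KNOWN_FIXES[word]))
     return fixes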
I know this page is for requesting bots, but there ought to be a place to discuss the content of this page as well, and if not here I don't know where. Zach (wv) (t) 16:00, 31 May 2005 (UTC)
- Well, why not just have User:Humanbot, released today? ;) r3m0t talk 16:21, May 31, 2005 (UTC)
- I agree with these comments. We have spam filters that have exceedingly low false-positive levels (on the order of maybe 1 in 10,000 mails). Why shouldn't it be possible to build a similarly conservative spell checker? Besides this, I can tell you that "the big office utility companies" are almost certainly working on context-sensitive spellchecking technology with much-reduced false-positive and false-negative rates and larger dictionaries. There have been recent papers on context-sensitive spellchecking which found about 97% of errors and produced almost no false positives at all; it could even be trained on the existing body of Wikipedia text, so that it recognizes many specialised terms. I was thinking of creating a tool that does this type of spellchecking. This rule suffers from a lack of imagination.
- On the other hand, I really don't see any need for fully automatic spellchecking. Page update speeds are slow enough that human review shouldn't cost much relative time, provided that it's not wrong too often. It can simply list its corrections and ask them to hit ENTER if they're all good. Deco 17:56, 31 May 2005 (UTC)
A small note here... part of the reason is that we have differences between various spellings in English. While some people might write Internationalization, others might write Internationalisation; both are correct, but one is used over the other depending on the country you are from. Furthermore, a spellbot must recognize and skip over Unicode characters as well as HTML codes and wiki codes, in addition to any languages that use partial English for pronunciation or actual words; examples are romaji, Spanish, and so on. An automatic bot might accidentally change something in an article that was not intended... and a simple revert defeats the purpose of the bot, especially if it made only one mistake out of the entire page it just corrected. (Assuming that the one mistake is not easily corrected by a human editor.) Just some of my thoughts on the matter. -- AllyUnion (talk) 09:13, 2 Jun 2005 (UTC)
- Oh yes, don't forget proper nouns such as names of places, people, things... We don't know how accurate it will be on the Wikipedia. It might be 97%, it might be lower. -- AllyUnion (talk) 09:15, 2 Jun 2005 (UTC)
- I think you are right to point out these difficulties, but I think they are surmountable, and that someone could conceivably make a useful spellchecker (defined as finding some small but nontrivial number of wrong spellings, while making false positives, i.e. "corrections" that shouldn't be made, almost never). You would just have to make it extremely conservative, as Deco was saying. It wouldn't get all, or maybe not even most, misspellings, but as long as it got some of them and didn't make any false positives, it would be useful. You could program it to not correct anything with a capital letter, for example, which would cover the proper-name problem. You could also program it to recognize all the various English spellings pretty easily. It's probably not a huge issue, since I almost never find misspellings, but I think it could be done, and I'd hate to see someone be discouraged from trying if the policy is worded too strongly. Zach (wv) (t) 23:28, 4 Jun 2005 (UTC)
- What about scientific names? There is a certain degree of intelligence required... A person can easily distinguish between a typo and a misspelling; I'm not completely certain a computer can. I would not mind if we had a bot that "prompted" on the spelling errors or highlighted them in some manner, nor would I mind a bot that corrects only the most common misspellings; correcting the most common misspellings would yield better results, I believe. I'm just really concerned that the bot runs into a strange word it doesn't know... but finds a match in its dictionary and changes the word. -- AllyUnion (talk) 06:03, 5 Jun 2005 (UTC)
- I would not mind if we had a bot that "prompted" the spelling errors or highlight them in some manner. Wake up and smell the roses! r3m0t talk 19:28, Jun 10, 2005 (UTC)
This really isn't possible because there are a number of circumstances in which we'd want to deliberately misspell a word, for example in an exact quotation with [sic] or when mentioning "this name is commonly misspelled as foo". We even have articles about spelling usage like Misspelling or Teh. What's considered a misspelling in modern English might have been used in the past before spelling was standardized, so it could appear in quotations from old sources or in an explanation of the etymology of a word. Articles often give "pronounced like" spellings of words that may be the same as common misspellings. What is a misspelling in English might be the correct spelling in a foreign language we're quoting. I'm sure there are other situations I haven't thought of. There's just no way you can rule all of this out, no matter how conservatively you construct your word list. DopefishJustin (・∀・) June 30, 2005 21:36 (UTC)
- Quite. There are also rare words like specialty, which commercial programs routinely "correct" to speciality, or CompAir, a company name that would be auto-corrected to compare. (doodlelogic)
Procedure for user supervised scripts
I am presently designing RABot to simplify some of the tasks associated with maintaining the requested articles pages, namely deletion of created articles and sorting / tidying up article lists.
Because of the variations in formatting across different request pages and the complexities of the ways in which people make requests, this is a fairly complicated parsing task. The scripts I've created so far are fairly accurate and do a good job of handling many things, but it is unlikely that this will ever be able to run unsupervised, since there will likely always be people adding bizarrely formatted requests that the scripts choke on. (It is supposed to ignore things it can't parse, but sometimes it thinks it understands things it really doesn't.) As a result, I plan to manually check and approve all its proposed edits before committing them.
So my question is, what is the procedure for getting approval to run scripts like this? It is not really a bot in the sense that it will not be running independently, but it does use the Python bot library and have a separate user account. The bots page isn't particularly clear on what is expected before running supervised scripts like this.
Once it reaches a point where I feel it is well-behaved enough that I want to start using it in maintaining the requested articles pages, I intend to write a detailed description of what it is trying to do (e.g. remove created articles and sort lists) and discuss its presence on the RA talk pages. Is that enough?
Dragons flight 00:46, Jun 10, 2005 (UTC)
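A minimal sketch of the supervised blue-link sweep described above, assuming modern pywikibot; it handles only the simplest one-link request lines and leaves everything it cannot parse untouched:

 import re
 import pywikibot

 site = pywikibot.Site('en', 'wikipedia')

 def sweep_requests(request_page_title):
     page = pywikibot.Page(site, request_page_title)
     kept = []
     for line in page.text.splitlines():
         m = re.match(r"\*\s*\[\[([^\]|]+)\]\]\s*$", line)  # simple one-link requests only
         if m and pywikibot.Page(site, m.group(1)).exists():
             continue              # request fulfilled (blue link); drop it
         kept.append(line)         # unparsed or unfulfilled lines stay as-is
     new_text = '\n'.join(kept)
     if new_text != page.text and input('Save %s? [y/N] ' % page.title()).lower() == 'y':
         page.text = new_text      # the human approved this edit
         page.save(summary='Removing requests for articles that now exist')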
- I personally don't think that this particular bot is a good idea. If a certain link on Wikipedia:Requested articles becomes blue, that is, the article has been created, I think it is good for a human editor to inspect that article, see if it indeed corresponds to the title, clean it up if necessary, and only later remove it. Also, having a blue link or two in this list is not something so serious as to be worth spending a lot of time creating and debugging a bot. Wonder what others think. Oleg Alexandrov 03:34, 10 Jun 2005 (UTC)
- If you haven't recently, you might want to take a look over Category:Wikipedia requested articles; there are a lot of links, even if you aren't paying attention to somewhat ugly dump pages like Wikipedia:Requested articles/list of missing pharmacology. Some users, like User:DMG413, climb high in the edit count primarily by removing blue links from those lists. In a perfect world you would be right and people would take a careful look at each article before removing it, but a single request page might get 5-10 blue links a day and go for a week without being cleaned, at which point no reasonable person is actually going to look through the articles. It is also worth noting that I am (at least for now) ignoring the much larger Category:Wikipedia missing topics. Dragons flight 04:03, Jun 10, 2005 (UTC)
- P.S. If you have other suggestions on how better to deal with the growth of and turnover in RA, then that is also a worthwhile conversation, but I find it difficult to accept an objection to simplifying a process that is already consuming a significant amount of users' time. Dragons flight 04:06, Jun 10, 2005 (UTC)
- Technically, your bot would qualify as a manually controlled "bot assistant"; more or less a specialized editor, if you will. Regardless, asking permission from the community helps to confirm what you are doing. I'd recommend that you perform all edits under what "seems" to be a bot account, and give it a test run. Since your bot has an objection from Oleg Alexandrov, please run your bot slowly (edits at least 1 minute apart) for about a week. When the week is up, ask Oleg Alexandrov if he still objects to the use of your manually controlled bot. Stop immediately if anyone complains that your bot is not working properly or doing something not intended. In any case, I'd like to give you a trial week, and even if you screw up, it would still be easy to revert your edits. -- AllyUnion (talk) 06:49, 10 Jun 2005 (UTC)
- OK, OK, I don't want to be blocking people willing to do good work. So please feel free to create your bot and use it as you see fit. However, it is sad that the bot will unemploy User:DMG413 who is increasing his/her edit count by removing blue links from said list. :) Oleg Alexandrov 14:46, 10 Jun 2005 (UTC)
- Thanks for the clarification. Dragons flight 03:09, Jun 11, 2005 (UTC)