Wikipedia:Bots/Requests for approval/UnitBot (reopen)
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Request Expired.
Operator: Hyperdeath
Time filed: 19:17, Monday March 19, 2012 (UTC)
Automatic, Supervised, or Manual: Supervised for the present; automated in the longer term.
Programming language(s): PHP
Source code available:
Function overview: Automatically fix articles where unit conversions are made to an unjustified degree of precision. For example, if an editor converts a length of 1000 ft to 304.8 m, this implies that the value is highly precise and must lie between 304.75 and 304.85 m, which in most cases is plain false.
Edit period(s): One time run. (Future upgrade to continuous run planned, but not being requested as yet.)
Estimated number of pages affected: ~10,000
Exclusion compliant: Yes
Function details: The bot will systematically trawl through the Wiki, searching for unit conversions made to a ridiculous degree of precision (e.g. 1000 ft = 304.8 m). On finding such overzealous conversions, it will replace the conversion with one based on Template:Convert, which employs a sensible degree of rounding. If Template:Convert is unable to handle the conversion, the bot will fix it by modifying the text directly. The context of each conversion will be surveyed in order to minimize the potential for false corrections.
The bot will initially be supervised, and will list all the changes to be made to a particular page, before asking for permission to make those changes.
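The precision argument in the function overview can be made concrete. The PHP sketch below (PHP being the bot's stated language) computes the interval implied by a number as written: plus or minus half a unit in the last written place. The function name `implied_interval` is hypothetical, and treating trailing zeros in an integer such as 1000 as not significant is an assumption; the bot's actual rules are not described here.

```php
<?php
// Hypothetical helper: the interval implied by a number as written, taking the
// last written digit as significant (+/- half a unit in that place).
// Assumptions: a plain positive decimal string, and trailing zeros in an
// integer ("1000") are NOT significant.
function implied_interval(string $s): array {
    $dot = strpos($s, '.');
    if ($dot !== false) {
        $decimals = strlen($s) - $dot - 1;          // digits after the point
        $half = 0.5 * pow(10, -$decimals);
    } else {
        $trailingZeros = strlen($s) - strlen(rtrim($s, '0'));
        $half = 0.5 * pow(10, $trailingZeros);
    }
    $v = (float)$s;
    return [$v - $half, $v + $half];
}

[$lo, $hi] = implied_interval('304.8');
printf("304.8 implies [%.2f, %.2f]\n", $lo, $hi);   // [304.75, 304.85]
[$lo, $hi] = implied_interval('1000');
printf("1000 implies [%.0f, %.0f]\n", $lo, $hi);    // [500, 1500]
```

Under that assumption, 1000 ft covers 500 to 1500 ft, i.e. 152.4 to 457.2 m, whereas "304.8 m" claims an interval only 0.1 m wide.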
Discussion
- UnitBot was previously approved for testing, but this approval expired due to delays. UnitBot is now ready for supervised testing, and I ask for this approval to be reissued. — Hyperdeath(Talk) 19:17, 19 March 2012 (UTC)
Technical {{BotTrial}}
to the separate list (not on pages themselves); you can do more if they are easily reviewable (i.e. BAG doesn't need to click 100 diffs and compare a lot of stuff). I do think this needs more input, especially on what constitutes a "ridiculous degree of precision". One more significant digit is hardly ridiculous, if slightly overzealous, and is most likely just copy-pasted from a converter. I would, for example, be fine with automatically fixing conversions that give 3+ significant digits beyond what the original number offers. You previously mentioned "Quit if the quantity appears to be a precisely defined standard." and "Quit if the article is about overprecise units."; how would an automatic bot detect these? Also, "Amend the score based on context." and the accompanying heuristics may need very careful design, as bots have previously been expected to be nearly free of false positives. You should also look at the LightBot BRFAs and the corner cases and syntaxes it handled in this matter (i.e. adding {{convert}}), which is a generally picky subject right now. — HELLKNOWZ ▎TALK 21:51, 21 March 2012 (UTC)
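As a sketch of the threshold suggested above (flag a conversion only when it has 3 or more significant figures beyond the original), the helpers below count significant figures in plain decimal strings and compare the two sides. Both function names are hypothetical, the margin is a parameter rather than a settled value, and the significant-figure rules, especially for integer trailing zeros, are assumptions.

```php
<?php
// Hypothetical helper: count significant figures in a plain decimal string.
// Assumption: trailing zeros in an integer ("1000") are not significant,
// while trailing zeros after a decimal point ("12.50") are.
function sig_figs(string $s): int {
    $s = ltrim($s, '+-');
    if (strpos($s, '.') !== false) {
        // Drop the point, then leading zeros ("0.0500" -> "500").
        $digits = ltrim(str_replace('.', '', $s), '0');
    } else {
        // Drop leading and trailing zeros ("1000" -> "1").
        $digits = rtrim(ltrim($s, '0'), '0');
    }
    return strlen($digits);
}

// Flag a conversion whose result carries at least $margin more significant
// figures than the value it was converted from.
function is_overprecise(string $original, string $converted, int $margin = 3): bool {
    return sig_figs($converted) - sig_figs($original) >= $margin;
}

var_dump(is_overprecise('1000', '304.8'));  // bool(true): 1 figure vs 4
var_dump(is_overprecise('12.5', '3.81'));   // bool(false): 3 figures vs 3
```

A smaller `$margin` gives a stricter check that flags more conversions.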
- Thanks, although I have a couple of questions:
- * What do you mean by the "separate list"? Would it be permissible to directly edit articles, if a manual operator peruses all the proposed edits first?
- * What do you mean by the convert template being a "picky subject right now"? Have people been objecting to its use? Would it be better to just change the article text?
- The heuristics would be mostly based on categories and surrounding words. Categories such as "Units of x", or words such as "defined", "exactly" and "standard" would tell the bot to back off. I agree that a lot of testing will be required before it is allowed to work unsupervised. As for 3+ additional significant figures, that seems reasonable. Perhaps it could drop down to 2+ significant figures, if the unit is described as approximate.
- — Hyperdeath(Talk) 12:31, 22 March 2012 (UTC)
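The category and keyword back-off described in the reply above might look roughly like this. The trigger words and the "Units of" category prefix come from the discussion; the function name, the 100-byte window on either side of the number, and the case-insensitive whole-word match are assumptions for illustration.

```php
<?php
// Hypothetical back-off check following the heuristics described above: skip a
// number if the page is in a "Units of ..." category, or if a trigger word
// such as "defined", "exactly" or "standard" appears near it.
const BACKOFF_WORDS = ['defined', 'exactly', 'standard'];

function should_back_off(string $text, int $offset, array $categories, int $window = 100): bool {
    foreach ($categories as $cat) {
        if (stripos($cat, 'Units of') === 0) {
            return true;
        }
    }
    // Examine up to $window bytes on either side of the number's position.
    $start = max(0, $offset - $window);
    $end = min(strlen($text), $offset + $window);
    $context = substr($text, $start, $end - $start);
    foreach (BACKOFF_WORDS as $word) {
        // \b keeps e.g. "undefined" from matching "defined".
        if (preg_match('/\b' . preg_quote($word, '/') . '\b/i', $context)) {
            return true;
        }
    }
    return false;
}

$text = 'The international foot is defined as exactly 0.3048 m.';
var_dump(should_back_off($text, strpos($text, '0.3048'), []));          // bool(true)
$text = 'The tower is 1000 ft (304.8 m) tall.';
var_dump(should_back_off($text, strpos($text, '304.8'), ['Towers']));  // bool(false)
```

Tuning the word list and the window size would be part of the false-positive testing discussed above.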
- I just wanted more input before a full trial, but didn't want to hold you back from running tests that others could review. — HELLKNOWZ ▎TALK 13:23, 22 March 2012 (UTC)
- Where should the edits go? — Hyperdeath(Talk) 15:21, 22 March 2012 (UTC)
- Sorry, what I meant is that I'm fine if you make a report page in the bot's userspace or somewhere for now, so we can see what the edits might look like. But, as I said, given the level of detail in Lightbot's BRFAs and the recent convert-related debate, I (personally) would want to see more input from BAG, and definitely broader advertisement on the relevant noticeboards, before a full trial. Additionally, you should expand on the function details: what syntax constitutes what, what units it recognises, what the thresholds are, how corner cases are recognised, etc. You don't have any relevant discussions linked. — HELLKNOWZ ▎TALK 15:51, 22 March 2012 (UTC)
- I can't find this convert-related debate. Could I have a link, please? Also, I've expanded slightly on the function details on the UnitBot user page. — Hyperdeath(Talk) 13:44, 23 March 2012 (UTC)
- It's not a single one; it's a long list of discussions scattered across different issues and editors. I'm just saying that it would be wise to exercise extra caution with this task, or maybe I'm just being too paranoid. — HELLKNOWZ ▎TALK 14:18, 23 March 2012 (UTC)
Approved for trial (30 edits) on articles. Please provide a link to the relevant contributions and/or diffs when the trial is complete. — HELLKNOWZ ▎TALK 14:18, 23 March 2012 (UTC)
- That seems to contradict your previous statements. Does that mean UnitBot is allowed to edit articles directly, so long as each edit is first approved by a human operator? — Hyperdeath(Talk) 14:42, 23 March 2012 (UTC)
- Yes, consider this a normal/regular trial. — HELLKNOWZ ▎TALK 14:42, 23 March 2012 (UTC)
- I note that you have recently commenced the trial. Josh Parris 14:48, 30 March 2012 (UTC)
- Yes. At the moment I am making sporadic (but always beneficial to the Wiki) test edits to hone the algorithm. — Hyperdeath(Talk) 16:23, 1 April 2012 (UTC)
- You've been inactive for several days, but I note that you're still working on the bot task. Is development still progressing? Josh Parris 05:41, 7 April 2012 (UTC)
- Yes. I'm doing some of the boring stuff, like working on the various anti-false-positive filters. I'm also working on a way to trawl through the wiki more efficiently. One strategy I'm considering is doing systematic searches on "suspicious" numbers, like 304.8 (which can result from converting 1000 feet into metres). — Hyperdeath(Talk) 10:06, 7 April 2012 (UTC)
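One way to produce a list of "suspicious" numbers is to convert round values at full precision and keep the results that gain a fractional part. The sketch below does that for three conversions whose factors are exact by definition (1 ft = 0.3048 m, 1 mi = 1.609344 km, 1 lb = 0.45359237 kg); the choice of units, the round values, and the function names are assumptions.

```php
<?php
// Sketch: generate "suspicious" numbers by converting round values at full
// precision (e.g. 1000 ft -> 304.8 m). The conversion factors are exact by
// definition; the set of round values is an illustrative assumption.
const EXACT_FACTORS = [
    'ft->m'  => 0.3048,
    'mi->km' => 1.609344,
    'lb->kg' => 0.45359237,
];

// Print with up to six decimals, then drop trailing zeros and a bare point.
function format_full(float $x): string {
    return rtrim(rtrim(sprintf('%.6f', $x), '0'), '.');
}

// Map each suspicious number string to the conversion that produced it,
// keeping only results with a fractional part.
function suspicious_numbers(array $roundValues): array {
    $out = [];
    foreach (EXACT_FACTORS as $pair => $factor) {
        foreach ($roundValues as $v) {
            $s = format_full($v * $factor);
            if (strpos($s, '.') !== false) {
                $out[$s] = "$v $pair";
            }
        }
    }
    return $out;
}

foreach (suspicious_numbers([100, 500, 1000]) as $num => $source) {
    echo "$num <- $source\n";
}
```

Each key is a candidate search string (or dump-scan needle); a hit only marks a page for closer checks, not an automatic edit.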
- Your bot has made an edit recently, which shows that effort continues to be made. Josh Parris 23:39, 16 April 2012 (UTC)
- Yes. I've also made many virtual edits. I'm currently working on the heuristics for choosing the optimal rounded number. Using the implied error bars on the original values can sometimes produce values that look a little odd in context, even if they are mathematically justified. — Hyperdeath(Talk) 10:55, 17 April 2012 (UTC)
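A minimal sketch of choosing a rounded value from implied error bars, assuming one simple rule (not necessarily the one UnitBot uses): convert the interval implied by the original number and return the fewest significant figures whose rounded value still falls inside it. The function names and the trailing-zero assumption are illustrative.

```php
<?php
// Sketch of one possible rounding rule (an assumption, not UnitBot's actual
// algorithm): convert the interval implied by the original value, then keep
// the fewest significant figures whose rounded result stays inside it.
// Assumes a plain positive decimal string; integer trailing zeros are treated
// as not significant.
function implied_interval(string $s): array {
    $dot = strpos($s, '.');
    $half = ($dot !== false)
        ? 0.5 * pow(10, -(strlen($s) - $dot - 1))               // half a unit in the last decimal place
        : 0.5 * pow(10, strlen($s) - strlen(rtrim($s, '0')));   // half a unit above the trailing zeros
    return [(float)$s - $half, (float)$s + $half];
}

function sensible_conversion(string $original, float $factor): string {
    [$lo, $hi] = implied_interval($original);
    $centre = (float)$original * $factor;
    if ($centre == 0.0) {
        return '0';
    }
    for ($n = 1; $n <= 15; $n++) {
        // Decimal places needed to show $n significant figures (negative = tens, hundreds, ...).
        $places = $n - 1 - (int)floor(log10(abs($centre)));
        $r = round($centre, $places);
        if ($r >= $lo * $factor && $r <= $hi * $factor) {
            return sprintf('%.' . max(0, $places) . 'f', $r);
        }
    }
    return (string)$centre;   // fallback: no coarser value fits
}

echo sensible_conversion('1000', 0.3048), "\n";     // 300
echo sensible_conversion('12.5', 0.3048), "\n";     // 3.8
echo sensible_conversion('1000.0', 0.3048), "\n";   // 304.8
```

The first result shows the effect described above: "1000 ft" read as one significant figure gives "300 m", which is mathematically defensible but may look too coarse in context, so a real rule would likely need extra heuristics on top.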
Request for increased search ability
I have decided that the optimum wiki-trawling method is (as mentioned above) to perform searches on "suspicious" numbers, like 304.8 (which can result from converting 1000 feet into metres), and then to check only those pages. Unfortunately, bot searches are limited by default to 50 results. Can this be raised to the standard user limit of 500? — Hyperdeath(Talk) 16:16, 29 April 2012 (UTC)
- When your bot is flagged, it'll be granted the "apihighlimits" right, which will allow it to get up to 500 results in most API queries. Hersfold (t/a/c) 22:43, 1 May 2012 (UTC)
- I don't see any change. Is there something I have to do? — Hyperdeath(Talk) 20:23, 9 May 2012 (UTC)
- Your bot will not be flagged until this request is approved. — madman 02:44, 11 May 2012 (UTC)
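For the search-based approach, each query is an ordinary MediaWiki API `list=search` request. The sketch below only builds the URLs, paging with `sroffset`; fetching them is left to whatever HTTP or bot framework is in use. The function name, the caller-chosen cap on results, and quoting the number as a phrase are assumptions; 50 per request mirrors the default limit mentioned above.

```php
<?php
// Sketch: build the MediaWiki API search URLs needed to page through results
// for one "suspicious" number. $maxResults is an assumed cap chosen by the
// caller; each request asks for $perRequest hits (50 being the default limit
// discussed above).
function search_urls(string $number, int $maxResults, int $perRequest = 50,
                     string $api = 'https://en.wikipedia.org/w/api.php'): array {
    $urls = [];
    for ($offset = 0; $offset < $maxResults; $offset += $perRequest) {
        $urls[] = $api . '?' . http_build_query([
            'action'      => 'query',
            'list'        => 'search',
            'srsearch'    => '"' . $number . '"',  // quoted so the number is searched as a phrase
            'srnamespace' => 0,                     // article namespace only
            'srlimit'     => $perRequest,
            'sroffset'    => $offset,
            'format'      => 'json',
        ]);
    }
    return $urls;
}

foreach (search_urls('304.8', 120) as $url) {
    echo $url, "\n";   // three URLs: sroffset=0, 50 and 100
}
```

Once the bot has the higher limit, passing 500 as `$perRequest` would need a tenth as many requests; following the continuation data returned by the API, rather than precomputing offsets, would also avoid asking for more results than exist.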
- Comment This task overall seems like a variety of WP:SPELLBOT to me, so I think all such edits should be human-approved. It also probably makes more sense to find the "suspicious" numbers from a database dump instead of using search. 66.127.55.46 (talk) 18:48, 17 May 2012 (UTC)
- The bot will make human-approved edits. (At least for now; spellbot-type bots are permitted if they can get false-positive rates down to zero.) I'll use the database dump idea, as you suggest. — Hyperdeath(Talk) 06:47, 18 May 2012 (UTC)
Nine edits left. Could you just finish the trial, please? The bot does not appear to have edited since April, and such sporadic bots are unlikely to be approved. Rcsprinter (shout) 15:45, 14 June 2012 (UTC)
- I've been busy with other things of late. However, I still intend to complete the bot, and will use the database-dump approach mentioned above.
- I interpreted the 30 edits as an upper bound, not an obligation. The bot has been shown to be capable of making useful edits without arousing opposition from human editors, and in that sense the trial is a success. The next stage is to search efficiently for "bad" articles, which I will do.
- — Hyperdeath(Talk) 16:11, 14 June 2012 (UTC)
- Well, a reasonably large corpus of edits is actually needed to evaluate the task in as many real-world conditions as possible before approving. I'm not saying I'm concerned about this particular task, but this is why we tend to wait until we have enough edits to examine. — madman 23:44, 14 June 2012 (UTC)
- OK. I've started modifying my code to accommodate a database-dump-based search, and then I'll finish off the trial. — Hyperdeath(Talk) 12:20, 15 June 2012 (UTC)
- Does anyone have any PHP code snippets for efficiently getting articles out of the XML database dump? — Hyperdeath(Talk) 11:28, 17 June 2012 (UTC)
- I don't have any PHP code snippets, no. meta:Database dump#Tools has some good resources; I've used mw:MWDumper#Filter actions a few times to filter XML dumps and then process them in PHP with SimpleXMLElement. Hope this helps. — madman 14:46, 18 June 2012 (UTC)
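As an alternative to the SimpleXMLElement route suggested above, PHP's built-in XMLReader can stream a dump element by element, so even a full dump never has to fit in memory. The sketch assumes a dump format with `page`, `title`, `ns` and `text` elements (as in MediaWiki XML exports); the function name and file handling are hypothetical.

```php
<?php
// Sketch: stream a MediaWiki XML dump with XMLReader and report article titles
// whose wikitext contains any of the given "suspicious" number strings.
// The path and needle list are supplied by the caller (hypothetical usage).
function scan_dump(string $path, array $needles): array {
    $hits = [];
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }
    $title = $ns = $text = '';
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            // Remember the fields we need from the current <page>.
            switch ($reader->localName) {
                case 'title': $title = $reader->readString(); break;
                case 'ns':    $ns = $reader->readString(); break;
                case 'text':  $text = $reader->readString(); break;
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT
                  && $reader->localName === 'page') {
            if ($ns === '0') {                       // main (article) namespace only
                foreach ($needles as $needle) {
                    if (strpos($text, $needle) !== false) {
                        $hits[$title][] = $needle;
                    }
                }
            }
            $title = $ns = $text = '';               // reset for the next page
        }
    }
    $reader->close();
    return $hits;
}
```

`strpos` is only a coarse pre-filter ("304.8" also matches "1304.85"), so each hit would still go through the bot's own checks before any edit is proposed.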
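- Here's a code snippet using the API:
<?php
// Assumes the botclasses.php framework, whose query() appends this string to
// the api.php URL, hence the leading '?' and format=php (serialized PHP output).
require_once 'botclasses.php';
$wiki = new wikipedia();
$x = $wiki->query('?action=query&list=allpages&apnamespace=0&aplimit=500&format=php');
$pages = array();
foreach ($x['query']['allpages'] as $q) {
    $pages[] = $q['title'];   // each entry is an array with a 'title' key
}
//This is optional, I just like my arrays tidy. ;-)
sort($pages);
?>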
- If you want the bot to run continuously, you could do this:
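<?php
// Process every article, 500 titles per request, following the API's
// continuation value instead of repeating the first query.
$query = '?action=query&list=allpages&apnamespace=0&aplimit=500&format=php';
$x = $wiki->query($query);
while (true) {
    foreach ($x['query']['allpages'] as $q) {
        $pg = $q['title'];
        //CODE: process $pg here
    }
    // Newer API versions return 'continue'; older ones 'query-continue'.
    if (isset($x['continue'])) {
        $cont = $x['continue'];
    } elseif (isset($x['query-continue']['allpages'])) {
        $cont = $x['query-continue']['allpages'];
    } else {
        break; // no more pages
    }
    $x = $wiki->query($query . '&' . http_build_query($cont));
}
?>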
- Cheers, --Ceradon talkcontribs 23:56, 19 June 2012 (UTC)
- Thank you people for the info. — Hyperdeath(Talk) 20:59, 21 June 2012 (UTC)
- Any updates? — madman 14:59, 14 July 2012 (UTC)
- I've been heavily distracted with a pile-up of other things over the past few weeks. I'm still working on it, though. — Hyperdeath(Talk) 12:58, 15 July 2012 (UTC)
- Request Expired. – This request may be re-opened when the requested task is ready for trial. Thanks, — madman 04:14, 6 August 2012 (UTC)