Wikipedia:Bots/Requests for approval/H3llBot 11
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: Hellknowz
Time filed: 11:03, Friday August 23, 2013 (UTC)
Automatic, Supervised, or Manual: Automatic
Programming language(s): C#, custom API
Source code available: No
Function overview: below
Links to relevant discussions (where appropriate): --
Edit period(s): Continuous
Estimated number of pages affected: <500 per Category:Pages with archiveurl citation errors then as they come up
Exclusion compliant (Yes/No): Y
Already has a bot flag (Yes/No): Y
Function details:
Appending H3llBot 4 (User:H3llBot/U2A):
In citations, when |archiveurl= or |url= is set to an archive service link, but the corresponding |url= is not set or |archiveurl= isn't used, set the missing fields and fill in the date if needed. H3llBot 4 already covers this for URLs and dates I can parse out of the citations. However, the majority of pages in Category:Pages with archiveurl citation errors use shorthand archive URLs, so I need to actually browse the archive pages and retrieve the URL and date.
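As an illustration only (the bot's source is not published, so this is not its actual code), the parseable case could be sketched as follows. It assumes long-form Wayback links of the shape web.archive.org/web/<14-digit timestamp>/<original URL>; the names parse_wayback and fix_citation, and the dict representation of citation parameters, are hypothetical. Shorthand links such as the WebCite ones return None, which is the case that needs browsing.

```python
import re
from datetime import datetime

# Long-form Wayback links embed a 14-digit timestamp and the original URL.
# Assumption: other timestamp lengths or modifiers are not handled here.
WAYBACK_RE = re.compile(
    r"^https?://web\.archive\.org/web/(\d{14})/(.+)$", re.IGNORECASE
)


def parse_wayback(link):
    """Return (original_url, 'YYYY-MM-DD'), or None if not long-form Wayback."""
    match = WAYBACK_RE.match(link.strip())
    if match is None:
        return None
    stamp, original = match.groups()
    try:
        date = datetime.strptime(stamp, "%Y%m%d%H%M%S").strftime("%Y-%m-%d")
    except ValueError:
        return None  # 14 digits that do not form a valid timestamp
    return original, date


def fix_citation(params):
    """Return a corrected copy of the citation parameters.

    Returns None when no archive link can be parsed from the markup alone
    (for example a shorthand link), i.e. when browsing would be needed.
    """
    fixed = dict(params)
    # The archive link sits in |url= and |archiveurl= is unused: move it.
    if not fixed.get("archiveurl") and parse_wayback(fixed.get("url", "")):
        fixed["archiveurl"] = fixed["url"]
        fixed["url"] = ""
    parsed = parse_wayback(fixed.get("archiveurl", ""))
    if parsed is None:
        return None
    original, date = parsed
    # Fill in only the fields that are missing.
    if not fixed.get("url"):
        fixed["url"] = original
    if not fixed.get("archivedate"):
        fixed["archivedate"] = date
    return fixed
```

For instance, a citation whose |url= holds a made-up link like http://web.archive.org/web/20120126101500/http://example.com/page.htm would come back with |url= set to the original address, |archiveurl= set to the Wayback link, and |archivedate= set to 2012-01-26.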
For example, Van Cleave has 2 errors. Citations have http://www.webcitation.org/64zXFfeH5 and http://www.webcitation.org/6B2tdaqFt links, which need browsing to get the actual values -- http://www.isuresults.com/bios/isufs00012936.htm at 2012-01-26 and http://www.isuresults.com/bios/isufs00012936.htm at 2012-09-29.
I feel this is different enough in technology (actually reliably browsing the websites, editors can't tell the url/date from markup, and I need to implement each site-specific check) that this warrants a BRFA.
I'll try and add all the major/accepted archive providers I come across, including Wayback (Internet Archive), Webcite, Archive.is, Google Cache, etc.
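A minimal sketch of how a link might be matched to one of these providers before any site-specific check runs. The hostname table and the function name archive_provider are assumptions for illustration, not the bot's actual list, and a real implementation would likely need more hosts and mirrors.

```python
from urllib.parse import urlsplit

# Hypothetical hostname table for the providers named above.
ARCHIVE_HOSTS = {
    "web.archive.org": "Wayback",
    "webcitation.org": "WebCite",
    "archive.is": "Archive.is",
    "webcache.googleusercontent.com": "Google Cache",
}


def archive_provider(link):
    """Return the provider name for an archive link, or None if unrecognized."""
    host = (urlsplit(link).hostname or "").lower()
    # Treat "www.example.org" and "example.org" as the same host.
    if host.startswith("www."):
        host = host[4:]
    return ARCHIVE_HOSTS.get(host)
```

Each recognized provider would then be dispatched to its own retrieval routine, since the original URL and archive date are exposed differently on each site.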
Here is a sandbox edit with common providers converted/filled in (webcitation is down atm, but that one can be seen in previous edits).
For the record, I have also upgraded the original task to handle a few other parameter misuse cases. A common one is setting |archiveurl= but not |url=. The logic is exactly the same, except that the archive URL itself is already in the correct location. You can see this in recent contribs.
Discussion
As an outsider, this looks good to me Hasteur (talk) 14:40, 20 September 2013 (UTC)
Approved for trial (30 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Anomie⚔ 00:04, 17 October 2013 (UTC)
{{OperatorAssistanceNeeded}} Has this trial taken place? Josh Parris 11:13, 5 November 2013 (UTC)
Trial complete. I made a batch of edits, they are last in contributions (along with earlier trial and previous incremental task upgrades before I decided this should be a full BRFA). Here's a good example of massive url misuse and lost original urls. Also found a blacklisted url. — HELLKNOWZ ▎TALK 11:32, 5 November 2013 (UTC)
- For posterity, a permalink to the edits Josh Parris 12:08, 5 November 2013 (UTC)
- The URL injected in this edit broke the wikitext; you'll need to do some additional escaping for certain characters that might be used in URLs but that MediaWiki doesn't recognize (/[][<>"\x00-\x20\x7F\p{Zs}]/ is what MediaWiki doesn't recognize). I also see in a few of the earlier edits (e.g. [1], [2], [3]) the URL was present but in a misnamed parameter.
- Anyway, since all that seems rare and easy to fix and I have confidence you will fix them, Approved. Anomie⚔ 20:23, 5 November 2013 (UTC)
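One way the escaping mentioned in the review could look, as a sketch under stated assumptions rather than the fix that was actually deployed: percent-encode every character matched by the class quoted above. Python's re module has no \p{Zs}, so the Unicode space-separator check goes through unicodedata instead; the function names are hypothetical.

```python
import unicodedata

# ASCII characters from the quoted class that always need escaping.
_ALWAYS_ESCAPE = set('[]<>"')


def needs_escape(ch):
    """True if ch falls in the class [][<>"\\x00-\\x20\\x7F\\p{Zs}]."""
    code = ord(ch)
    return (
        ch in _ALWAYS_ESCAPE
        or code <= 0x20                      # control characters and space
        or code == 0x7F                      # DEL
        or unicodedata.category(ch) == "Zs"  # Unicode space separators
    )


def escape_for_wikitext(url):
    """Percent-encode the characters that would end a bare URL in wikitext."""
    out = []
    for ch in url:
        if needs_escape(ch):
            # Encode as UTF-8 so multi-byte separators become %XX%XX...
            out.append("".join("%{:02X}".format(b) for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)
```

With this sketch, a retrieved address such as http://example.com/a[1] b would be written into the citation as http://example.com/a%5B1%5D%20b, leaving all other characters untouched.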
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.