Wikipedia:Bots/Requests for approval/Archivedotisbot
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Request Expired.
Operator: Kww (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 17:29, Saturday May 10, 2014 (UTC)
Automatic, Supervised, or Manual: Automatic
Programming language(s): PHP (based on Chartbot's existing framework)
Source code available:
Function overview:
Remove all archival links to archive.is (and its alias, archive.today).
Links to relevant discussions (where appropriate): WP:Archive.is RFC, WP:Archive.is RFC 2, WP:Archive.is RFC 3, MediaWiki talk:Spam-blacklist/archives/December 2013#archive.is, Wikipedia:Administrators' noticeboard/Archive261#Archive.is headache
Edit period(s): One time run, with cleanups for any entries that got missed.
Estimated number of pages affected:
Exclusion compliant (Yes/No):
Already has a bot flag (Yes/No):
Function details:
Remove "archiveurl=" and "archivedate=" parameters whenever the archiveurl points at archive.is or archive.today.
- Amended description in response to comments below.
- The bot cannot implement the RFC result and keep links to archive.is. However, to help prevent deadlinking issues, the bot will take two steps:
- When removing a link from an article, the bot will add a talk page notice of the form "Archive item nnnn from archive.today, used to support <url>, has been removed from this article".
- A centralised list of all removals will be maintained at User:Archivedotisbot/Removal list.
—Kww(talk) 16:52, 16 May 2014 (UTC)[reply]
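For readers unfamiliar with the citation templates involved, the removal described above amounts to stripping two parameters from each affected template. The following is an illustrative sketch only — the actual bot is PHP, built on Chartbot's (non-public) framework, and this Python function, its name, and its regexes are hypothetical, covering only the simple single-template case:

```python
import re

def strip_archive_is_params(wikitext: str) -> str:
    """Hypothetical sketch: drop |archiveurl= and |archivedate= from a
    citation template when the archive URL points at archive.is or
    archive.today. Other archive hosts are left untouched."""
    # Act only when the archiveurl parameter targets archive.is/archive.today.
    if not re.search(
            r'\|\s*archiveurl\s*=\s*https?://(?:www\.)?archive\.(?:is|today)/',
            wikitext, re.IGNORECASE):
        return wikitext
    # Remove each parameter up to the next pipe or closing braces.
    wikitext = re.sub(r'\|\s*archiveurl\s*=\s*[^|}]*', '', wikitext,
                      flags=re.IGNORECASE)
    wikitext = re.sub(r'\|\s*archivedate\s*=\s*[^|}]*', '', wikitext,
                      flags=re.IGNORECASE)
    return wikitext

cite = ('{{cite web |url=http://example.com/page |title=Example '
        '|archiveurl=https://archive.today/abcde |archivedate=2013-10-01}}')
print(strip_archive_is_params(cite))
```

A real run would also need to parse templates individually rather than regex over whole-page wikitext, honor any exclusion template, and log each removal for the centralised list described above.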
Discussion
[edit]- Comment There is no direct connection between the existence of the links and the blacklisting of the archive.is site. Most of the archive links were put there in good faith. As archive.is performs a unique function, the proposer will need to demonstrate the links themselves are actually in violation of policy, and that any given archive is replaceable – meaning the bot ought to be capable of replacing the links with one on another archive site, particularly where the original referring url has gone dead. Non-replacement will lead to diminution of verifiability of citations used. -- Ohc ¡digame! 01:20, 12 May 2014 (UTC)[reply]
- Leaving the links in place wouldn't correspond to the RFC consensus, and having the links in place while the site is blacklisted makes for a painful editing experience.—Kww(talk) 01:31, 12 May 2014 (UTC)[reply]
- Blacklisting does not distinguish good-faith edits. Welcome to the alternate universe of the MediaWiki talk:Spam-whitelist. Will the bot honor the whitelist? If so, we should get some links whitelisted before trial so that functionality may be tested. See MediaWiki talk:Spam-whitelist/Archives/2014/03#archive.is/T5OAy. This should be done before the bot runs, to avoid any discontinuity of referencing, as the whitelist approval process can take months to come to consensus. – Wbm1058 (talk) 01:58, 12 May 2014 (UTC)[reply]
- Does anybody keep track of all the archive links they place? I can guess but I can never be sure. If a bot is approved, removals of potentially valid and irreplaceable (in some cases) links will be the default scenario unless all editors who consciously used the site come forward with their full list. I fear that even if I whitelisted all the articles I made substantial contributions to, that list would be incomplete. Then, some links I placed will inevitably get picked off by the bot. -- Ohc ¡digame! 04:30, 12 May 2014 (UTC)[reply]
- I have to reject the timing and implication of this request on a couple of key grounds. Archive.today was not made to bypass the filter. There is no evidence that Archive.is operated the Wiki Archive script/bot. The actual situation was resolved by blocks, not the filter - the filter was by-passable for a long time. Kww made a non-neutral RFC that hinged on perceived use as ads, malware and other forms of attack - without any evidence that any of these "bad things" had occurred or would be likely to occur. Frankly, the RFC was not even closed by an admin, and it was that person, @Hobit:, that bought into the malware spiel and found Archive.is "guilty" without any evidence presented. Also, this is six months later; if that's not enough reason to give pause, I'll file for a community RFC or ArbCase on removing the Archive.is filter all the quicker. Back in October 2013, I'd have deferred to the opinion then, but not when thousands of Gamespot refs cannot be used because of Archive.org and Webcite's limitations and Kww seems deaf to the verifiability issues. Those who build and maintain content pages need Archive.is to reduce linkrot from the most unstable resources like GameSpot. ChrisGualtieri (talk) 04:49, 12 May 2014 (UTC)[reply]
- I will simply point out that your arguments were raised and rejected at a scrupulously neutral RFC that was widely advertised for months.—Kww(talk) 05:00, 12 May 2014 (UTC)[reply]
- False, I wasn't even a part of the RFC. Also, the malware and illegal aspect were repeatedly pushed without evidence. ChrisGualtieri (talk) 16:29, 12 May 2014 (UTC)[reply]
- I didn't say that you had participated: I said that your arguments had been presented. The framing of the RFC statement was scrupulously neutral. Arguments were not neutral, but such is the nature of arguments.—Kww(talk) 16:50, 12 May 2014 (UTC)[reply]
- Can you prove, with firm evidence, that archive.today was created to "bypass the blacklist"? That domain has existed for months, and during this time, an attacker could have spilled a mess all over Wikipedia, but this has not occurred. Currently, archive.is does not exist (just try typing in the URL), it redirects to archive.today which is the current location of the site. A website may change domains due to any number of legitimate reasons, ranging from problems with the domain name provider, to breaking ccTLD rules. --benlisquareT•C•E 06:04, 12 May 2014 (UTC)[reply]
- Struck the language expressing cause and effect, and simply note that archive.is and archive.today are the same site.—Kww(talk) 06:13, 12 May 2014 (UTC)[reply]
- I did close the RfC and am not an admin. I closed the discussion based upon the contributions to that RfC. There was no "Guilty" reading. Rather it was the sense of the participants that archive.is links should be removed because there was a concern that unethical means (unapproved bot, what looked like a bot network, etc.) were used to add those links. I think my close made it really really clear that I was hopeful we could find a way forward that let us use those links. If you (@ChrisGualtieri:) or anyone else would like to start a new RfC to see if consensus has changed, I'd certainly not object. But I do think I properly read the consensus of the RfC and that consensus wasn't irrational. On topic, I think the bot request should be approved--though if someone were to start a new RfC, I'd put that approval on hold until the RfC finished. Hobit (talk) 18:05, 12 May 2014 (UTC)[reply]
- Comment. An unapproved(?) bot is already doing archive.is removal/replace: [1] 77.227.74.183 (talk) 06:18, 13 May 2014 (UTC)[reply]
- I'm not a bot, so that is completely uncalled for. Werieth (talk) 10:16, 13 May 2014 (UTC)[reply]
- When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck.
- I see you work 24/7 and insert a high volume of unreviewed links like a bot ([2], [3] in Barcelona).
- I call you a bot. 90.163.54.9 (talk) 13:03, 13 May 2014 (UTC)[reply]
- I don't read Chinese, and it looks like a valid archive. Not sure what the issue is. Comparing http://www.szfao.gov.cn/ygwl/yxyc/ycgy/201101/t20110120_1631663.htm and its archived version http://www.webcitation.org/684VviYTN, the only difference I'm seeing is that it's missing a few images; otherwise it's the same article. Werieth (talk) 13:11, 13 May 2014 (UTC)[reply]
- The first page has only a frame and is missing content; the second has only a server error message. No human would insert such links. I also noticed that you inserted many links to archived copies of YouTube video pages, which is nonsense.
- You should submit a bot approval request (like this one) and perform a test run before running your bot at mass scale.
- Only the fact that you remove archive.is links in the same edit prevents editors from undoing your edits. Otherwise most of your edits would be reverted. 90.163.54.9 (talk) 13:14, 13 May 2014 (UTC)[reply]
- Not sure what you're looking at, but http://www.webcitation.org/684VviYTN looks almost identical to http://www.szfao.gov.cn/ygwl/yxyc/ycgy/201101/t20110120_1631663.htm. The only two differences I see are that the archive is missing the top banner and the QR code at the bottom. As I said, I'm not a bot and thus don't need to file for approval. Werieth (talk) 13:21, 13 May 2014 (UTC)[reply]
- Forget 684VviYTN, it was my copy-paste error, which I promptly fixed. There are 2 other examples above. 90.163.54.9 (talk) 13:24, 13 May 2014 (UTC)[reply]
- Taking a look at http://www.apb.es/wps/portal/!ut/p/c1/04_SB8K8xLLM9MSSzPy8xBz9CP0os_hgz2DDIFNLYwMLfzcDAyNjQy9vLwNTV38LM_1wkA6zeH_nIEcnJ0NHAwNfUxegCh8XA2-nUCMDdzOIvAEO4Gig7-eRn5uqX5CdneboqKgIAAeNRE8!/dl2/d1/L2dJQSEvUUt3QS9ZQnB3LzZfU0lTMVI1OTMwOE9GMDAyMzFKS0owNUVPODY!/?WCM_GLOBAL_CONTEXT=/wps/wcm/connect/ExtranetAnglesLib/El%20Port%20de%20Barcelona/el+port/historia+del+port/cami+cap+el+futur/ vs http://web.archive.org/web/20131113091734/http://www.apb.es/wps/portal/!ut/p/c1/04_SB8K8xLLM9MSSzPy8xBz9CP0os_hgz2DDIFNLYwMLfzcDAyNjQy9vLwNTV38LM_1wkA6zeH_nIEcnJ0NHAwNfUxegCh8XA2-nUCMDdzOIvAEO4Gig7-eRn5uqX5CdneboqKgIAAeNRE8!/dl2/d1/L2dJQSEvUUt3QS9ZQnB3LzZfU0lTMVI1OTMwOE9GMDAyMzFKS0owNUVPODY!/?WCM_GLOBAL_CONTEXT=/wps/wcm/connect/ExtranetAnglesLib/El%20Port%20de%20Barcelona/el+port/historia+del+port/cami+cap+el+futur/, it looks like a snapshot of how the webpage looked when it was archived, and the page is dynamic. One part of the page appears to be dynamically generated via JavaScript and is partially broken in the archive, but most of the page content persists, which is better than having none of the content if the source goes dead. Instead of complaining about my link recovery work, why don't you do something productive? Werieth (talk) 13:36, 13 May 2014 (UTC)[reply]
- Productive would be to undo your changes and discuss the algorithms of your bot in public, but that is impossible because you intentionally choose pages with at least one archive.is link and thus abuse the archive.is filter, making your unapproved bot changes irreversible. Also, you label those changes as "replace/remove archive.is" although 90% of the changes you made are irrelevant to archive.is. 90.163.54.9 (talk) 15:11, 13 May 2014 (UTC)[reply]
- Oppose The RFC was non-neutral, biased, and not widely advertised despite Kww's claims; this is obvious from the number of editors who say they had no knowledge of the discussion while clearly being opposed to its outcome. DWB / Are you a bad enough dude to GA Review The Joker? 08:02, 13 May 2014 (UTC)[reply]
- DWB:It's hard to give much weight to an argument based on a falsehood. It was placed in the centralized discussion template on Sept 13 and not removed until Oct 31. That you personally missed a discussion doesn't invalidate a discussion. The framing of the question was scrupulously neutral.—Kww(talk) 14:44, 13 May 2014 (UTC)[reply]
- And yet so many users say they were not aware of it, of course we are all lying. The RFC was based on the premise "Archive.is does what it is meant to but I am going to accuse it of things I cannot prove, and also one user is adding a lot of their links so we should block it". It was not advertised nor neutral. DWB / Are you a bad enough dude to GA Review The Joker? 19:56, 13 May 2014 (UTC)[reply]
- It was advertised in the standard places for 45 days. The RFC question presented three alternatives, the first of which was to leave existing links in place, the second of which was to restore all the links to archive.is that had already been removed, and the third (which gained consensus) was to remove them all. That's about as neutral as you can get, and more widely advertised than normal. That your side did not prevail doesn't mean a discussion is flawed, it simply means that it reached a conclusion that you disagree with.—Kww(talk) 20:04, 13 May 2014 (UTC)[reply]
- Question I assume the bot, if approved, will replace, not remove the archives? Will it properly include an edit summary? Thank-you. Prhartcom (talk) 14:12, 13 May 2014 (UTC)[reply]
- Good point. I'm not sure there _are_ replacements, though following the AN thread, it looks like a new archive tool is potentially available. So it certainly should be replacing them where possible. In addition, an edit summary which explains what's going on (ideally with a link to a more detailed explanation) should be required (and trivial I'd think). Hobit (talk) 14:42, 13 May 2014 (UTC)[reply]
- User:Kww, what is your answer to this? Prhartcom (talk) 17:28, 13 May 2014 (UTC)[reply]
- Easy enough to build a centrally accessible list of what was removed and where it pointed. Finding good replacements is not readily automated.—Kww(talk) 17:35, 13 May 2014 (UTC)[reply]
- Agreed that it would be a difficult job and that the bot may have to be semi-automated run by an operator making human decisions as it runs in order to get the archives accurately replaced. Obviously you don't want to simply delete archives and have people mad at you, you want to replace them, achieving the archive.is purge goals as well as link rot prevention goals. I wish you the best of luck with it. Prhartcom (talk) 17:45, 13 May 2014 (UTC)[reply]
- Oppose Kww, From your comments in this discussion it doesn't sound like you are interested in the goal of preventing link rot, but only your goal of purging archive.is, so I cannot let you proceed with damaging Wikipedia. Replace not remove. Prhartcom (talk) 21:26, 14 May 2014 (UTC)[reply]
*Oppose Per DWB. Duke Olav Otterson of Bornholm (talk) 15:13, 13 May 2014 (UTC)Blocked as sock.—Kww(talk) 15:45, 13 May 2014 (UTC)[reply]
- Sock of who? Has that user posted here? If not the socking is not abusive and the !vote stands. All the best: Rich Farmbrough, 13:13, 14 May 2014 (UTC).
- It's an undisclosed alternative account, and is not permitted to participate in community discussions. The block has been upheld by another admin.—Kww(talk) 18:47, 14 May 2014 (UTC)[reply]
- Statement: I will make a general comment to whoever closes this thing: this should not be a forum for people that did not prevail at an RFC to attempt to undermine the result. That isn't what a BRFA is about. The RFC had a conclusion, and I am requesting approval to run a bot to implement that conclusion.—Kww(talk) 15:39, 13 May 2014 (UTC)[reply]
- Oppose To quote Hawkeye "Do not use a bot to remove links. Per Wikipedia:Archive.is RFC: the removal of Archive.is links be done with care and clear explanation. " Moreover the blocking of The Duke by Kww makes me doubt that Kww is in a good place to run a bot on such a contentious issue. Further the recent discussion was hardly consensual for removing the links, the more time that goes past, the less likely is it that archive.is is abusive as claimed. All the best: Rich Farmbrough, 13:13, 14 May 2014 (UTC).
- Oppose I also endorse Rich Farmbrough's view that the RFC was quite clear. The solution is to manually re-find the URLs (or to hunt down replacements at other mirrors) for the ones that have been corrupted by the archiving service (much the same way that I did List of doping cases in sport and its newly minted subchildren). Removing the parameters outright violates the consensus established at the RFC, and individual editors editing 24/7 to replace these suggests a form of automation and not manually fishing the appropriate archives. Hasteur (talk) 17:18, 14 May 2014 (UTC)[reply]
- Rich, Hasteur: the language in the RFC closing is quite clear: "There is a clear consensus for a complete removal of all Archive.is links." Hawkeye's opinion distorts the RFC closing statement, and does not reflect the actual content of the RFC. The care called for is explicit: "To those removing Archive.is from articles, please be sure to make very clear A) why the community made this decision and B) what alternatives are available to them to deal with rotlink." Not replacement. Not exhaustive searching for alternatives. Again, the purpose of a BRFA is not to provide people that disagree with an RFC an alternate venue to restate their opposition.—Kww(talk) 18:47, 14 May 2014 (UTC)[reply]
- Kww you might want to check the RFC again and check your prejudice at the door. I did support removal, but controlled removal to where we don't instantly deadlink the reference by bulk removing archive.is. I'm not attempting to overturn the previous consensus, I am only saying that botting this is not endorsed. Hasteur (talk) 19:04, 14 May 2014 (UTC)[reply]
- It doesn't matter what opinion either of us expressed in the RFC, Hasteur. That's not what the closing statement says. It says that the consensus is to remove them all, and there was no consensus for the level of research that you are demanding prior to removal.—Kww(talk) 21:03, 14 May 2014 (UTC)[reply]
- I haven't been following the entire discussion about this issue, is there a "tl;dr" somewhere? Is this bot task planning to remove all archive.is links, with the goal that enwp will stop linking to that site as a whole? Or is this just a "cleanup" run to remove all the spammed links. It would be nice if rather than removing, we could convert them to IA links, but that will probably just be a dream ;) Legoktm (talk) 08:20, 15 May 2014 (UTC)[reply]
- @Legoktm: It's in the function details (Remove "archiveurl=" and "archivedate=" parameters whenever the archiveurl points at archive.is or archive.today.). It means that if the base url is gone, we instantly deadlink the reference. I observe that this has transcended basic disputes and upholding the consensus, and gone to the level of "cutting off the nose to spite the face" tactics to obliterate links to the offending website. Hasteur (talk) 12:51, 15 May 2014 (UTC)[reply]
- TLDR version for Legoktm: the sole intent of the bot is to remove every reference to archive.is from English Wikipedia. That was the consensus at WP:Archive.is RFC, so that's what the proposed bot would do. Once it was proposed, editors that did not prevail at the RFC have taken this opportunity to oppose the bot, many of them presenting distorted versions of the RFC close to support their position. If you look above at Hobit's position, you will see that the closer of the RFC agrees that the bot implements the consensus of the RFC. I maintain that that is the sole criterion by which this BRFA should be judged, and all the conversation above is completely irrelevant to the discussion. The question being asked is "does the bot implement the RFC?" not "does the commenter agree that links to archive.is should be removed?"—Kww(talk) 15:08, 15 May 2014 (UTC)[reply]
- You are correct, though it is perfectly reasonable for those who opposed the removal by any means, to be against the removal by bot, even if they would have supported bot-removal for some other hypothetical links whose removal they supported. Otherwise a system of regression is in place which allows a tyranny of the minority, namely the minority that asks the questions. All the best: Rich Farmbrough, 20:06, 15 May 2014 (UTC).
- The point is that the people that oppose the bot based on "I don't think the RFC should have generated the result that it did" should have their !votes discarded by whomever closes this thing. There are venues to discuss such things, and BRFA isn't one of them.—Kww(talk) 20:37, 15 May 2014 (UTC)[reply]
- Kww Please don't strong-arm the process like this by trying to use *fD nomenclature like !vote. There are two people (myself and Rich Farmbrough) who oppose the bot for completely separate reasons besides "I don't think the RFC should have generated the result that it did". I am asking that the bot be rejected on the grounds that the deadlinking you propose is more disruptive than a controlled replacement of the links. Hasteur (talk) 20:51, 15 May 2014 (UTC)[reply]
- I'm not strongarming the process at all, Hasteur. The RFC result did not call for leaving links in place when replacements could not be found. It did not call for diligent searching for replacements prior to removal. It called for complete removal of links. The alternative you are attempting to hold the bot to did not gain consensus. I'm quite willing to entertain enhancements such as creating a centralized list of removed links or leaving talk page notices indicating what links have been removed, but I'm not willing to entertain leaving the links in place: that would run counter to the RFC result.—Kww(talk) 21:34, 15 May 2014 (UTC)[reply]
- Indeed, and that is a perfectly legitimate, if frustrating, way to !vote. If it were not, for example, we could get the following situation.
- Scenario 1: Kww asks for a BRFA to remove .is links. Vote: Yes 26%, No 74% (say 25% think the links are good, 24% think that all bots are evil and 25% think it should be done manually).
- Scenario 2: RFC - passes 75%, BRFA, passes 51%.
- Clearly this process would be anti-consensus. Equally clearly, by extending the process with sufficient stages, and suitably worded alternatives any conclusion could be reached.
- All the best: Rich Farmbrough, 11:02, 16 May 2014 (UTC).
Kww doesn't seem to understand the opposition has nothing to do with "not prevailing" - as if this were a trial and we were obliged by some "law" to abide by it. No part of the RFC was neutral or balanced - despite Kww's assertions otherwise. People in the first RFC were under the impression it was all done by a bot - it did not balance the contributions of other editors or even discuss that fact in its opening. The arguments and its closing were highly ambiguous, but it's been more than SIX months and much has transpired in that time. I read the RFC as calling for removal of the bot-added links - not the whole - and the close (supervote or not) did not establish a blacklist - but a blacklist was made and the bot-added links were not purged as was the expected result. Now we are calling for the complete removal of the entire website based on allegations, malware fears, and the acts of a single user, all while knowing there are no actual issues with the additions, the website or the content displayed itself. And just to top it off, as if it wasn't enough, all in the name of a flawed, non-admin-closed RFC that took more than six months and a much larger discussion to provoke this attempt at an expanded and derived reading, as if the last six months (and the blacklist not functioning) never happened. Though it seems consensus can change, and it has. ChrisGualtieri (talk) 17:30, 18 May 2014 (UTC)[reply]
- First, your reading of the RFC is irrelevant: it was closed, and the closure was never overturned (or even challenged, for that matter). Second, if you believe that the formulation of the RFC was non-neutral, can you at least indicate what part of the original framing was non-neutral? I certainly cast my opinion in one of the discussion sections, but the framing of the circumstances was scrupulously neutral.—Kww(talk) 19:26, 18 May 2014 (UTC)[reply]
- One correction: it was certainly challenged, I think both on my talk page and at either AN or ANI. In any case, ChrisGualtieri, it seems wise to open a new RfC if you feel the last one was defective (either in Kww's wording or my close) or because CCC. I'm not sure why you haven't done that if you feel there were so many problems. As the closer, I've made it pretty clear I'm comfortable with a new RfC. Heck, I'd even be happy to work with you on neutral wording or whatever else might be helpful. I think I closed it correctly and I don't think it was ambiguous--if some part was, please let me know and I'll clarify. But I think enough time has passed that a CCC argument is a perfectly good reason to start a new RfC on the topic--I'd not be surprised if you were correct and consensus has changed. Hobit (talk) 22:35, 18 May 2014 (UTC)[reply]
- We are moving in that direction. Let's talk on your page about an issue or two before a new RFC is made. ChrisGualtieri (talk) 04:05, 19 May 2014 (UTC)[reply]
- Certainly you could spare a moment to actually identify a specific item in the old RFC that would support your accusations of it being biased. Or is it easier to disrupt this discussion by simply making accusations without supporting them?—Kww(talk) 04:35, 19 May 2014 (UTC)[reply]
Kww has no intention of even lifting the edit filter or discussing its details publicly. The data in question shows one bad user and many good users who added Archive.is links, and Rotlink was not being operated by Archive.is. Allegations of illicit activity, botnets and false identity that require the complete nuking of a site on the basis of someone whose data doesn't even trace to Archive.is is a pretty poor excuse to punish the whole on the grounds of some boogieman. The RFC did not even recognize the good editors who added those links in the first place. It wasn't neutral and it did not even give fair representation to the two users who prominently declared that it would negatively impact their editing. The simple solution was ignored for the sake of preventing or removing the whole. Six months is far too late to suddenly spur the removal because someone disagrees with you. Kww made blind accusations and couldn't support them, but even the lengthy discussion into how those were unsupported did not deter the non-admin closer from a straw count of the !votes, despite the entire premise being unsupported by the conclusion. The entire thing hinged on unsupported allegations of illicit activity, malware and that Rotlink was Archive.is, despite evidence to the contrary. I see absolutely no value in a "consensus" rooted in false pretexts; numerous users have made key arguments and Kww has brushed them off without answering them. I cannot support this bot because it represents a Hail Mary some six months after the fact, rooted in direct opposition to the edit filter's very existence. ChrisGualtieri (talk) 22:57, 22 May 2014 (UTC)[reply]
- BRFA doesn't deal with edit filters. If you're concerned about that, take it to WT:EF or WP:AN. Legoktm (talk) 04:34, 31 May 2014 (UTC)[reply]
- Comment Too much power in the hands of one user? Kww already has abusefilter. And that role is sure to lead to hardened views, not patience and neutrality. Kww even had to edit the BRFA to remove unsubstantiated claims. --{{U|Elvey}} (t•c) 18:32, 9 September 2014 (UTC)[reply]
- Comment Certainly not appropriate at the moment given there's an open RFC about it. --{{U|Elvey}} (t•c) 18:32, 9 September 2014 (UTC)[reply]
- Why shouldn't this be dealt after the RFC? If you're concerned with power abuse, go to WP:AN.Forbidden User (talk) 10:46, 11 September 2014 (UTC)[reply]
- Confused. Are you agreeing with me, trying to give me instructions, or both? I've expressed concerns with power abuse, and expect the closer of this RFC to consider the concerns. No, I'm not going to go to AN just because you tell me to. Especially with a username like that.--{{U|Elvey}} (t•c) 07:12, 16 September 2014 (UTC)[reply]
- LOL, I originally wanted to call myself Blocked User to avoid the FU acronym. Another person told ChrisGualtieri (not that MP) to do so. If you cannot trust me, you may consider that. I think the process should be restarted in a new section, where the bot is assessed with the two RFC's consensus in mind.Forbidden User (talk) 16:56, 18 September 2014 (UTC)[reply]
Break
So, from my reading, most of the opponents of the bot don't agree with the RfC's closure, and believe that the RfC itself is invalid. If that's the case, arguing here is pointless. BRFA can't overturn the closure of an RfC. WP:AN or WP:VPP are the places to do that, but Hobit says that already happened (I didn't look for links), so IMO the closure is valid, and consequently that's not a reason to block approval of the bot. Legoktm (talk) 04:34, 31 May 2014 (UTC)[reply]
- Sorry I am late to the discussion, as the original RfC did not catch my attention at the time. The fact that advice to use this service was not removed from Wikipedia:Link rot until 19 March 2014 and not explicitly prohibited until 20 March 2014 didn't help with increasing awareness. I have some questions. Please feel free to add specific answers under each. Wbm1058 (talk) 15:37, 31 May 2014 (UTC)[reply]
- As for the March issue, anyone that has tried to add an archive.is link since October 2013 has had his edit blocked as a result. I would think that would be notice enough.—Kww(talk) 16:04, 31 May 2014 (UTC)[reply]
- This is proposed to be a one-time run. What are the plans for dealing with such links that may be added after the one-time run? Wbm1058 (talk) 15:37, 31 May 2014 (UTC)[reply]
- There's still a filter that prevents additions of archive.is and archive.today, and that filter will remain in place until the blacklist is implemented.—Kww(talk) 16:04, 31 May 2014 (UTC)[reply]
- I'm confused. Here is a diff showing an archive.today reference link addition, added 21 April 2014. It doesn't seem to have been blocked, and it's not for a video game review. Wbm1058 (talk) 17:47, 31 May 2014 (UTC)[reply]
- The filter hadn't caught up with the name change from archive.is to archive.today. Now it has.—Kww(talk) 21:14, 31 May 2014 (UTC)[reply]
- I see. Editors clicking on Show preview after inserting <ref>http://archive.today/xxx</ref> into the edit box see MediaWiki:Abusefilter-warning-archiveis—created 16 November 2013, in response to this 15 November 2013 edit request, which in turn was a response to this discussion, which happened a month after this request was not responded to—above their edit box. This warning, which I believe uses template:edit filter warning, is triggered by Special:AbuseFilter/559, the details of which are hidden from public view. I also note that Special:AbuseLog/xxx admits that "Entries in this list may be constructive or made in good faith and are not necessarily an indication of wrongdoing on behalf of the user" in spite of its name AbuseLog, which presumes wrongdoing. I also note that attempts to make such edits are logged behind the editor's back without informing them that this is happening, even though no edit was actually saved (was the attempted edit saved?). Be careful who you love ;-) This also raises the question: if an edit filter was installed by 25 October 2013, what was the point of requesting blacklisting in December, which seems redundant? Would I be correct in assuming that any administrator can view the filter, and that the filter is nothing more complicated than looking for <ref>http://archive.today/xxx</ref> or any alias(es) for that site? Will the bot use the same search criteria as the filter? Or use Special:LinkSearch/archive.is and Special:LinkSearch/archive.today? Wbm1058 (talk) 15:28, 1 June 2014 (UTC)[reply]
- The edit filter is not nefarious, but yes, it is hidden, and no, I will not discuss the details of exactly what it takes to ensure that archive.is and archive.today links are not inserted. Blacklisting is our standard technique, not filters: the filter was installed on an emergency basis because of the attacks. The proposed bot will do exactly what is advertised: look for 'archiveurl=" parameters and remove them. That should bring our count down low enough that blacklisting the site and removing the filter is feasible. If I find that someone has undertaken a conscious effort to defeat the bot by fiddling with the parameters, I will probably tweak the bot to get past such things.—Kww(talk) 15:39, 1 June 2014 (UTC)[reply]
- OK. So links like the one added in this edit, which I linked above, will not be removed by this bot, because they are not inside templates using the archiveurl=n parameter? That makes sense, if the alleged botnet restricted their additions to those using such citation templates. Wbm1058 (talk) 17:27, 2 June 2014 (UTC)[reply]
- The blacklist will force removal of such links, but the bot will not. Such links are a small enough percentage of the overall problem that they can be deleted manually without bot assistance.—Kww(talk) 18:05, 2 June 2014 (UTC)[reply]
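Kww never released the bot's source, so any concrete rendering of the removal step is guesswork. A minimal sketch of the behavior as described ("look for 'archiveurl=' parameters and remove them"), assuming standard citation template syntax; all names and regexes here are illustrative, not the actual Chartbot-derived code:

```python
import re

# Drop |archiveurl= (and a trailing |archivedate=) from citation templates
# when the archive URL points at archive.is or archive.today, and log each
# removed URL for the talk-page notice / central removal list.
ARCHIVE_HOST = re.compile(r'https?://(?:www\.)?archive\.(?:is|today)/', re.I)
ARCHIVE_PARAMS = re.compile(
    r'\|\s*archiveurl\s*=\s*(?P<url>[^|}]*?)\s*'   # the archiveurl value
    r'(?:\|\s*archivedate\s*=\s*[^|}]*?\s*)?'      # optional archivedate
    r'(?=[|}])',                                   # stop at next param or }}
    re.I,
)

def strip_archive_is(wikitext):
    """Return (cleaned wikitext, list of removed archive URLs)."""
    removed = []

    def repl(match):
        url = match.group('url').strip()
        if ARCHIVE_HOST.match(url):
            removed.append(url)
            return ''               # drop both parameters
        return match.group(0)       # other archive hosts are left untouched

    return ARCHIVE_PARAMS.sub(repl, wikitext), removed
```

The `removed` list would then feed the per-article talk page notice and the central removal list described above.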
- What is the status of mw:Extension:ArchiveLinks? See mw:Extension talk:ArchiveLinks, where it was proposed that "archive.is should be added". Wbm1058 (talk) 15:37, 31 May 2014 (UTC)[reply]
- Your link is to a question that preceded the problems with archive.is.—Kww(talk) 16:04, 31 May 2014 (UTC)[reply]
- See also m:WebCite. Ok, so I'm not familiar with the m:New project process and how it may differ from WP:RFC, but why hasn't this been closed yet, with a determination of whether or not there is a consensus for it? Are there some citation backup features provided by Archive.today (Wikipedia:Articles for deletion/Archive.is) that WebCite doesn't offer? Wbm1058 (talk) 17:27, 31 May 2014 (UTC)[reply]
- Is link rot really a problem? If, per Wikipedia:Link rot#Internet archives "archive.is not permitted on the English Wikipedia", from that does it follow that whitelisting specific archive.is links is not permitted either? Wbm1058 (talk) 15:37, 31 May 2014 (UTC)[reply]
- From my perspective, it's a real problem, but not as large as people make it. Replacement links tend to be available for important information. Note that some of the loudest objectors here are eager to use archive.is because it properly archives pages from one of the video game review sites.—Kww(talk) 16:04, 31 May 2014 (UTC)[reply]
- There are many Web archiving initiatives. Are all of these except archive.is legal? If so, can the bot crawl all of them in search of alternatives? Perhaps Mementos can be used to make this task easier? How can we be assured that none of these alternatives will have the same issues as archive.is at some time in the future? Wbm1058 (talk) 15:37, 31 May 2014 (UTC)[reply]
- Automatic archival replacement bots have been tried before, and have invariably failed. I've agreed above to keep a master list of all removed links. If people want to manually deal with the list or write scripts that assist people in finding replacements from the list, I view that as a separate task.—Kww(talk) 16:04, 31 May 2014 (UTC)[reply]
- Can I look at the source code for this bot? Wbm1058 (talk) 15:37, 31 May 2014 (UTC)[reply]
- This is just a simple variation on Chartbot, an approved bot that dealt with the last revamp of Billboard (a revamp that left us with tens of thousands of dead links). I've never released the source. I could, but I would like to know why you want to see it.—Kww(talk) 16:04, 31 May 2014 (UTC)[reply]
I have closed the discussion per WP:IAR. Whether the RfC was neutral is not an issue; this is a BRFA, which cannot alter the result. However, reading through the discussion, there is clearly not a consensus to run the bot, mainly due to concerns about legitimate archive links being removed and never replaced. I am aware I am not a member of the BAG; however, this has sat here for several days, and frankly, there is no point keeping a stale discussion open when there is a clear consensus evident. --Mdann52talk to me! 10:23, 18 June 2014 (UTC)[reply]
- Closure reverted by bot operator after discussion failed on my talk page. --Mdann52talk to me! 13:46, 18 June 2014 (UTC)[reply]
The problem with the close was both procedural (Mdann52 is not a BAG member) and substantive: the objections raised here are not valid objections to raise at a BRFA and carry no weight. While raw numbers certainly weigh against this bot, once one discards the arguments of "I wish the RFC hadn't said to do this", the bot would be approved.—Kww(talk) 13:49, 18 June 2014 (UTC)[reply]
The closure was inappropriate. At the very least, a BAG member should have done it. TBH, I haven't seen any objections since #Break. Legoktm (talk) 18:38, 18 June 2014 (UTC)[reply]
- Then let me be the first to object blatantly to this bot's currently stated goal and the heavy-handedness the potential bot operator has demonstrated throughout this process (in violation of BOTCOMM). While I do agree that the RfC is correct, directly orphaning links is not the way to "do no harm" to Wikipedia. Hasteur (talk) 20:00, 18 June 2014 (UTC)[reply]
- Then feel free to demonstrate how removing a link and placing a message on the article's talk page and in a central list that indicates what link was removed from where so that others can undertake the manual process of restoring it contravenes the RFC.—Kww(talk) 20:52, 18 June 2014 (UTC)[reply]
- Your continual pestering, refusal to discuss the underlying issue, refusal to discuss the edit filter, and continual pushing against editors' explicitly expressed viewpoints make clear that you're completely the wrong person to be implementing this bot. The RFC is quite clear: "I also suggested that the removal of Archive.is links be done with care and clear explanation to editors and suggested the folks at Archive.is work with the community to find a way forward". Your solution is dropping napalm on a forest from an airplane rather than weeding out each invasive plant individually. Hasteur (talk) 23:43, 18 June 2014 (UTC)[reply]
- I've been frustrated, certainly, but I think that's a pretty unfair accusation. I've addressed the underlying issues. I haven't disclosed details of the edit filter, but I've been upfront that it exists and what it accomplishes. What I've "pestered" is the notion that "remove all links" means "don't remove any links unless you can find a satisfactory replacement for it before you remove it." I listened to pushback and agreed to track all of the removed links in a central location and provide a message on the talk page of each article edited detailing the links that were removed. That's certainly "with care and clear explanation".—Kww(talk) 00:30, 19 June 2014 (UTC)[reply]
- Ok, you don't like the characterization; here's an explicit list of problems:
- You refuse to post the source code so that other editors can verify that the bot is doing what it says it will do. Posting the source code would allow us to verify that the actions conform to the request.
- Your solution is to delete all the offending archive links outright and collect all the references in a central location. This causes more harm than good. A much better solution would be to write a bot that investigates each archive link and replaces it with a better archive. That would be taking care and not damaging Wikipedia to win the war against Archive.is.
- Your lack of transparency across all aspects of this war (not explaining how the edit filter works, not letting us see the source code for the bot, blocking editors who express a viewpoint against your desired outcome, slapping SPI blocks on without the formality of a sockpuppet investigation or significant duck quacking) is yet another case where your involvement has actually dialed up the drama regarding this subject rather than defusing it.
- For these reasons, you are most definitely not the person to be undertaking this bot task. Hasteur (talk) 12:30, 19 June 2014 (UTC)[reply]
- I have never posted the source for Chartbot, either. I don't post the source for anything I write.
- People have tried to write fully automatic archive searching bots in the past. No one has ever successfully done so. They always wind up being semi-automatic, requiring an operator to make judgement calls that the bot cannot successfully do.
- I don't want people taking advantage of defects in the filter to bypass it. That's why it's a private filter. Ask any admin with filter privileges and they will verify the filter is innocuous. Any blocks I have made have been upheld. I rarely open SPI investigations, and never for such obvious socks as Duke Olav.
- The problem here is that people persist in seeing some kind of bad faith on my part where there is none.—Kww(talk) 13:27, 19 June 2014 (UTC)[reply]
Question: Kww Since you are advocating for the complete obliteration, would you be willing to put this task on hold to let me write a bot that goes through and tries finding appropriate replacements at webcite/Internet Wayback? As evidenced by [4] there are still plenty of archive.is urls that can be successfully replaced. That it's festered for so long after the RFC closed is indicative in my mind that we can afford the time to do this right rather than go straight for the delete first. Removing the Archive.is urls makes these articles difficult to find, whereas we can use the link search to find all the places it is used and properly deal with fixing them. Hasteur (talk) 12:55, 19 June 2014 (UTC)[reply]
- Why not write a bot that reads the list that my bot creates and aids a user in selecting the appropriate alternative archive or creating an archive itself? There's no reason at all that a bot like that needs to have the offending links in place to operate.—Kww(talk) 13:27, 19 June 2014 (UTC)[reply]
- Because the first thing you do is BREAK Wikipedia. How many times do we need to say this to get it into your head that this is a complete non-starter? By deleting the archive.is links you force random editors who want to solve the problem to have to hunt down your bot's list, whereas by leaving them we have the Link searcher to find them. As to not wanting to publish how things work, I refer you specifically to Linus's Law as to why your "I don't want to share" is a huge gaping hole. Hasteur (talk) 13:52, 19 June 2014 (UTC)[reply]
- The RFC result was to remove all links to archive.is. Without qualification. Without reservation. Without requiring replacement first. The consensus was not to leave all links in place until some indeterminate future date when people get around to dealing with them. Do you have any objections that would not require that the RFC had come to a different consensus than it actually did? That's the reason you see me as "pestering": you would disagree with any implementation of any bot that implemented the RFC result.—Kww(talk) 14:10, 19 June 2014 (UTC)[reply]
- (edit conflict)Heck, I'll even pseudocode the 1st pass proactive fixer for you to show how easy it is
- For each page listed in the LinkSearch query:
  - Use a regex to find all the cite templates that contain a reference to archive.is
  - For each hit in the regex:
    - Extract the URL for the referenced item
    - See if Internet Wayback has a hit for the page
    - See if WebCite has a hit for the page
    - If neither has a hit, try submitting the page to Internet Wayback
    - If Internet Wayback fails, try submitting the page to WebCite
    - If either Wayback or WebCite got a good hit, replace the archive.is url with the Wayback/WebCite url
    - If neither got a hit, then do nothing with the citation template
  - If at least one change happened, append a hidden maintenance category to the page for human checking of the page (i.e. exercising extra care)
- See It wasn't that difficult to pseudocode the careful replacement of the archive links in addition to providing a quick way for us to verify that the replacements are good. Hasteur (talk) 14:20, 19 June 2014 (UTC)[reply]
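As a hedged illustration only, the pseudocode's lookup-and-replace core could look roughly like this in Python, using the Wayback Machine's public availability API; the WebCite and re-submission steps are omitted since those interfaces are less clearly documented, and every function name here is hypothetical rather than part of any approved bot:

```python
import json
import re
import urllib.parse
import urllib.request

WAYBACK_API = 'https://archive.org/wayback/available'

def wayback_snapshot(url, timestamp=None, fetch=None):
    """Return the closest Wayback snapshot URL for `url`, or None.

    `fetch` is injectable for testing; by default it performs the real
    HTTP request against the availability API.
    """
    params = {'url': url}
    if timestamp:
        params['timestamp'] = timestamp  # YYYYMMDD, to prefer a nearby capture
    query = WAYBACK_API + '?' + urllib.parse.urlencode(params)
    if fetch is None:
        fetch = lambda q: urllib.request.urlopen(q).read()
    closest = json.loads(fetch(query)).get('archived_snapshots', {}).get('closest', {})
    return closest.get('url') if closest.get('available') else None

# Find the original |url= and the archive.is/.today |archiveurl= inside a
# cite template, ask `lookup` for a replacement, and leave the template
# untouched when none is found (the pseudocode's "do nothing" branch).
CITE = re.compile(
    r'\|\s*url\s*=\s*(?P<url>[^|}\s]+)'
    r'[^}]*?'
    r'\|\s*archiveurl\s*=\s*(?P<arch>https?://(?:www\.)?archive\.(?:is|today)/[^|}\s]*)',
    re.I,
)

def replace_archive_links(wikitext, lookup=wayback_snapshot):
    def repl(m):
        replacement = lookup(m.group('url'))
        if replacement is None:
            return m.group(0)
        return m.group(0).replace(m.group('arch'), replacement)
    return CITE.sub(repl, wikitext)
```

A real run would still need the date comparison Kww raises in his reply (checking that the Wayback capture actually covers the content the archive.is snapshot preserved), which is why this remains a sketch rather than a working replacement bot.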
Kww, since you seem hell-bent on implementing this over the will of the community, go right ahead, but be prepared to have an edit war over every last change this bot makes, as causing a loss of information is vandalism. Options have been presented for how you could minimize the loss, but your stubborn refusal to minimize the loss indicates to me that this solution is not endorsed. Hasteur (talk) 14:20, 19 June 2014 (UTC)[reply]
- Devil's in the details. Sure, I can write pseudocode to do anything. Now actually deal with writing real code that looks for the date of the archive.is archival, compares it to the date available in Wayback, and then determines whether the archive in Wayback archives the intended data. If that fails, figure out whether the content that the archive is intended to preserve is still presently in the page, because submitting today's webpage content for archival may do nothing to help with the content of a page that was archived 18 months ago. It's a manual job to be done with the assistance of a script by interested editors. It's beyond the scope of any bot written in actual code as opposed to pseudocode. I've presented a method for minimizing the loss, with every removed entry doubly transcribed, both in a central archive for people that want to make a little Wikiproject out of rearchiving and on each individual talk page for people that are only interested in each individual link. If you want me to replace the actual removed parameter with a wikicomment about the removal, I could do that too. I'm working in good faith here to assist with any effort to rearchive links, but I don't think the restoration can reasonably be put in front of the removal: it's an open-ended task that will take years to complete.—Kww(talk) 14:32, 19 June 2014 (UTC)[reply]
- It looks like no one is supporting your revert of the closure here, Kww. To me, legal issues seriously threaten Wikipedia. If anyone/any bot constructs links from Wikipedia to blacklisted sites, then this should be prohibited without question. As there is no consensus, this should really be closed.Forbidden User (talk) 14:25, 28 June 2014 (UTC)[reply]
- Forbidden User, could you explain what you mean? Kww isn't proposing to add links to blacklisted sites, but to remove some links to blacklisted sites. --Stefan2 (talk) 16:17, 28 June 2014 (UTC)[reply]
- Yeah I was confused. I meant to refer to the one who does not support removal of archive.is links by this bot. Sorry, Kww.Forbidden User (talk) 16:56, 17 July 2014 (UTC)[reply]
- In case the bot operator is still watching the page, I'd suggest including info about the RfC and other archive sites (like providing a link to web.archive.org) so as to explain A) why the community made this decision and B) what alternatives are available to editors to deal with rotlink. Ideally the link can lead directly to the search result for archives of the now-dead link. Here I give my support on the issue. By the way, though WP:PNSD has been demoted to essay, I don't think the number of "votes" matters more than the quality of the arguments. Through the discussion, I can see that most opposing arguments are vague ones like "you are breaking Wikipedia" or "consensus has changed" (read WP:CONLEVEL - "consensus" here cannot trump the RfC). Though Kww has repeatedly stated that this is not a place to discuss whether the links should be removed, there are still editors who refuse to listen, which is undesirable. If people have to stress that there is no consensus, then just open a RfC on the bot.Forbidden User (talk) 17:32, 17 July 2014 (UTC)[reply]
- I'd like to add that it is best if the bot scatters its removal edits across different articles and makes limited edits (like 500 per day) so that editors can take follow-up measures in time without getting exhausted. It'd be even better if it can build a list of articles containing archive.is links, dividing them into cleaned and uncleaned articles for patrollers to help!Forbidden User (talk) 15:23, 18 July 2014 (UTC)[reply]
This is something repeated from WP:Archive.is RFC 3. The opinion can be found there.
The following discussion has been closed. Please do not modify it.
Let's come back after the RfC instead of repeating things mentioned in the RfC. No WP:FORUMSHOPPING, and no irrelevant misinformation.Forbidden User (talk) 06:23, 30 August 2014 (UTC)[reply]
- First, prove the accusations to be false. You've flipped a few times on your side when asked for evidence.
- Second, you are not the one to say what was/is the consensus. That's utter disrespect to the "non-admin" closer and everyone participating in RFC 1, and let me remind you that you and I are no admins as well (consider why).
- Third, you've repeated your dissent to an action not being done a few times at different forums. WP:FORUMSHOPPING does not help an opinion gain support. This'd be my whole reply to this condemnable act.Forbidden User (talk) 09:40, 8 September 2014 (UTC)[reply]
After the RfC, any BAG members are welcome to go through the bot here. Thank you.Forbidden User (talk) 10:05, 8 September 2014 (UTC)[reply]
Request Expired. RfC was closed a long time ago. This disappeared from the main WP:BRFA page in June 2014, so I'm giving a formal closure for posterity. I do not see this happening as currently formulated. — Earwig talk 02:55, 4 December 2015 (UTC)[reply]
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.