User talk:GreenC bot/Archive 3

You can stop the bot by pushing the stop button. The bot sees and immediately stops running. Unless it is an emergency please consider reporting problems first to my talk page.

Links to individual files

I have many Internet Archive links that involve individual PDF, JPEG or MP3 files. Is there any way they can be left as I set them up? The GreenC bot reformats my links so that they become invalid or link to the page rather than the individual file. Is there any way to avoid this? Thank you for your help in this matter. Pfa (talk) 22:23, 11 March 2018 (UTC)[reply]

"invalid links".. example? The bot completed this task: Wikipedia:Bots/Requests for approval/GreenC bot 4 the conversion of machine-specific links to permanent URLs. This is because machine-specific URLs expire - they have a limited lifespan turning into dead links within a few years. There are two ways to create a permalink, either with "/details" or "/download". The JPEG and MP3/MP4 were converted to "/download" and the PDF to the "/details" page which has additional meta information about the document. -- GreenC 23:20, 11 March 2018 (UTC)[reply]

Like in this edit, there are two ways to link to the same file:

https://archive.org/download/CalleLindstrom/09VictorRecords.jpg

https://ia800309.us.archive.org/23/items/CalleLindstrom/09VictorRecords.jpg

The first is a permanent link. It never expires or dies. The second is a temporary URL and can go dead at any time. It is "machine specific" because the "ia800309" is the name of a computer in the Internet Archive cloud where the file is currently hosted, but it can stop working if that computer is retired, goes offline or is renamed. So there is a bot process that ensures these are changed to perma-links. If the concern is about going direct to the file versus the details page, you can change that by using the "/download" in the URL. For example, convert the first URL below into the second:

https://ia600309.us.archive.org/23/items/CalleLindstrom/CalleLindstrm-visbok.pdf

https://archive.org/download/CalleLindstrom/CalleLindstrm-visbok.pdf

The second is a permanent link that never expires. For PDFs, the bot defaults to using "/details". There are some cases where "/download" might be preferred, but that change has to be made manually. -- GreenC 23:31, 11 March 2018 (UTC)[reply]
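The conversion described above can be sketched in a few lines of Python. This is only an illustration, not the bot's actual code: the host pattern (`ia<digits>.us.archive.org/<n>/items/...`) is inferred from the two example URLs in this thread, and the function name `to_download_permalink` is hypothetical.

```python
import re

# Assumed shape of a machine-specific URL, based on the examples in this thread.
# The real bot may recognise other host forms as well.
MACHINE_URL = re.compile(
    r"^https?://ia\d+\.us\.archive\.org/\d+/items/(?P<item>[^/]+)/(?P<path>.+)$"
)


def to_download_permalink(url: str) -> str:
    """Return the /download permalink for a machine-specific URL.

    URLs that do not match the assumed pattern are returned unchanged.
    """
    match = MACHINE_URL.match(url)
    if match is None:
        return url
    return f"https://archive.org/download/{match['item']}/{match['path']}"


print(to_download_permalink(
    "https://ia600309.us.archive.org/23/items/CalleLindstrom/CalleLindstrm-visbok.pdf"
))
```

Links already in permalink form pass through untouched, so running the function twice on the same URL is harmless.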

Thank you for the speedy reply and clarification.

It appears that the reformatted "download" permalink works fine for JPEG and MP3 files. A problem, however, remains for PDF files. When the GreenC bot reformatted the Calle Lindström songbook PDF file link, the result was not a permalink PDF file, it was simply the Calle Lindström Internet Archive page. (See the example below.) To the best of my knowledge, this is also the case for the other permalink PDF files. Instead of an individual file, the result is the page on which the file appears.

Calle Lindström songbook

I think there may be a technical problem with the Green C bot, so that it doesn't properly reformat PDF file links. Any suggestions? Pfa (talk) 00:36, 12 March 2018 (UTC)[reply]

Usually PDFs are the main document, and the "/details" page is a flip-book view plus other meta data. Example. In this case, CalleLindstrom, the work is multi-media with a combination of sound and PDF so the "/details" isn't right. But the bot has no way of knowing this. It would require manual adjustment to https://archive.org/download/CalleLindstrom/CalleLindstrm-visbok.pdf .. the bot might be able to check that https://archive.org/details/CalleLindstrom is part of the audio_music collection, and adjust to "/download" for PDFs in those cases. -- GreenC 00:50, 12 March 2018 (UTC)[reply]

Thanks again for the advice. I realize that having MP3, JPEG and PDF files on a multi-media page is bending the rules a little.

https://en.wikipedia.org/wiki/Lydia_Hedberg

The Lydia Hedberg Wikipedia article has 6 PDF file links. I converted all of them to "archive.org/download/" links. The other links are still in the temporary format. The next time the Green C bot comes around, will it convert the temporary links to download format but leave the 6 PDF download links alone? If that's the case, I could merely manually convert all of my many PDF links one time and be finished. Pfa (talk) 01:31, 12 March 2018 (UTC)[reply]

It won't leave PDF alone if they are in machine-id format, but if they are in "/details" or "/download" it will bypass them. The question is if the bot is smart enough to know to convert PDF to "/download" for a multi-media work, I haven't tried programming it yet. If you were to manually convert them to "/download", it would be the safest. The MP3 and JPEG the bot will convert next time around (not sure when it will be). -- GreenC 01:45, 12 March 2018 (UTC)[reply]
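The bypass rule in the reply above (machine-id links get converted, "/details" and "/download" links are left alone) could look roughly like the following. This is a sketch under assumptions: both regular expressions and the name `bot_would_convert` are hypothetical and only mirror the URL shapes quoted in this thread.

```python
import re

# Assumed URL shapes, taken from the examples in this discussion.
MACHINE_HOST = re.compile(r"^https?://ia\d+\.us\.archive\.org/")
PERMALINK = re.compile(r"^https?://archive\.org/(?:details|download)/")


def bot_would_convert(url: str) -> bool:
    """True only for machine-specific links; permalinks are bypassed."""
    if PERMALINK.match(url):
        return False
    return MACHINE_HOST.match(url) is not None


for link in (
    "https://archive.org/download/CalleLindstrom/CalleLindstrm-visbok.pdf",
    "https://ia800309.us.archive.org/23/items/CalleLindstrom/09VictorRecords.jpg",
):
    print(link, "->", "convert" if bot_would_convert(link) else "leave alone")
```

Under this rule, a PDF link that has been changed by hand to "/download" is never revisited.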

OK. Thanks a lot. I'll manually convert all of my PDF links to the download format, and hope for the best when the Green C bot shows up. Pfa (talk) 02:02, 12 March 2018 (UTC)[reply]

Bot removing valid archive URL?

In this edit the bot has removed what seems a valid archive URL. Ref before: "High bred. Low bred. / Political Sketches". Collection online. British Museum. Archived from the original on 5 April 2017. And after: "High bred. Low bred. / Political Sketches". Collection online. British Museum. Yes, the web page itself is alive (the reference was flagged as dead-url=no) but is pre-emptive archiving now deprecated or even disallowed? I hope not. Thincat (talk) 22:20, 19 August 2017 (UTC)[reply]

Not sure why that happened; will look into it. The webcitation.org URLs that use the query URL format are somewhat rare and there may be conditions that trigger a bug I haven't seen yet. -- GreenC 22:43, 19 August 2017 (UTC)[reply]

When querying the WebCite API it reports back as unavailable so it was deleted as a bad archive. Looks to be a problem in the WebCite API data. -- GreenC 00:42, 20 August 2017 (UTC)[reply]

In the same edit the bot changed a webcitation.org URL to an archive.org URL. Did the API also fail to locate that webcite archive? I use both systems quite a bit (not via API) and they are both buggy but on the whole, when it works, I subsequently find archive.org more reliable. That is why I had not originally drawn this second change to your attention. Is it part of any of the bots approvals to remove "failing" archives? I rather think it should not be because a site can be temporarily or partially out of action when tested but still may be usable later. Just like it is not good to remove "dead" URLs but OK to flag them. Thincat (talk) 05:49, 20 August 2017 (UTC)[reply]

The first webcite URL doesn't work [1]. It wouldn't make sense to flag a non-working link when an archive.org link is available to replace it. The bot works on a staggered basis: it checks and rechecks links over a 24hr period before deciding they are dead. I then do a final manual visual check of deleted links in the batch to make sure. For some reason that second link got past all those checks; you found a rare mistake. It is approved for this work and went through a lot of testing and verification. -- GreenC 13:55, 20 August 2017 (UTC)[reply]

OK, thank you for looking into this and letting me know. Thincat (talk) 14:00, 20 August 2017 (UTC)[reply]

This happened in this [2] edit, too. I had preemptively archived links and they have now been removed. Umimmak (talk) 08:47, 27 January 2018 (UTC)[reply]

The only reason it removes links is it thinks the archive link doesn't work. That appears to be the case for both of the links. It also tries to find a different snapshot date, and a different archive at other services (webcite, archive.is) and as a last resort it deletes it entirely. -- GreenC 16:15, 27 January 2018 (UTC)[reply]
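The fallback order described in the reply above (keep a working link, else another snapshot date, else another archive service, else delete) can be written as a small function. This is a sketch, not the bot's code: `pick_replacement` and its parameters are hypothetical, and the link check is passed in as a callable so the example makes no network requests.

```python
from typing import Callable, Iterable, Optional


def pick_replacement(
    current: str,
    other_snapshots: Iterable[str],
    other_services: Iterable[str],
    is_working: Callable[[str], bool],
) -> Optional[str]:
    """Return the archive URL to keep, or None if the link should be removed."""
    if is_working(current):
        return current
    # Next, try the same archive at a different snapshot date.
    for candidate in other_snapshots:
        if is_working(candidate):
            return candidate
    # Then try other archive services (e.g. webcite, archive.is).
    for candidate in other_services:
        if is_working(candidate):
            return candidate
    # Last resort: signal deletion of the archive link.
    return None


# Toy check: pretend only the archive.is copy works.
working = {"https://archive.is/2018.01.27/http://example.com/"}
print(pick_replacement(
    "https://web.archive.org/web/2017/http://example.com/",
    ["https://web.archive.org/web/2016/http://example.com/"],
    ["https://archive.is/2018.01.27/http://example.com/"],
    lambda u: u in working,
))
```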

I was able to create new archives for these and updated the page. -- GreenC 16:34, 27 January 2018 (UTC)[reply]

Bot using earliest archived version?

In this edit, the bot seems to have used the earliest version stored in the archive. This will often be fine, but I've been told that the best practice for human editors is to use the archived version closest to the original access date. Presumably this could be automated, or (more simply?) the bot could use the latest archived version before the original access date.

In that particular case. I have changed the citation to use a later version in the archive, because my iPad (iOS8) could not retrieve the earliest one. – Fayenatic London 20:36, 20 August 2017 (UTC)[reply]

It's dependent on Wayback API meta data, which is not always accurate. The algo tries to use the snapshot closest to the access date, but if it can't verify that snapshot it defaults to the earliest one, assuming it exists. -- GreenC 20:43, 20 August 2017 (UTC)[reply]
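As an illustration of the selection rule in the reply above, here is a minimal sketch. It assumes the list of snapshot dates is already known and that verification is supplied as a callable; the name `choose_snapshot` and its signature are hypothetical and do not describe the Wayback API.

```python
from datetime import date
from typing import Callable, Optional, Sequence


def choose_snapshot(
    snapshots: Sequence[date],
    access_date: date,
    is_verified: Callable[[date], bool],
) -> Optional[date]:
    """Prefer the snapshot closest to the access date; else fall back to the earliest."""
    if not snapshots:
        return None
    closest = min(snapshots, key=lambda s: abs((s - access_date).days))
    if is_verified(closest):
        return closest
    return min(snapshots)


snaps = [date(2010, 1, 1), date(2015, 6, 1), date(2017, 3, 1)]
print(choose_snapshot(snaps, date(2015, 7, 1), lambda s: True))
print(choose_snapshot(snaps, date(2015, 7, 1), lambda s: False))
```

The second call shows the fallback that was noticed in this thread: when the closest snapshot cannot be verified, the earliest one is returned.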

Articles with permanently dead external links

Hi, is the bot intended to attempt a "re-fix" of archive links when the original citation already has a {{Dead link}} tag with a "fix-attempted=yes" parameter added? I have reverted this edit as not useful. The archive links added in both cases are just saved copies of 404 error messages -- Ham105 (talk) 02:09, 22 August 2017 (UTC)[reply]

Soft 404s require manual intervention, bots see them as 200 due to misconfiguration at the website. I've added a block to those links so they won't be checked again by Medic or Iabot. -- GreenC 02:50, 22 August 2017 (UTC)[reply]
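To make the soft-404 problem concrete: the server answers with status 200, so only the page content reveals the error. The heuristic below is a rough sketch and an assumption on my part, not how Medic or IABot work; the phrase list and the name `looks_like_soft_404` are invented for illustration, and a phrase match like this will produce false positives, which is why manual checking is still needed.

```python
# Hypothetical phrases that often appear on error pages served with status 200.
ERROR_PHRASES = ("page not found", "404 not found", "no longer available")


def looks_like_soft_404(status_code: int, body: str) -> bool:
    """Flag pages that return 200 but whose text reads like an error page."""
    if status_code != 200:
        return False  # a real error status is a hard failure, not a soft 404
    text = body.lower()
    return any(phrase in text for phrase in ERROR_PHRASES)


print(looks_like_soft_404(200, "<h1>Page Not Found</h1>"))
print(looks_like_soft_404(200, "<h1>Annual report 2016</h1>"))
```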

Thanks, will use that method for these cases in future. -- Ham105 (talk) 08:29, 22 August 2017 (UTC)[reply]

Cosmetic Only Edits

Hi, I noticed that this edit was one where only |archive-url and |archive-date were removed. To me, this appears to be a WP:COSMETICBOT issue as the visual output of the page seems to be unchanged - although I am no expert with that template so there may be an output change. I wondered if you have any insights/previous discussions about this? Thanks! TheMagikCow (T) (C) 16:10, 31 August 2017 (UTC)[reply]

I've looked at this and it's very difficult for the bot to determine. It's a complex bot, about 20k lines of code, and the way it's structured there's no easy solution. The other thing is old bots had bugs that added blank archive parameters and so Medic has to assume these archives need to be checked and re-added if needed. So it generates an archive URL and verifies if it's working. If not working, it removes the blank archive parameter so the next time it processes the page it doesn't have to constantly recheck the same links, which is costly. -- GreenC 16:40, 31 August 2017 (UTC)[reply]

Thanks for the info GreenC. It seems that this is not the perfect circumstance, but far preferable to having the bot not working. TheMagikCow (T) (C) 18:42, 31 August 2017 (UTC)[reply]

Removing dead link information

@GreenC: This bot seems to have improperly removed dead link information. A couple examples (one of which you already reverted):

Kaldari (talk) 01:26, 5 November 2017 (UTC)[reply]

The Date_format_by_country was reverted for a different reason. The Hinton_Blewett is correct. |deadurl= is used in combination with |archiveurl= and |archivedate= but has no application as a standalone argument so the program removes strays. If the intention is to mark a link dead, {{dead link}} is the correct method. The only purpose of |deadurl= is to determine if the archive URL is displayed first or second in the rendered output (see CS1 docs). -- GreenC 04:11, 5 November 2017 (UTC)[reply]
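The stray-parameter rule explained above can be illustrated with a short sketch that works on the text of a single citation template. It is not the bot's implementation: the regular expressions are simplified (they do not handle nested templates or pipes inside values) and `drop_stray_deadurl` is a hypothetical name.

```python
import re

# A non-empty |archiveurl= or |archive-url= value.
ARCHIVE_URL = re.compile(r"\|\s*archive-?url\s*=\s*[^|}\s]")
# A |deadurl= or |dead-url= parameter together with its value.
DEAD_URL = re.compile(r"\s*\|\s*dead-?url\s*=[^|}]*")


def drop_stray_deadurl(citation: str) -> str:
    """Remove |deadurl= only when the citation has no archive URL to pair it with."""
    if ARCHIVE_URL.search(citation):
        return citation
    return DEAD_URL.sub("", citation)


print(drop_stray_deadurl("{{cite web |url=http://example.com |title=T |deadurl=yes}}"))
```

A citation that does carry an archive URL is returned unchanged, because there the parameter controls the display order.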

@GreenC: Got it! Thanks for the clarification. Kaldari (talk) 17:57, 6 November 2017 (UTC)[reply]

Hi

When will the Bot be working again? regards,--BabbaQ (talk) 12:57, 10 November 2017 (UTC)[reply]

Hi, a couple more weeks. It runs periodically every 6-8 weeks processing articles edited by IAbot during that time frame. -- GreenC 14:23, 10 November 2017 (UTC)[reply]

Error interpreting url when subscription is required

Hello GreenC,

In this diff, the bot incorrectly interpreted a link to ancestry.com, which requires a subscription, as a dead link. The archived link also works, but it's not necessary, yet. I just changed the dead-url parameter to "no". Cheers! Grand'mere Eugene (talk) 03:29, 4 December 2017 (UTC)[reply]

Pipelines

Hi, I've never used this before, so I'm sorry if this is incorrect. Hooray if it is. I was led here from the page about the incidents with the pipelines. I was trying to research a major explosion on the Tennessee Valley authority line that runs through western New York. The company denies anything happened, but I remember the night sky being lit orange, flames shooting higher than the trees, and it could be seen for miles away. My grandmother lived in a trailer just a few feet away from the buried pipeline and she was put up in a hotel for weeks until the "all clear" was given. Strange that nothing is listed; this incident was late 70's early 80's. I live on the Cattaraugus Indian reservation, in New York. — Preceding unsigned comment added by 2604:6000:1315:203B:0:1D71:E0C6:1548 (talk) 01:26, 8 December 2017 (UTC)[reply]

Sorry, can't help. I'm a (ro)bot. Try posting to Talk:List of pipeline accidents in the United States or Talk:List of pipeline accidents in the United States (1975–1999). -- GreenC 01:46, 8 December 2017 (UTC)[reply]

A barnstar for you!

	The Brilliant Idea Barnstar
	Dead link replacements from the Wayback Machine to archive.is website. Thanks @GreenC:, Iggy (Swan) 19:32, 24 January 2018 (UTC)[reply]

Confused

Hi, sorry to bother you, but could you explain what this edit message means - "Removed 1 archive link. Wayback Medic 2.1". This diff (https://en.wikipedia.org/w/index.php?title=Kim_Rhode&oldid=822037247) removed an archive link. Is there a reason? User:InternetArchiveBot was the one to put the link in. Red Fiona (talk) 20:03, 24 January 2018 (UTC)[reply]

The Internet Archive link is a soft-404. -- GreenC 21:31, 24 January 2018 (UTC)[reply]

Thank you Red Fiona (talk) 18:11, 25 January 2018 (UTC)[reply]

Matthew Gordon Banks

Dear GreenC, I am MOST grateful to you for seeking to re-format my Wikipedia page. It was wrecked in 2015 by someone who now calls himself Moist Towlett (a very unusual choice of user name) His last effort at Very Personally trying to do me down, shows changes just before Christmas changing remarks I DID make when very unwell due to PTSD. I would like his remarks to revert to the original. I sought to complain at the time that Tim Farron's money for his Leadership campaign which I originally tried to help, was primarily funded by two people then under investigation by the SFO. Charges of Bribery etc were only averted when Rolls Royce paid a fine of £600m. The wrecking of my page has meant that those who Google me re International Relations have no information at all of my work for the UK Defence Academy, anti-militants in the Mid East and south Asia or the bombing of my accommodation in Islamabad in Jan 2007 killing my two protection officers.

At least your efforts make a mess read a lot better and I would like you to revert Moist Towlett's continuous efforts to defame me by reverting his Christmas alteration.

Thank you for your consideration. MGB mwgbanks@aol.com2A00:23C1:6C1F:F301:3807:BEED:4B9E:39C0 (talk) 22:37, 8 March 2018 (UTC)[reply]

Is it possible for the bot to insert a nonexistent (or somehow malformed) Internet Archive link?

In this edit, two out of three NY Times links the bot tried to add appear to be something that should work but somehow they don't. Specifically, Diary of a Sex Addict and Holidaze appear to be nonexistent, but something is somehow getting appended to the URL somewhere so this might be some temporary quirk (or a permanent one the bot didn't account for). Felt weird enough I should mention it. (As a completely pointless aside, some other links in that edit technically no longer hold the relevant information (which was why I myself had tagged some of them as dead, because I hadn't had a chance to check for a better archive), but there's no way the bot could have known or accounted for that, so...) - Purplewowies (talk) 08:10, 13 April 2018 (UTC)[reply]

These two Wayback links work for me. It is possible the problem was temporary or with your browser. Maybe try clearing the browser cache or from a different browser. -- GreenC 13:19, 13 April 2018 (UTC)[reply]

How strange. It's not outright not working for me now--I see the correct page for a second or so (which didn't happen last night), then it redirects to something like https://web.archive.org/web/20121106080156/http://movies.nytimes.com/movie/411635/Holidaze-The-Christmas-That-Almost-Didn-t-Happen/cast?gwh=E27C1BD3C81E78A2442CA8720A7CF363 (which is not the URL the bot provided and technically isn't an extant archive). Hmm. Will have to look into why that's apparently happening but only for me. Weird. - Purplewowies (talk) 15:27, 13 April 2018 (UTC)[reply]

The "gwh=" portion of the URL is used to track the allotment of free article views each month at the Times. Are your monthly free article views over the limit? In which case there could be some strange interaction happening between the archive version and the live website due to embedded code in the page. -- GreenC 20:41, 13 April 2018 (UTC)[reply]
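For anyone comparing the redirected URL with the one the bot added, stripping the "gwh=" tracking parameter makes the two easier to line up. The helper below is only a convenience sketch using Python's standard `urllib.parse`; nothing in this thread says the bot does this, and `drop_query_param` is a hypothetical name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_query_param(url: str, name: str = "gwh") -> str:
    """Return the URL with every occurrence of one query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(drop_query_param("http://example.com/movie/cast?gwh=ABC123&page=2"))
```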

Archives for chapter/map URLs, not the title URL

In {{cite book}}, one can link a |chapter-url= that corresponds to |chapter=. Similarly, in {{cite map}}, one can link a |map-url= that corresponds to |map= (for a map within an atlas, etc.). These cases contrast with the option to link a |url= that corresponds to the |title=. I've noticed several cases like this edit where the bot is adding a |url= that duplicates one of these other URL-holding parameters because the citation has |archive-url=, even though the |archive-url= corresponds to the component (chapter/map) in a larger source. Imzadi 1979 → 22:08, 16 April 2018 (UTC)[reply]

I fixed the chapter-url problem a few days ago, but if you see any problems since then please let me know. The map-url is new, will add a fix for it before the next run. -- GreenC 03:44, 17 April 2018 (UTC)[reply]

I'll keep an eye out. |map-url= has been around since 2015 though. Imzadi 1979 → 05:23, 17 April 2018 (UTC)[reply]

This feature (add a |url= if missing) was added to WaybackMedic recently while cleaning up Category:Pages with archiveurl citation errors and it appears to have unintended consequences. I'm trying to think of a universal way to bypass things like cite map without hard coding cite map, so the unknown unknowns are bypassed. Otherwise I might just disable it. -- GreenC 13:22, 17 April 2018 (UTC)[reply]
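One possible way to avoid hard coding {{cite map}} is to skip any citation that already carries some other URL-holding parameter, whatever its name. The sketch below is only a suggestion and an assumption, not what WaybackMedic does: it treats every parameter whose name ends in "url" (other than the archive and dead-url parameters) as a component URL, and `may_add_url` is a hypothetical name.

```python
import re

PARAM_NAME = re.compile(r"\|\s*([A-Za-z0-9_-]+)\s*=")
# Parameters that end in "url" but do not point at a component of the source.
NOT_COMPONENT = {"archive-url", "archiveurl", "dead-url", "deadurl"}


def may_add_url(citation: str) -> bool:
    """True if adding |url= would not duplicate another URL-holding parameter."""
    names = {m.group(1).lower() for m in PARAM_NAME.finditer(citation)}
    if "url" in names:
        return False  # already present, nothing to add
    component_urls = {
        n for n in names if n.endswith("url") and n not in NOT_COMPONENT
    }
    return not component_urls


print(may_add_url("{{cite map |map=Plate 4 |map-url=http://example.com/m |archive-url=http://a.example/m}}"))
print(may_add_url("{{cite web |title=T |archive-url=http://a.example/t}}"))
```

A name-suffix rule like this would also cover |chapter-url= and any similar parameter added later, at the cost of occasionally skipping a citation that could safely have been fixed.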

Error when adjusting "cite tweet" citations

Per an edit such as this, the bot is taking archived Tweets using the {{cite tweet}} template, removing the archive parameters altogether, and making it a url parameter only. It should not be doing this, as this template supports the use of the archive parameters and doesn't actually handle url as a parameter. - Favre1fan93 (talk) 02:02, 17 April 2018 (UTC)[reply]

Fixed. - GreenC 03:39, 17 April 2018 (UTC)[reply]

the bot erroneously removed "| deadurl = yes" from two references in article "Olkiluoto Nuclear Power Plant"

The bot erroneously removed "| deadurl = yes" lines from two references in article "Olkiluoto Nuclear Power Plant". In the third-latest revision, I had moved the definitions of all references to the References section, looked for archived substitutions for deadlinks where available, and added the line "| deadurl = yes" to the definitions where I could not find such archived content. In the article's second-latest revision, the bot removed two of these lines without explanation. However, the links remain dead, so the edits were made in error. I restored the lines in the latest revision, done just now. Should the bot's code perhaps be examined for bugginess, or is there another explanation for its edit? Teemu Leisti (talk) 07:49, 20 April 2018 (UTC)[reply]

Teemu Leisti, regarding this edit. The |deadurl= argument is only meant to be used in combination with |archiveurl= and |archivedate=, once an archive URL has been found. Its sole purpose is to change the citation display order, so the archive URL is displayed first. See the documentation at Template:Cite_web#URL. If the intention is to mark a dead URL, use {{dead link}}. I'll update the article to demonstrate. -- GreenC 14:31, 20 April 2018 (UTC)[reply]

OK, got it. Thanks. Teemu Leisti (talk) 17:45, 22 April 2018 (UTC)[reply]

The bot destroyed the formatting of numerous tables in article "Potomac River"

Please do not ever run this bot on article "Potomac River" again. -- Thank you, P999 (talk) 00:39, 25 April 2018 (UTC)[reply]

The bot did nothing wrong in this edit. Short-form archive URLs like this https://archive.is/UNtln are disallowed as they can hide malicious websites and code. See this RfC.-- GreenC 02:43, 25 April 2018 (UTC)[reply]
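A short-form link such as the one quoted above can be told apart from a long-form one by its shape: a short opaque ID instead of a timestamp followed by the original URL. The check below is a sketch under assumptions; the ID length range and the two host names are guesses for illustration, and `is_short_form` is a hypothetical name.

```python
import re

# Assumed shape of a short-form link: host plus a short alphanumeric ID only.
SHORT_FORM = re.compile(r"^https?://archive\.(?:is|today)/[A-Za-z0-9]{4,7}/?$")


def is_short_form(url: str) -> bool:
    """True if the archive link hides the original URL behind a short ID."""
    return SHORT_FORM.match(url) is not None


print(is_short_form("https://archive.is/UNtln"))
print(is_short_form("https://archive.is/20180425024300/http://example.com/"))
```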

Bot removing content?

Hi there, I noticed that in this edit to Serial Peripheral Interface Bus, the bot failed to put a closing tag on a citation, and removed some good content. I replaced it, but wanted to let someone know, as well as make sure I was interpreting this correctly, as this is not really my area of expertise. If I'm wrong, feel free to revert my changes. Thanks! Jessicapierce (talk) 18:30, 16 June 2018 (UTC)[reply]

That happened because {{dead link|date=May 2018</ref> is missing a closing }}, so the bot got confused. -- GreenC 18:39, 16 June 2018 (UTC)[reply]
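Malformed references like this one can be found before a bot run by comparing the number of opening and closing braces inside each ref. The following is a simplified sketch, not part of the bot as far as this thread shows; it ignores nested refs and nowiki sections, and `unbalanced_refs` is a hypothetical name.

```python
import re
from typing import List

# Body of a <ref>...</ref> pair; self-closing <ref ... /> tags are not matched.
REF_BODY = re.compile(r"<ref(?:\s[^>/]*)?>(.*?)</ref>", re.DOTALL)


def unbalanced_refs(wikitext: str) -> List[str]:
    """Return the ref bodies whose {{ and }} counts differ."""
    return [
        body
        for body in REF_BODY.findall(wikitext)
        if body.count("{{") != body.count("}}")
    ]


sample = (
    "A<ref>{{cite web|url=http://example.com}} {{dead link|date=May 2018</ref> "
    "B<ref name=x>{{cite web|url=http://example.org}}</ref>"
)
print(unbalanced_refs(sample))
```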

Bot reinstating bad archive link I removed at Sycamore Community School District

The heading title says it all, I guess. Probably the best way to understand what happened is to check the history. Graham87 04:32, 20 June 2018 (UTC)[reply]

This is being caused by the Wayback API incorrectly reporting the page as available. The reason is they have so many types of redirects (JS, auto-forwarding, headers, etc), their API isn't aware of them all or at least not responding correctly to it. I've added a trap to detect this particular redirect type. --GreenC 00:05, 21 June 2018 (UTC)[reply]

Bot appears to have broken authors in citations

Please take a look at this bot edit from 19 June 2018. Thanks. – Jonesey95 (talk) 05:24, 22 June 2018 (UTC)[reply]

Jonesey95, fixed. It's a new but rare bug; as far as I can tell it only affected this one and this. -- GreenC 16:10, 22 June 2018 (UTC)[reply]

Thanks. Glad you were able to figure it out. – Jonesey95 (talk) 16:17, 22 June 2018 (UTC)[reply]

Wayback Medic 2.1 changing urls from this bot

Wayback Medic 2.1 is changing URLs from this bot, or at least one URL. In this edit it changed the URL to something completely different. The URL is still live but now redirects to something else, and the archive URL that was there is fine (ignore the comment on the archive that says "The requested page is currently unavailable or not found. Please note: Any snapshots taken between April 23rd 2010 and May 10th 2010 are currently unavailable. They will be made available again shortly." this is part of the archive). --Emir of Wikipedia (talk) 21:42, 2 August 2018 (UTC) (please mention me on reply; thanks!)[reply]

Caused by a bug in the WebCite API. API result. I've seen this before and didn't come up with a solution. The problem is the redir URL could be legit and there's no apparent way to tell a bogus redir URL from legit. However there is a pattern when the redir URL ends in an extension like .js .jpg .gif .css .. some kind of systems or media file .. so I added a trap to skip cases like that and will keep an eye on it. @Emir of Wikipedia: -- GreenC 23:20, 2 August 2018 (UTC)[reply]
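The trap described in the reply above amounts to a file-extension check on the redirect target. Here is a minimal sketch; the extension list comes from that reply, while the function name `redirect_looks_bogus` and the exact matching rules are assumptions.

```python
import posixpath
from urllib.parse import urlsplit

# Extensions named in the reply above; a real list would likely be longer.
ASSET_EXTENSIONS = {".js", ".jpg", ".gif", ".css"}


def redirect_looks_bogus(redirect_url: str) -> bool:
    """True if the redirect target ends in a system or media file extension."""
    path = urlsplit(redirect_url).path
    extension = posixpath.splitext(path)[1].lower()
    return extension in ASSET_EXTENSIONS


print(redirect_looks_bogus("http://example.com/static/site.css"))
print(redirect_looks_bogus("http://example.com/news/story.html"))
```

Because the check looks only at the path, a query string after the file name does not hide the extension.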

Thanks for the clarification. If we could find a way of telling a bogus redirect from a legit one we could resolve this issue. Emir of Wikipedia (talk) 17:03, 3 August 2018 (UTC)[reply]

T. Rex (band)

Your bot is a plague and you make contributors waste a lot of time by correcting your mistakes. I've saved several important sources/urls present in the legacy section of the article, for posterity, on the Wayback Machine. But you erased the archiveurls [3]. This is a shame: when those websites won't work any more, how could readers have access to the sources if there aren't archiveurls in the sources? Can you explain your behaviour, and do you often do this without informing the contributor? Woovee (talk) 15:26, 8 August 2018 (UTC)[reply]

Why do you think this link is worth including on Wikipedia? [4] ? --- GreenC 15:32, 8 August 2018 (UTC)[reply]

BTW if you want to ensure this doesn't happen again, use multiple archive services. I recommend archive.is as they generally don't delete/move/change archives like wayback does. -- GreenC 15:38, 8 August 2018 (UTC)[reply]

Thanks for explaining me why you erased those dead links. I thought that once an url was saved on Wayback Machine, it would always be available. I was far from thinking that this site was not entirely reliable anymore, and do sometimes erase a page that had been properly saved under another url that worked. Woovee (talk) 16:15, 8 August 2018 (UTC)[reply]

"it would always be available" .. yes most believe the same, but unfortunately this is not true for any web archive provider. Reason #57 why I advocate for Wikipedia do its web archiving in-house instead of relying on third party services. -- GreenC 16:19, 8 August 2018 (UTC)[reply]

"Reason #57 why I advocate for Wikipedia do its web archiving in-house instead of relying on third party services". Is there a vote/discussion concerning this issue at the moment ? I share your view; a web archiving in-house for wikipedia would be a super plus. I would actively support this. Woovee (talk) 16:36, 8 August 2018 (UTC)[reply]

It's a complex issue that won't happen easily or soon, would require WMF to hire staff, legal issues, infrastructure etc.. but the more discussion the better. There was a recent discussion at Wikipedia:Village_pump_(idea_lab)#Running_our_own_archive_service?. -- GreenC 17:13, 8 August 2018 (UTC)[reply]

Wikisource

This edit caused a problem because it added a parameter to {{cite EB1911}} that does not exist and caused the article to be placed into Category:Wikipedia articles incorporating a citation from the 1911 Encyclopaedia Britannica with an unknown parameter. {{cite EB1911}} is a wrapper template around {{cite encyclopedia}} along with its attribution twin {{EB1911}} which allows editors to link easily to text on Wikisource or to another site if the text is not yet available on Wikisource. I have a non-definitive list of similar templates in my notes subpage: User:PBS/Notes#List of PD Templates. The edit that your bot made would either have been ignored by most of the templates or ignored and flagged as it was in {{cite EB1911}}.

There are many other templates where trying to add the parameters |dead-url=yes |archive-date=... will be ignored see for example the 200+ Category:Attribution templates. There are others such as {{PastScape}} and {{Acad}}.

So how does the algorithm that the bot uses select the templates to alter? -- PBS (talk) 10:40, 18 August 2018 (UTC)[reply]

@PBS: I created this method for finding templates that use archiveurl/archivedate and the list of templates. However something is wrong if {{cite EB1911}} got edited, I'll look into it. -- GreenC 13:32, 18 August 2018 (UTC)[reply]

Thanks for supplying the link to the list. Two more that I recognise that should not be in your list are:

{{Cite Americana}}
{{Cite Catholic Encyclopedia}} -- you actually have a typo in the name {{Cite Catholic Encylopedia}} it is missing a "c" before the "l" in Encyclopedia.

The CE, I believe, has a complete copy on Wikisource. If you come across any archived pages for any of those three templates, let me know on my talk page and I will deal with them. -- PBS (talk) 13:49, 18 August 2018 (UTC)[reply]

Those two are from the original IABot list which I assumed were accurate and never checked. I will re-open the Phab ticket so IABot won't edit those two. If you see any others please let me know and/or comment at the Phab. Thanks for your help. -- GreenC 14:27, 18 August 2018 (UTC)[reply]

re: {{EB1911}} I did find a bug in WaybackMedic and it's fixed. -- GreenC 15:00, 18 August 2018 (UTC)[reply]

Removing access date to citations to the New York Times

The bot is removing access dates to citations to articles from the New York Times where people have neglected to provide a URL. But articles from the New York Times are available online - so they have URLs. A better course of action for the bot would be to add the URLs.-- Toddy1 (talk) 21:21, 5 September 2018 (UTC)[reply]

I have the same concern. Links to NYT articles, and other {{cite news}} refs to old papers I might be able to find on, say, Newspapers.com, e.g. this edit, are cases of Category:Pages using citations with accessdate and no URL where I could make improvements by adding URLs. Removing the access-date makes it harder for me to see where to help, without changing anything for readers. Removing access-dates from e.g. books with ISBNs, and other refs with stable identifiers, is thoroughly helpful and I'm glad a proper solution is in place for that. I feel that we're losing information in other cases. › Mortee _talk 23:53, 5 September 2018 (UTC)[reply]

The |access-date= removals are to satisfy a 5-year-old RFC so that a number of errors can be unhidden including Category:Pages using citations with format and no URL, Category:Pages using web citations with no URL and Category:Pages using citations with accessdate and no URL - net effect there will be more notification than currently available. -- GreenC 00:49, 6 September 2018 (UTC)[reply]
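The rule being applied here (drop |access-date= when the citation has no URL) can be sketched on the text of a single template. This is an illustration only: the sketch checks just the |url= parameter, which is an assumption about the bot's behaviour, and `drop_orphan_access_date` is a hypothetical name. Identifier parameters such as |doi= or |jstor= are deliberately not treated as URLs in this version.

```python
import re

# A non-empty |url= value.
URL_PARAM = re.compile(r"\|\s*url\s*=\s*[^|}\s]")
# An |access-date= or |accessdate= parameter together with its value.
ACCESS_DATE = re.compile(r"\s*\|\s*access-?date\s*=[^|}]*")


def drop_orphan_access_date(citation: str) -> str:
    """Remove the access date from a citation that has no |url= value."""
    if URL_PARAM.search(citation):
        return citation
    return ACCESS_DATE.sub("", citation)


print(drop_orphan_access_date(
    "{{cite news |title=T |newspaper=The New York Times |access-date=5 September 2018}}"
))
```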

The effect on pages that have been edited to remove access-dates will be less notification, won't it? In cases where URLs are discoverable, as with the NYT and some other news citations, that means it will be harder to see where to apply fixes. (Again, I agree with the idea behind this, but this seems like a tranche of changes where it's been counter-productive) › Mortee _talk 00:57, 6 September 2018 (UTC)[reply]

It's almost a separate issue. The number of citations to the NYT without a URL is a large set, and only a sub-set of those also have an access-date, but all of them regardless can have URLs added. If the goal is to find these cases then using stray access-dates is not a good marker as it will miss many of them. Better to either create a tracking category, or a bot that finds and tags them similar to {{dead link}} but called {{needs url}} or something. -- GreenC 01:21, 6 September 2018 (UTC)[reply]

That's a fair point, though the stray access-dates had some value as a partial marker, suggesting the editor had a URL that can be rediscovered (the NYT is one example I can particularly help with, having access to the archives, but other papers are probably in the same boat). I don't have the technical skill to get these marked up more cleanly in bulk. Is that something you might be able to help with? › Mortee _talk 06:34, 6 September 2018 (UTC)[reply]

I would say post to Help talk:Citation Style 1 a request that citations containing a certain criteria either be tracked in a tracking category and/or they be marked with a red error message inline. Mention your project to fill-in URLs for citations that don't have them, and current lack of support in finding those cases. It's also possible there is another way to find them, someone might come up with. -- GreenC 13:16, 6 September 2018 (UTC)[reply]

Thank you, that's helpful advice. › Mortee _talk 17:07, 6 September 2018 (UTC)[reply]

Bot removed access date updated two weeks ago

Hi GreenC: Your bot just removed an access date from a citation at Thomas F. McKinney. Does your bot just check for the "url" value in the citation before removing the access date? The citation templates also provide parameters for "doi" and "JSTOR", which both accept and output urls. The link in question used both of these parameters, providing two urls within the citation. So this was a false negative for checking urls. I see many instances of editors using the JSTOR and DOI parameters in place of the URL parameter, so I would expect more false negatives. Cheers, Oldsanfelipe (talk) 22:27, 5 September 2018 (UTC)[reply]

Removal of access dates--why?

I am no expert of Wikipedia style, but is there a reason for the bot removing access dates from cited sources? The template for citing web, news, and other sources includes a place for the access date. Does it matter if the date actually stays in or is this a new Wiki style thing that we need to adjust to?

Depending on the style guide I use, in the past I may have been asked to provide an access date for articles I've written, especially for online sources, because those sources may change or because the URL is not permanent. I'm happy to follow whatever rule I'm supposed to, but this seems odd, arbitrary, and likely to cause some confusion.

Thanks for any insights you can provide.

Bayonett (talk) 22:36, 5 September 2018 (UTC)[reply]

I'm curious about the same thing. I'm not complaining, I'm just looking for guidance moving forward. Andrew³²⁷ 23:41, 5 September 2018 (UTC)[reply]

Same with me. I'm under the impression that Cite-web citations are encouraged to include access dates, because websites change, whereas citations to archival sources such as books do not need it. --Tryptofish (talk) 18:08, 6 September 2018 (UTC)[reply]

I just found the answer. The bot is only removing it from cites that do not contain a URL, which makes good sense. Please see: Wikipedia:Village pump (technical)#accessdate v. access-date. --Tryptofish (talk) 20:31, 6 September 2018 (UTC)[reply]

Also User:GreenC bot/Job 5. --Tryptofish (talk) 20:36, 6 September 2018 (UTC)[reply]
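The rule described in this thread (an access date is only removed when the citation has no URL) can be sketched as below. This is a minimal illustration, not the bot's actual code: the function names, the dictionary representation of a citation, and the `url_params` argument are all hypothetical. Which parameters count as "having a URL" is an assumption left to the caller, since the thread above notes that |doi= and |jstor= also produce links.

```python
ACCESS_DATE_NAMES = ("access-date", "accessdate")


def has_stray_access_date(params, url_params=("url",)):
    """Return True if the citation has an access date but no URL-bearing parameter.

    `params` is a hypothetical dict of template parameter names to values.
    `url_params` lists the parameters treated as carrying a URL (an assumption).
    """
    has_access_date = any(params.get(name, "").strip() for name in ACCESS_DATE_NAMES)
    has_url = any(params.get(name, "").strip() for name in url_params)
    return has_access_date and not has_url


def remove_stray_access_date(params, url_params=("url",)):
    """Return a copy of `params` with the access date dropped when it is stray."""
    if not has_stray_access_date(params, url_params):
        return dict(params)
    return {k: v for k, v in params.items() if k not in ACCESS_DATE_NAMES}
```

Passing `url_params=("url", "doi", "jstor")` would treat a citation with only a DOI or JSTOR identifier as having a link, which addresses the case raised for Thomas F. McKinney.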

"Archived copy" FAQ

IABot has been using |title=Archived copy for the past 3 years when it can't determine the underlying title.
Due to a bug, the bot has recently been using |title={title} in error.
Determining the actual page title is non-trivial requiring a specialized title-bot due to the many edge cases. Such title bots have run in the past.
WaybackMedic is fixing the {title} bug by replacing with |title=Archived copy as IABot would have done originally anyway.

-- GreenC 18:09, 17 September 2018 (UTC)[reply]
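The {title} fix described in the FAQ amounts to a literal placeholder replacement. A minimal sketch follows, assuming the broken value appears exactly as `{title}` after `|title=`; the function name and the regular expression are illustrative and are not taken from WaybackMedic.

```python
import re

# Matches "|title={title}" while preserving whatever spacing surrounds the parts.
PLACEHOLDER_TITLE = re.compile(r"\|(\s*)title(\s*)=(\s*)\{title\}")


def fix_placeholder_title(wikitext):
    """Replace the literal placeholder |title={title} with |title=Archived copy."""
    return PLACEHOLDER_TITLE.sub(r"|\1title\2=\3Archived copy", wikitext)
```

Citations that already carry a real title are left untouched, because the pattern only matches the literal placeholder.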

Thanks! (t) Josve05a (c) 21:45, 29 September 2018 (UTC)[reply]

Incorrect archived item

In this edit, GreenC bot pulled a ref from two years prior to the accessdate that has a slightly different title. Looking at the Wayback Machine's index of snapshots, the "161" page had one just days before the accessdate, whereas the bot appears to have picked possibly the oldest one available. I don't know anything about the inner workings of the bot, but seems like some heuristic or timestamp-check failed. DMacks (talk) 04:07, 30 September 2018 (UTC)[reply]

Wayback has tens of billions of pages with minefields of bad data (soft and hard 404s, redirects, etc.), so it's not as easy as picking a date. IABot and WaybackMedic use the Wayback API as a guide to find working pages, but the API sometimes can't figure it out because its maps are incomplete. In this case the API reported no page available, hence the {{dead link}} left by IABot. WaybackMedic then bypasses the API by looking at the earliest snapshot, which has the highest likelihood of being available, and if it can determine a working page it leaves that, which is better than a {{dead link}} even though it's not a perfect match for accessdate. In the vast majority of cases it's sufficient, and for those where it isn't, it depends on humans to fine-tune. -- GreenC 11:03, 30 September 2018 (UTC)[reply]
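The fallback described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the real WaybackMedic logic: `choose_archive` and the `is_working` callback are hypothetical names, snapshots are represented as 14-digit Wayback timestamps, and trying later snapshots after the earliest one fails is an extension the comment above does not describe.

```python
def choose_archive(api_snapshot, snapshots, is_working):
    """Pick a snapshot timestamp, or None if a {{dead link}} tag is the only option.

    api_snapshot: timestamp suggested by the API, or None if it reported nothing.
    snapshots:    list of 14-digit timestamp strings (YYYYMMDDhhmmss).
    is_working:   hypothetical callable that verifies a snapshot actually loads.
    """
    if api_snapshot is not None:
        return api_snapshot
    # Fixed-width timestamps sort chronologically as plain strings,
    # so the earliest snapshot comes first.
    for timestamp in sorted(snapshots):
        if is_working(timestamp):
            return timestamp
    return None
```

Because the earliest working snapshot is chosen, the result can be far from the accessdate, which is the mismatch reported at the top of this section.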

Thanks for the detailed info! Was indeed not difficult to fix manually, just confusing at first when I actually tried to WP:V a piece of data from the ref that actually was only in some of the snapshots. DMacks (talk) 04:54, 2 October 2018 (UTC)[reply]

Bot removing invalid URLs

This bot removed invalid URLs in this edit. They were actually DOIs, which a capable editor would recognize. I assert that:

If GreenC bot is going to remove parameters besides access-date, it should use a different edit summary than "Remove 10 stray access-date".
GreenC bot should comment out the parameter rather than removing it, so it's easier for a human to figure out the error and fix it later.

Daask (talk) 14:46, 19 October 2018 (UTC)[reply]

I'm actually not sure why it removed the |url=. I'm thinking it detected bogus data, ignored it, thought the url field was empty and deleted it. Or maybe it deletes the field when it doesn't contain a valid URL. This is the first time I've seen a DOI in the URL field; someone made a typo using |url= instead of |doi=. But bot job #5 is done running and will never run again. -- GreenC 15:36, 19 October 2018 (UTC)[reply]
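One way a tool could tell a misplaced DOI apart from an empty or bogus |url= value is sketched below. The function name, the return labels, and the DOI pattern (a "10." prefix, a 4 to 9 digit registrant code, a slash, then a suffix) are assumptions for illustration and do not describe what job #5 actually did.

```python
import re

# Loose DOI shape: optional "doi:" label, then 10.<registrant>/<suffix>.
DOI_PATTERN = re.compile(r"^(?:doi:\s*)?10\.\d{4,9}/\S+$", re.IGNORECASE)
URL_PREFIXES = ("http://", "https://", "ftp://", "//")


def classify_url_value(value):
    """Classify the content of a |url= field as 'empty', 'url', 'doi' or 'invalid'."""
    value = value.strip()
    if not value:
        return "empty"
    if value.lower().startswith(URL_PREFIXES):
        return "url"
    if DOI_PATTERN.match(value):
        return "doi"
    return "invalid"
```

A value classified as "doi" could then be logged or moved to |doi= for human review rather than deleted.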

Wrong information

The character Nathan Zuckerberg was not in “American Pastoral”. Clbleu (talk) 17:44, 17 November 2018 (UTC)[reply]

Amnesty FAQ

Wayback Medic is doing a 1-time task of converting dead amnesty.org URLs to a live version. In the process it is removing archives to the old URLs as they are no longer needed and outdated with the new URL. It is also doing a Wayback "Save Page Now" of the underlying PDF link that the new URLs point to. -- GreenC 21:09, 19 November 2018 (UTC)[reply]

Incorrect edit

@GreenC

See here: https://en.wikipedia.org/w/index.php?title=Bat_Pussy&diff=869330069

Edit summary: "Removed 2 archive links. Wayback Medic 2.1"

No archive links were removed, let alone two;
The source was tagged with {{dead link}}, but it's not dead;
Even if it was, then why didn't the bot put a wayback link in?

Cheers, Manifestation (talk) 18:10, 12 December 2018 (UTC)[reply]

The site is not dead, but it reports an error. From the header:

HTTP/1.1 409 Conflict
Server: openresty
Date: Wed, 12 Dec 2018 20:22:14 GMT

Don't know what to make of a 409, but the bot has to assume anything in 4xx space means a non-operational site. It's not temporary, as the same thing happened when it ran on November 17 (I checked the logs), and the bot does multiple checks over a 24hr period before it settles on a final diff. It didn't replace with an archive because none is available for cinapse.co. You did add an archive, but it is for a different URL the bot couldn't know about (deadline.com vs. cinapse.co). If that is right, I suggest updating |url= to be in sync with the |archiveurl=. The edit summary is an artifact of the back-and-forth changes: the bot sometimes gets confused about what it did and undid, due to the complexity of features. -- GreenC 20:44, 12 December 2018 (UTC)[reply]
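The policy described here, where any 4xx response counts as a non-operational site and several checks are made before settling, might look like the sketch below. The function names are hypothetical, and requiring every check to fail before declaring the link dead is an assumption about how the repeated checks are combined.

```python
def is_dead_status(status_code):
    """Treat any HTTP status in the 4xx range (including 409 Conflict) as dead."""
    return 400 <= status_code < 500


def link_is_dead(status_codes):
    """Decide from repeated checks; all of them must be 4xx (assumed rule).

    status_codes: list of HTTP status codes collected over the check period.
    An empty list means no evidence, so the link is not declared dead.
    """
    return bool(status_codes) and all(is_dead_status(code) for code in status_codes)
```

Under this rule a server that answers bots with 409 on every check is tagged as dead even if the page loads fine in a browser, which matches the cinapse.co case.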

@GreenC: Hmm, that's weird. I've never seen a 409 error. Maybe the server that hosts cinapse.co is supposed to handle bots a certain way, but does so incorrectly, and then ends up with a 409 somehow? That would also explain why there are no working waybacked copies of the page. I don't think I get the last sentence of your explanation. But, computing remains a difficult thing. At least for me. I made a mistake myself when I tried to fix the reference, mixing up the deadline.com and cinapse.co urls. :-( This is now fixed. Thank you for taking the time to comment. Have a nice day! Manifestation (talk) 20:36, 14 December 2018 (UTC)[reply]

Deerbrook Mall (Illinois)

https://en.wikipedia.org/wiki/Deerbrook_Mall_(Illinois)

Just being nosy here: why did the bot do what it did? It added Wayback links for sources that were not yet dead but hid them, and showed the archive link first where the original was not dead yet. And because a newspaper changed its URL archive site, one link went to a pay-per-view, join-to-preview site.

https://en.wikipedia.org/w/index.php?title=Deerbrook_Mall_(Illinois)&diff=877193736&oldid=855235084

204.62.118.241 (talk) 22:09, 12 January 2019 (UTC)[reply]

HighBeam was taken over by Questia and that is the new link; it is (and was) a subscription site. The Wayback link is because dbrchamber.com has a bot blocker, so the bot thought the link was dead. The bot doesn't normally do this kind of operation, so it isn't always good at it; you can change the |deadurl=no and retain the Wayback link, which is still useful. The others are empty fields that are not needed; that's a feature I might remove from the bot at some point. -- GreenC 23:11, 12 January 2019 (UTC)[reply]

FYI

Do you know this tool, wikidata-externalid-url? https://github.com/arthurpsmith/wikidata-tools/tree/master/wikidata-externalid-url

ArthurPSmith adjusted it to work on the dead Wikidata URLs: https://github.com/arthurpsmith/wikidata-tools/blob/master/wikidata-externalid-url/index.php#L40

but after he did that the wayback turned off some of the end ones.

Lockal

The easiest way is to ask @ArthurPSmith: to modify his tool to redirect ...

(i. e. web archive) for all ch links, this would at least give better user experience

Lockal

Well, the actual (with preferred rank) formatter is Property:P345 #P1630:

https://tools.wmflabs.org/wikidata-externalid-url/?p=345&url_prefix=https://www.imdb.com/&id=$1.

Then ArthurPSmith's tool builds the redirection URL. Latest archive copy would do the job.

ArthurPSmith

Hi - I will take a look at this, probably won't be able to fix until next week though.

ArthurPSmith

I updated the IMDB redirection to send 'ch' id's to archive.org, it seems to work (I tried the Harry Potter example .... )

This is another bot, https://en.wikipedia.org/wiki/User:InternetArchiveBot, and this person works for the Wayback co.

IMDb CH.

original */http://www.imdb.com/character/ch0000574/

is now going 404, july 30 2018,* https://tools.wmflabs.org/wikidata-externalid-url/?p=345&url_prefix=https://www.imdb.com/&id=ch0000574

converts to * https://web.archive.org/web/20180730111630/https://www.imdb.com/character/ch0000574/

next back = nov. 26 2017, * https://web.archive.org/web/20171126131911/http://www.imdb.com/character/ch0000574/

204.62.118.241 (talk) 01:35, 13 January 2019 (UTC)[reply]
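The redirection described in the quoted discussion (IMDb 'ch' IDs sent to archive.org, other IDs to the live site) can be sketched as follows. This is not ArthurPSmith's code: the function name and the prefix table are hypothetical, and the timestamp-free `https://web.archive.org/web/<url>` form is assumed to resolve to an available capture.

```python
# Assumed mapping from IMDb ID prefix to URL path segment.
IMDB_PATHS = {"tt": "title", "nm": "name", "ch": "character"}


def imdb_redirect_url(imdb_id, url_prefix="https://www.imdb.com/"):
    """Build the target URL for an IMDb ID, sending character IDs to the Wayback Machine."""
    kind = IMDB_PATHS.get(imdb_id[:2])
    if kind is None:
        raise ValueError(f"unrecognised IMDb ID prefix: {imdb_id!r}")
    live_url = f"{url_prefix}{kind}/{imdb_id}/"
    if kind == "character":
        # Character pages now return 404 on the live site, so point at the archive.
        return "https://web.archive.org/web/" + live_url
    return live_url
```

For the ID ch0000574 this produces an archive.org address wrapping https://www.imdb.com/character/ch0000574/, comparable to the timestamped links listed above.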

Thanks for adding a reflist to my Exit57 Talk discussion..

Thanks!

"(cur | prev) 20:10, January 13, 2019‎ GreenC bot (talk | contribs)‎ . . (2,745 bytes) +18‎ . . (Add

References

to #Deletion of Joe Forristal (Kids in the Hall) mention is rude, unnecessary!! Let us undo that. (via reftalk bot)) (undo)

(cur | prev) 10:38, January 1, 2019‎ Vid2vid (talk | contribs)‎ . . (2,727 bytes) +794‎ . . (→‎Deletion of Joe Forristal (Kids in the Hall) mention is rude, unnecessary!! Let us undo that.: new section) (undo) Tags: Mobile edit, Mobile web edit" Vid2vid (talk) 04:44, 14 January 2019 (UTC)[reply]

Reftalk Bot

Hi, Spotted this [5] where the bot has added the template but the reference is invalid. How should this be handled? RhinosF1 (talk) 22:00, 14 January 2019 (UTC)[reply]

Existed before the bot [6], beyond the scope. -- GreenC 22:17, 14 January 2019 (UTC)[reply]

Ignored cbignore; broke archive-url link; said to reformat three "archive links", but only did one

Special:Diff/880787914. 84.250.17.211 (talk) 14:07, 31 January 2019 (UTC)[reply]

Fixed/improved. -- GreenC 18:31, 31 January 2019 (UTC)[reply]

Job 8

Please pass on my thanks to your handler for adding the reflists to talk pages. This is a very useful task which solves a problem that I've been working around less elegantly for some time. Certes (talk) 23:00, 31 January 2019 (UTC)[reply]

Certes, glad you like it. It's traversing all 5.8m talk pages looking for candidates, could be a month. Along the way it comes across a certain problem it can't fix but is able to log it, they will require manual fix. Would you have an interest in helping with those? -- GreenC 03:23, 2 February 2019 (UTC)[reply]

I suspect that the problem cases can be semi-automated, so WP:AWB/TA may be a good place to ask for help. I pick up requests from there and others may join in too. Certes (talk) 10:21, 2 February 2019 (UTC)[reply]

Good idea. Will wait till it's completed, so there is a full list. It's basically just converting sections that are incorrectly level-1 to level-2 (= vs. ==) -- GreenC 14:55, 2 February 2019 (UTC)[reply]
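The manual fix mentioned here, converting level-1 headings to level-2, could be semi-automated along these lines. This is a rough sketch with a hypothetical function name. It only rewrites lines that consist of a single "=" on each side and leaves existing "==" headings alone, and it does not try to recognise text where a lone "=" line is intentional.

```python
import re

# A line that starts with exactly one "=" and ends with exactly one "=".
LEVEL1_HEADING = re.compile(r"^=([^=\n](?:[^\n]*?[^=\n])?)=[ \t]*$", re.MULTILINE)


def promote_level1_headings(wikitext):
    """Rewrite =Heading= lines as ==Heading==, leaving deeper headings unchanged."""
    return LEVEL1_HEADING.sub(r"==\1==", wikitext)
```

Running it over a talk page leaves the heading text and any inner spacing intact, so "= Topic =" becomes "== Topic ==".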

bad edit

See this edit.

I suspect that the bot was confused by the improper inclusion of the {{convert}} template.

—Trappist the monk (talk) 18:45, 21 February 2019 (UTC)[reply]

Yeah. Shouldn't happen again (for {{convert}}). -- GreenC 19:09, 21 February 2019 (UTC)[reply]

URL when have DOI

In a bunch of recent edits, archived copies of dead urls have been found and added to journal citations. But these cites also have doi, and a different bot task seems to be (and rightly IMO) removing urls when a doi is present. If "remove url when have doi" is a valid task, then "add archive-url when have url" should omit cases when a doi is present. DMacks (talk) 03:20, 24 February 2019 (UTC)[reply]

Hi DMacks, understand what you are saying. As a bot whose task is to resolve link rot, it doesn't make sense to ignore URLs it knows are dead but are otherwise unmarked (appearing live), as there is no guarantee if another bot will ever run on the page, or in a timely manner, or what action they may or may not take. At the very least they should be tagged with a {{dead url}}, but if we are doing that adding |archiveurl= is correct. I'd suggest deleting the |url= but it probably is a WP:CONTEXTBOT edit since there is no guarantee of a 1:1 content mirror between |url= and |doi=. The bot I think you mean is Citation bot but that is fully supervised so it doesn't have contextual edit problems, thus is able to remove the |url=. Probably the best thing is a 2-step process: unsupervised bots resolve the dead URL by adding a {{dead link}} or |archiveurl=, and supervised tools like Citation bot (or manual) can remove the |url= and other associated stuff. This would be easier if there was a guarantee that |url= and |doi= are the same basic content, then unsupervised bots can delete the |url= when they detect it is dead. -- GreenC 05:45, 24 February 2019 (UTC)[reply]

That makes sense. Thanks for the additional information about the respective bot processes! DMacks (talk) 10:36, 24 February 2019 (UTC)[reply]

Unify Square Needs Updating on Current Content

Unify Square is in need of current wikipedia page updates. If the topics are of similar interest to your work, please edit. https://en.wikipedia.org/wiki/Unify_Square https://www.unifysquare.com/ — Preceding unsigned comment added by Ryan Werner (talk • contribs) 23:16, 24 April 2019 (UTC)[reply]

Broken Headings

Last night the bot made many errors by inserting "|".
==Heading==
Chapter text
was changed to
|==Heading==
|Chapter text
Have a look at the changes in Bejís, Triacastela, Alhama de Murcia, Carrión de los Céspedes and Vistabella del Maestrazgo. Regards --GünniX (talk) 17:58, 30 April 2019 (UTC)[reply]

Ah this is a bug, happened when the end of the Infobox and following text are on the same line eg.

}}Pizarra

Instead of

}}

Pizarra

Investigating how to find them. -- GreenC 18:12, 30 April 2019 (UTC)[reply]
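Finding affected pages could start from a search for lines where a closing "}}" is immediately followed by text, as in the Pizarra example. The sketch below is one hypothetical approach, not the method actually used: the names are invented, and the pattern only looks at lines that begin with "}}", so it will not catch every layout.

```python
import re

# A line that starts with "}}" followed by text other than more braces or a pipe.
GLUED_LINE = re.compile(r"^\}\}[ \t]*([^\s}|].*)$", re.MULTILINE)


def find_glued_lines(wikitext):
    """Return the lines where template-closing braces and body text share a line."""
    return [match.group(0) for match in GLUED_LINE.finditer(wikitext)]


def split_glued_lines(wikitext):
    """Move the trailing text onto its own paragraph after the closing braces."""
    return GLUED_LINE.sub(r"}}\n\n\1", wikitext)
```

Pages flagged by `find_glued_lines` would still need a human look, since a line starting with "}}" can also close an inline template in the middle of a paragraph.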

Looks like 7 total, those 5 plus Pizarra and Villahermosa del Río. -- GreenC 19:17, 30 April 2019 (UTC)[reply]

Fixed (Example). -- GreenC 21:35, 30 April 2019 (UTC)[reply]

url wikilink conflict errors due to this bot

For example, this edit.

—Trappist the monk (talk) 15:00, 3 May 2019 (UTC)[reply]

This one is different but still broken.

—Trappist the monk (talk) 15:10, 3 May 2019 (UTC)[reply]

Ah crap. This was a task from hell. Good news it is finished, and only edited about 800 pages, but no doubt it caused some problems. Trying to find these will be interesting. -- GreenC 15:23, 3 May 2019 (UTC)[reply]

Here's another odd one

—Trappist the monk (talk) 15:27, 3 May 2019 (UTC)[reply]

@Trappist the monk:, if possible please leave the above two tracking categories in-place for the next 24-48 hours as I can search the articles for "art uk" and identify which ones need to be fixed; or notify me if you revert the bot again I will check your contribution history (I don't monitor bot revert notices since I am not logged into the account). -- GreenC 16:12, 3 May 2019 (UTC)[reply]

I was only playing in Category:CS1 errors: URL–wikilink conflict. I don't know what other category you mean. No problem staying out, I can find more than enough to do elsewhere.

—Trappist the monk (talk) 16:16, 3 May 2019 (UTC)[reply]

The other is Category:Pages with citations using unnamed parameters because of the |BBC]] causing a "Text "????" ignored" error. In your second diff above. -- GreenC 16:36, 3 May 2019 (UTC)[reply]

A crude sampling of what lies in Category:Pages with citations using unnamed parameters did not find any edits by the bot. In the other category, there were no others like the North Hylton edit.

—Trappist the monk (talk) 16:45, 3 May 2019 (UTC)[reply]

From_the_Lions_Mouth and William_R._Symonds, in the |website= and |work= fields respectively. The "404 title" in North Hylton was not created by this bot; it preexisted. -- GreenC 17:00, 3 May 2019 (UTC)[reply]

Adjustment needed

Hello GreenC. Please see this edit indicating that the bot isn't going to let me use it anymore. If I have done something wrong just let me know. Best regards. MarnetteD|Talk 23:17, 11 May 2019 (UTC)[reply]

Thanks, I left a question for you at Wikipedia:Bots/Requests_for_approval/GreenC_bot_16#GreenC_bot_16. -- GreenC 23:48, 11 May 2019 (UTC)[reply]

I replied there. Coincidentally, after logging off, I remembered that there was a trial period for the bot and that we must have reached the end of it :-) Cheers. MarnetteD|Talk 01:11, 12 May 2019 (UTC)[reply]

MarnetteD, thanks for the continued support. It definitely makes getting bot approval much easier when there is community support. -- GreenC 15:05, 12 May 2019 (UTC)[reply]