
Wikipedia:Link rot/URL change requests/Archives/2023/April

From Wikipedia, the free encyclopedia


businessweek.com

Processed about 10k links, almost all of which were dead, soft-404, or better-off-dead, i.e. the new page works but is behind a paywall, so replacing it with an archive-url is an improvement. This site uses extensive bot detection. For a workaround, see the awk scripts "t" and "t2" in the meta directory businessweek.00000-10000 -- GreenC 02:49, 3 April 2023 (UTC)

EBSCOhost Connection is back as EBSCO Essentials

Back in November 2020 we discussed here what to do with broken links to the discontinued EBSCOhost Connection service. I just discovered that EBSCO has a new service called EBSCO Essentials that seems, at first glance, to provide access to the same information as the discontinued service.

Previous archived discussion of this topic:

But this is not immediately actionable as far as I know, because I haven't discovered a way to map the old URLs to the new ones. For example, taking the first item in Now Up-to-Date § Further reading, the old dead URL is http://connection.ebscohost.com/c/articles/9412062384 and I found the same item (note the same accession number in the URL) at https://essentials.ebsco.com/search/eds/details/now-s-new-calendar?an=9412062384 but I haven't found a way to get a URL with only the accession number and not the item title. This could merit further investigation. Notifying GreenC and Samwalton9 (WMF), who participated in previous conversations, in case they are interested. Biogeographist (talk) 13:51, 5 April 2023 (UTC)
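The point about the shared accession number can be illustrated with a short sketch. It assumes (from the single example above) that the old URL ends in the accession number and that the new URL carries it in an `an` query parameter. It shows that the number survives, but the title slug in the new path (`now-s-new-calendar`) cannot be derived from the old URL, which is why a mechanical mapping is not possible.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def old_accession(url: str) -> Optional[str]:
    """Accession number from an old connection.ebscohost.com/c/articles/<AN> URL.

    Assumes the accession number is the last non-empty path segment.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[-1] if segments else None


def new_accession(url: str) -> Optional[str]:
    """Accession number from the `an` query parameter of an EBSCO Essentials URL."""
    values = parse_qs(urlparse(url).query).get("an")
    return values[0] if values else None


old = "http://connection.ebscohost.com/c/articles/9412062384"
new = "https://essentials.ebsco.com/search/eds/details/now-s-new-calendar?an=9412062384"

# Both URLs carry the same accession number...
print(old_accession(old) == new_accession(new))  # True
# ...but the title slug in the new path cannot be rebuilt from the old URL.
```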

I think the Wayback version of the old URL gives better results than the new URL
Mostly because of the "related articles" section, which I often used to find other articles at the site. It's not imperative to move the URLs to the new site so long as we have archives of the old site. There are about 377 links marked {{dead link}} that could be moved; that's within the realm of doing it manually, should anyone want to take it on. There are more than 1,100 with an archive URL. These numbers reflect what my bot did in 2020; there are more that predate the bot run, but I'm not sure how many. -- GreenC 14:12, 5 April 2023 (UTC)
Thanks; I agree with that assessment. Biogeographist (talk) 14:44, 5 April 2023 (UTC)

Please add archive links to all links of the form "https://www.australianoftheyear.org.au/recipients/", except the root site, which is still a live link. The content is very stable but the URLs are not (I fixed many of them manually a year ago and they're already broken again ... I probably should've let the archive bot naturally do its thing, but I got some interesting article fixes out of the exercise). The new URLs cannot be automatically substituted (e.g. https://www.australianoftheyear.org.au/recipients/barbara-holborow/813/ is now https://cms.australianoftheyear.org.au/recipients/barbara-holborow-oam; the new form varies depending on any honours the person received), and I really don't feel like fixing these links manually every twelve months or so. Graham87 03:55, 31 March 2023 (UTC)
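Selecting which links need an archive URL can be sketched as a simple predicate. This is an illustration under stated assumptions, not the bot's actual logic: it treats both the `www.` and bare hostnames as the old site (an assumption), and it reads "except the root site" as leaving the site root and the `/recipients/` listing itself untouched. Only pages below `/recipients/` qualify. The new `cms.` URLs are deliberately excluded, since they cannot be derived from the old ones.

```python
from urllib.parse import urlparse

# Hosts treated as the old site; including the bare-domain variant is an assumption.
OLD_HOSTS = {"www.australianoftheyear.org.au", "australianoftheyear.org.au"}


def needs_archive(url: str) -> bool:
    """True for recipient pages below /recipients/ on the old host."""
    parsed = urlparse(url)
    if parsed.hostname not in OLD_HOSTS:
        return False
    path = parsed.path.rstrip("/")
    # The site root and "/recipients" itself stay live; only pages below it qualify.
    return path.startswith("/recipients/")


print(needs_archive(
    "https://www.australianoftheyear.org.au/recipients/barbara-holborow/813/"))  # True
```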

I'll work on this when I've finished my current project. -- GreenC 19:17, 2 April 2023 (UTC)
Graham, this is complete on Enwiki. I also updated the IABot database so it will propagate to 150+ other wikis. -- GreenC 03:20, 6 April 2023 (UTC)

A generalized bot of KolbertBot or Bender The Bot

I would like to have a bot that does the following: 1. Change "http://www." or "http://" to "https://www." or "https://" for all URLs on every page. 2. Treat "https://" as the most favourable URL outcome, with "https://www." or "http://" second, and "http://www." last. 161.81.115.230 (talk) 11:41, 6 April 2023 (UTC)

Would this break some websites? I'm not sure "all websites work with https" is a correct assumption. –Novem Linguae (talk) 12:21, 6 April 2023 (UTC)
For bot requests, see WP:BOTREQ. However, Novem Linguae is correct: http and https are not always interchangeable; some sites support https and some do not. They are technically two completely different URLs that can go to different places. Also, the "www." is an assumption that doesn't always hold true. -- GreenC 13:10, 6 April 2023 (UTC)
And actually see WP:BOTREQ#A generalized bot of KolbertBot or Bender The Bot where the IP was already told this was not possible. Izno (talk) 17:24, 6 April 2023 (UTC)
It's true both that HTTPS isn't universal and that not everything uses www. Not suitable for bot usage. Lee Vilenski (talk • contribs) 21:02, 6 April 2023 (UTC)

Newspapers.com legacy image viewer

Last year, Newspapers.com changed image viewers and for a time had URLs like https://www.newspapers.com/image/legacy/210351002/ (the end is a number of up to 9 digits). The /legacy must now be removed; these links do not resolve. They should now be, e.g., https://www.newspapers.com/image/210351002/. Note that while it's good practice to clip these articles for non-subscribers, this should at least be resolved. Sammi Brie (she/her • tc) 23:47, 2 April 2023 (UTC)
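The fix described here is a pure string rewrite, so it can be sketched as a single regular expression. This is a minimal illustration, not the bot that actually ran: it assumes the `www.` prefix is optional and caps the ID at 9 digits as stated above (the `\b` makes a 10-digit ID fail to match rather than be truncated). Whatever follows the ID, such as a trailing slash, is left in place.

```python
import re

# "/image/legacy/<up to 9 digits>" -> "/image/<digits>"; \b rejects a 10th digit.
LEGACY = re.compile(r"(https?://(?:www\.)?newspapers\.com/image/)legacy/(\d{1,9})\b")


def drop_legacy(text: str) -> str:
    """Remove the /legacy segment from Newspapers.com image URLs in `text`."""
    return LEGACY.sub(r"\1\2", text)


print(drop_legacy("https://www.newspapers.com/image/legacy/210351002/"))
# https://www.newspapers.com/image/210351002/
```

Because it uses `sub` on arbitrary text, the same call works on a whole wikitext page, rewriting every occurrence at once.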

User:Sammi Brie it is complete on Enwiki, edited about 400 pages. -- GreenC 19:58, 7 April 2023 (UTC)

Cinesouth

This site is long dead, but many articles using it haven't tagged it as dead or in need of archive links. Kailash29792 (talk) 06:16, 5 April 2023 (UTC)

User:Kailash29792, completed on Enwiki and IABot updated to propagate elsewhere. It is a domain squatter with soft-404s. There are probably many domains like this, in case you see any others, thanks for the report. -- GreenC 22:27, 7 April 2023 (UTC)

about.com usurped and wiki blacklisted

Reported. User:Billinghurst, I can process usurped domains with WaybackMedic according to the process at WP:USURPURL, but only on Enwiki. If the domain is blacklisted at the wiki level, I probably can't, because the blacklist will prevent the bot from editing the page; in that case the blacklist will need to be lifted for the bot to run. -- GreenC 22:07, 11 January 2023 (UTC)

@GreenC: suspended the global blacklist to allow for cleanup m:special:permalink/24351246, it will take ~15+ minutes to flow through. Please ping me to let me know when to reimpose the blacklist on this domain. Thanks for the work in this area. — billinghurst sDrewth 22:16, 11 January 2023 (UTC)
User:Billinghurst, thanks. It will probably take a couple days to work through. -- GreenC 02:37, 12 January 2023 (UTC)
User:Billinghurst, about.com is composed of many subdomains. User:Harej reports there are about 1,000 subdomains. There are only 22 pages on Enwiki with pure www.about.com (or about.com) URLs, which are the ones hijacked. The remaining 10k pages or so have subdomains, and those are not hijacked. Examples: inventors.about.com/library/weekly/aa050898.htm, rock.about.com/od/rockmusic101/a/subgenres.htm, boxing.about.com/b/a/205307.htm, golondon.about.com/od/thingstodoinlondon/fr/TheVault.htm, randb.about.com/od/g1/p/CalvinRichardso.htm, psychology.about.com/od/jindex/g/jameslange.htm, dancemusic.about.com/od/artistshomepages/a/RobynInterview_2.htm, dancemusic.about.com/od/reviews/fr/RobynRobyn.htm. They redirect to new domains. One is a 404, one or two are soft-404s, and the rest are good content. Any thoughts on what to do? -- GreenC 17:06, 12 January 2023 (UTC)
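The distinction drawn here (bare `about.com`/`www.about.com` hijacked, subdomains redirecting to new sites) can be expressed as a host classification. This sketch only looks at the hostname; it does not follow redirects or check what the target serves, and the category labels are illustrative. Because the example URLs above are written without a scheme, it prepends `http://` so that `urlparse` puts the host in the right field.

```python
from urllib.parse import urlparse


def about_com_kind(url: str) -> str:
    """Classify a URL as "bare", "subdomain", or "other" relative to about.com."""
    if "://" not in url:
        url = "http://" + url  # the examples in this thread are written without a scheme
    host = urlparse(url).hostname or ""  # .hostname is already lower-cased
    if host in ("about.com", "www.about.com"):
        return "bare"       # the hijacked case
    if host.endswith(".about.com"):
        return "subdomain"  # redirects to new domains
    return "other"


print(about_com_kind("inventors.about.com/library/weekly/aa050898.htm"))  # subdomain
```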
We just marked the entire domain as permadead. It's odd that the base domain is hijacked, while the subdomains are not. Something funny is happening here. I'm starting to believe that about.com is actually compromised, not usurped. This requires more investigation. —CYBERPOWER (Around) 17:17, 12 January 2023 (UTC)
Dotdash Meredith indicates that 'about.com' has been dead-site-walking since 2017, instead should be dotdashmeredith.com. But I have no idea what content moved over or about sub-properties. DMacks (talk) 17:24, 12 January 2023 (UTC)
Actually, I don't think there is any problem. It appears about.com was purchased by Dotdash Meredith. See the bottom of inventors.about.com/library/weekly/aa050898.htm, which redirects to ThoughtCo; it says "ThoughtCo is part of the Dotdash Meredith publishing family." It all looks legit. -- GreenC 17:32, 12 January 2023 (UTC)
@GreenC: We don't do redirect domains, and all these URLs have been usurped. The content at the old URLs is not the content that was originally there, and we don't know what will ever be at the seat of these redirects, so replacing them with the archived versions is appropriate. People can link directly to the (new) domain(s) of interest if they consider the target relevant; those base domains are unaffected by this blacklisting. We are just removing the capacity for redirects, and firming up the URLs originally used (as shoddy or as good as they were at the time). The recent additions show clear signs of spam and need firmer control rather than hiding under a base redirecting URL. — billinghurst sDrewth 21:50, 12 January 2023 (UTC)
OK, I understand what you're saying about using the original URL (i.e. an archive of it) for verification purposes. I don't see malicious behavior by Dotdash Meredith, though, such as spam. If I had some examples of that, it would help to document why this is being done. It will touch 10k articles, and in most cases the content looks legit, so I think people might say something about archiving and usurping a legit-looking URL. In some cases it will even mark the URL dead when no archive is available, even though a working redirected URL is there. -- GreenC 22:53, 12 January 2023 (UTC)
All good if you don't want to change the links to archival links. There is no need to allow new additions of redirecting about.com URLs; our real users can add the actual URL, so it will just be the spambots that we are seeing. I will reactivate this on the m:spamblacklist. — billinghurst sDrewth 10:27, 13 January 2023 (UTC)
Interesting test case: can the whitelist override the blacklist? (Yes, I know it can.) What if we whitelisted archive.org? Then adding archive URLs shouldn't be an issue in any case. We should try that. —CYBERPOWER (Chat) 20:09, 13 January 2023 (UTC)
the problem is that then archive.org could be used by anybody to circumvent the whole blacklist. -- seth (talk) 23:03, 16 January 2023 (UTC)
but: partial whitelisting of archive.org works, e.g. via
web\.archive\.org/web/[0-9]+/https?://(?:[a-z0-9]+\.|)about\.com
-- seth (talk) 23:26, 16 January 2023 (UTC)
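The partial-whitelist pattern can be exercised locally before it goes on-wiki. The sketch below copies the regex verbatim and uses Python's `re.search` as a stand-in for the wiki's matcher; that substitution is an assumption, and the on-wiki engine may differ in details such as case sensitivity. Two properties are visible in the test cases: a Wayback timestamp with a suffix (e.g. `id_`) does not match, because the pattern expects digits followed directly by `/`; and since nothing anchors the end of `about\.com`, a hypothetical host such as `about.community.org` would also match.

```python
import re

# seth's partial-whitelist pattern, copied verbatim from the comment above.
WHITELIST = re.compile(r"web\.archive\.org/web/[0-9]+/https?://(?:[a-z0-9]+\.|)about\.com")


def is_whitelisted(url: str) -> bool:
    # Unanchored search; using Python's re here is an assumption, not the wiki's engine.
    return WHITELIST.search(url) is not None


for url in [
    "https://web.archive.org/web/20100101000000/http://inventors.about.com/library/weekly/aa050898.htm",
    "https://web.archive.org/web/20100101000000/http://www.about.com/",
    "https://web.archive.org/web/20100101000000/http://example.com/",
    "http://www.about.com/",
]:
    print(is_whitelisted(url), url)
```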

Royal Thai Government Gazette

Discussion: Wikipedia_talk:WikiProject_Thailand#Royal_Thai_Government_Gazette_(ราชกิจจานุเบกษา)_document_URL_scheme_change .. links exist in about 1,500 articles. Not all links are dead but the ones that are, are soft-404. -- GreenC 20:00, 22 April 2023 (UTC)

Done. -- GreenC 03:42, 25 April 2023 (UTC)

Buzzfeednews.com Archiving

Due to the inevitable link rot that will likely happen as a result of Buzzfeed News shutting down, I am requesting that the InternetArchiveBot archive that domain. If need be, you can see a slightly more detailed version of my request on WP:BOTREQ here. (Thanks @Izno for pointing me in the right direction.) That Coptic Guy (ping me!) (talk) (contribs) 19:15, 20 April 2023 (UTC)

According to the Variety article: "there are ongoing discussions about the future of BuzzFeedNews.com, but all of BuzzFeed News work will be preserved and available within the BuzzFeed network", i.e. the links will not necessarily be dead. They might remain in place without change, or be moved to a different URL. If the former, nothing needs to be done. If the latter, I can move them to the new URL once the new URL structure is clear. -- GreenC 19:29, 20 April 2023 (UTC)
How much bother is it to have a bot archive them all? Because it might be better to be safe than sorry even if they're supposedly going to remain online. -sche (talk) 17:26, 23 April 2023 (UTC)
You mean into the Wayback Machine? That should already be done by the nomo404 process and previous runs of iabot. Are you seeing links not archived? -- GreenC 19:12, 23 April 2023 (UTC)
@GreenC: I believe @-sche meant to ask whether running the requested job as is would be too much of an inconvenience. I do agree with him, in that I think there'd be no harm in archiving those BuzzFeed News links now (especially in the likely event that neither I, nor possibly other editors, will remember to archive/fix this URL if there is even a new URL). Getting it done now would certainly save us the headache of possibly having to do it later, and to my understanding, it's an automated process anyhow. That Coptic Guy (ping me!) (talk) (contribs) 16:37, 26 April 2023 (UTC)
I don't know what you are referring to. There are two steps:
  1. Finding all Buzzfeed links on Wikipedia and making sure they have Wayback snapshots available on the Wayback Machine
  2. Finding all Buzzfeed links on Wikipedia and adding the Wayback links into Wikipedia.
These are separate processes and procedures. There is no reason to do #2 because the links are going to move: they won't be dead, they will be replaced with new live links. As for #1, the links have already been archived unless you can show examples where they were not. -- GreenC 16:56, 26 April 2023 (UTC)
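Step 1 (confirming a snapshot exists) can be spot-checked one link at a time with the Internet Archive's public Wayback Availability API. This is a sketch of one way to do the check, not the tooling used in this thread. The endpoint and the response shape (`archived_snapshots` → `closest` with `available` and `url`) are as I understand the API and worth verifying against its documentation. The network call is isolated in `has_snapshot`; the parsing can be tried offline on a sample response.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Internet Archive's Wayback Availability API endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str) -> str:
    """Build the availability-check request URL for `url`."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": url})


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the closest snapshot URL from an availability response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def has_snapshot(url: str) -> bool:
    """Step 1 for a single link: does the Wayback Machine report a snapshot? (network call)"""
    with urlopen(availability_query(url), timeout=30) as resp:
        return closest_snapshot(json.load(resp)) is not None
```

Step 2 (writing archive links into articles) is a separate edit and, as noted above, not needed while the links stay live.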
Ah, I didn't realize all the links we currently cite were archived (but spot-checking now, they do indeed appear to be); thanks! I suppose the other issue is whether we could archive news articles from Buzzfeed which we don't cite yet (but could), i.e. archive every news article they posted [everything on that domain] whether it's cited on Wikipedia or not, but this is a much more difficult and less important issue. -sche (talk) 18:50, 28 April 2023 (UTC)
It's likely most of them have been since Wayback will archive entire news sites and this one probably is on the list due to its high profile. It can miss some if a link is very old but Buzz is a relatively recent site. Between Wayback and archives made locally at BF, we should be fine. -- GreenC 21:03, 28 April 2023 (UTC)

Haitimega

Not entirely sure this is the right place to report this (if somewhere else is more appropriate, let me know), but as reported here, haitimega.com — which used to contain information about Haiti and which eight articles therefore cite — seems to have rotted / been usurped by a different site. (So links need to be replaced with links to archived versions of the pages, when these exist.) -sche (talk) 18:58, 28 April 2023 (UTC)

This is WP:JUDI passive spam, Wikipedia is inundated by it (not only enwiki). Ideally the citation would not be deleted, rather converted to usurped with an archive URL added. I added it to the list, recommend leaving the cites as-is until the next batch run. -- GreenC 21:11, 28 April 2023 (UTC)
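Converting a citation to "usurped with an archive URL" rather than deleting it can be sketched as a small template edit. This assumes the CS1 parameters `archive-url`, `archive-date` and `url-status=usurped`, and a deliberately simple input: one `{{cite ...}}` template with no nested templates and no existing `archive-url`. It is an illustration, not WaybackMedic's implementation. The Haitimega page path and the archive timestamp in the usage line are hypothetical.

```python
import re


def mark_usurped(cite: str, archive_url: str, archive_date: str) -> str:
    """Add archive parameters to a single CS1 template and set |url-status=usurped.

    Assumes `cite` is one {{cite ...}} template with no nested templates and
    no existing |archive-url=.
    """
    if not (cite.startswith("{{") and cite.endswith("}}")):
        raise ValueError("expected a single {{...}} template")
    body = cite[:-2]
    # Drop any existing |url-status= so exactly one value remains.
    body = re.sub(r"\|\s*url-status\s*=[^|]*", "", body).rstrip()
    return (f"{body} |archive-url={archive_url} "
            f"|archive-date={archive_date} |url-status=usurped}}}}")


# Hypothetical page path and snapshot, for illustration only.
print(mark_usurped(
    "{{cite web |url=http://www.haitimega.com/Example_Page |title=Example}}",
    "https://web.archive.org/web/20100101000000/http://www.haitimega.com/Example_Page",
    "2010-01-01",
))
```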