Wikipedia:Link rot/URL change requests/Archives/2023/November

From Wikipedia, the free encyclopedia


MetaCritic repair

Metacritic recently revamped their webpages. In particular for video games, they now have a central landing page for games that is platform agnostic, and displaying the specific scores for a given platform is now a query parameter on the URL. See Special:Diff/1182032153 for an example. This likely impacts 50,000-100,000 URLs or more.-- ferret (talk) 18:41, 26 October 2023 (UTC)

User:Ferret: Looking at external links...
  • Compare new with old archive. The old archive is better. This is regressive content drift, drifting from good to bad.
  • This http://www.metacritic.com/film/titles/inthenameoftheking returns 404. However, there is a live page for the film at https://www.metacritic.com/movie/in-the-name-of-the-king-a-dungeon-siege-tale/ which has good content .. how to discover the redirect? WaybackMachine has it buried away in the 2010 snapshots: click on December 12 and follow the redirects. So it would in theory be possible to convert this page to the new URL by doing discovery for the old redirect in the WM. But do all URLs have a 2010 redirect? No idea. And is the live page always any good? The previous example suggests not. Difficult complications.
  • Maybe I could handle only /game/<platform>/<title> and convert those to /game/<title>/critic-reviews/?platform=<platform>. MC also has films, books, music, etc., but I don't know how to convert them at the moment.
  • First pass analysis: MetaCritic is a very large site that has changed multiple times over the years. There probably hasn't been much maintenance done on Wikipedia. The content has sometimes drifted, sometimes regressively. Redirects that once existed no longer do, but might be found in the WaybackMachine. Most dead links to MetaCritic have a live page, but due to the scale and the repeated moves over the years, finding them manually is difficult. The worst-case solution is to convert dead links to archive URLs, which has been the default up to now. -- GreenC 21:41, 26 October 2023 (UTC)
Yes, I can only speak to the changes to the /game/ formatting. Whether films, books, music, etc are similarly affected by this is beyond my knowledge set. For the cases I've checked on games, a simple reshuffling of the URL repairs them. -- ferret (talk) 22:30, 26 October 2023 (UTC)

User:Ferret - They added redirects. Try it!

The primary difference now is the first link redirects to an all platform summary page, where before it went to a platform-specific summary page. Compare old version has a grey "PC" next to the game title. The new version doesn't have a platform-specific summary page. Most links are of this variety, to the summary page. Another issue is in the example Special:Diff/1182032153 given above, the link to the summary page was converted to a critic-reviews page. Maybe that is what was intended for those cites, but I can't determine via bot that all summary page links should be converted to critic-reviews pages. It's a good guess, of course, but maybe not all are the correct change. A change of that magnitude would require discussion with others, so I can point to the discussion in case anyone complains, if you still want to do that. I'll hold off doing anything until I hear back from you on how you want to proceed. Take your time, thanks. -- GreenC 23:31, 30 October 2023 (UTC)

I think we're in better shape with redirects in place. It's not perfect but it's workable and you can still get to the score that way. -- ferret (talk) 23:35, 30 October 2023 (UTC)
User:Ferret: Recall the inthenameoftheking example above. They created a redirect in 2010, but at some point it stopped working, probably when they made a later change to the destination URL. They didn't support multi-level redirect. The current redirects might last a while, but eventually they will likely dead-end, also. I can proactively change the URLs to the new destination URL, it will buy Wikipedia more time and make future updates less complicated. It's a big project, but not urgent, I can do it as time allows. What do you think? -- GreenC 15:33, 31 October 2023 (UTC)
Yeah go for it. You're more knowledgeable about the future issues. I just didn't want you to work if you felt it was good enough now. As far as I can tell, for complete parity, the URL should be changed from /game/<platform>/<title> to /game/<title>/critic-reviews/?platform=<platform> -- ferret (talk) 16:04, 31 October 2023 (UTC)
Re: parity, the summary page is not the same as the critic-reviews page. The summary page has information from the details + user-reviews + critic-reviews. For example the summary includes a description, publisher and release date. This is probably often cited on Wikipedia, but the critic-reviews page doesn't have it. I think the safe thing is maintain linking to the summary page, unless there is a specific page provided .. thus the summary page /game/<platform>/<title> will go to /game/<title> .. because there is no platform-specific summary page on the new site, that was the major change. The only way to get platform is in the critic-reviews page. Thus there isn't complete parity for summary pages, but, it's still good IMO. -- GreenC 17:29, 31 October 2023 (UTC)
Metacritic for video games is only used to source the Critic Review score. We do not use it for release date, publisher, etc. This is because that data is sourced from GameFAQs, an unreliable USERG database. That's also why we don't source user reviews (unless covered by a reliable secondary) -- ferret (talk) 17:45, 31 October 2023 (UTC)
OK. In that case, I'll convert the summary pages to critic-reviews. -- GreenC 19:27, 31 October 2023 (UTC)
@GreenC Spotted an unexpected issue. "switch" needs to be converted to "nintendo-switch". Spotted at Special:Diff/1182880804. This is the only issue so far, I've validated numerous other links and platforms. -- ferret (talk) 23:41, 31 October 2023 (UTC)
I've patched it going forward, but, there are about 400 in the queue. I'll go back and repair those on wiki. A little more difficult than search-replace due to the archive URLs which still need "switch". If you see anything else let me know, thank you for spot checking. It's no problem repairing things. -- GreenC 00:30, 1 November 2023 (UTC)
@GreenC Found one more. "ios" to "ios-iphoneipad". Example at Special:Diff/1182883899. -- ferret (talk) 00:44, 1 November 2023 (UTC)
Thanks. "switch" and "ios" are done, about 300 pages each repaired. In total there are about 10,400 pages with a /game/ URL. The first 2,500 are done. -- GreenC 01:57, 1 November 2023 (UTC)
Bot work(ed):
  • Checked 10,409 pages, containing one or more /game/ URLs
  • Converted 20,018 URLs to the new format
  • Added 9 archive URLs where the /game/ URL doesn't work with the new format and an archive exists
  • Added 39 {{dead link}} where the /game/ URL doesn't work with the new format and no archive exists
  • Various other fixes: conversion to https, re-set |url-status= from dead to alive
-- GreenC 00:48, 2 November 2023 (UTC)
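
The conversion agreed on in this thread can be sketched roughly as follows. This is only an illustration, not the bot's actual code: the names `convert_game_url`, `PLATFORM_RENAMES` and `NEW_SUBPAGES` are hypothetical, the rename table holds only the two slugs reported above ("switch" and "ios"), every other platform slug is assumed to carry over unchanged, and the list of new-site sub-page names is an assumption based on the page types mentioned in the discussion. Archive URLs would be left alone, since they still need the old slug.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Platform slugs reported in this thread as renamed on the new site.
# Assumption: any slug not listed here is unchanged.
PLATFORM_RENAMES = {
    "switch": "nintendo-switch",
    "ios": "ios-iphoneipad",
}

# Assumed sub-page names on the new site; used so that an already
# converted URL is not mistaken for an old /game/<platform>/<title> link.
NEW_SUBPAGES = {"critic-reviews", "user-reviews", "details"}

# Old summary-page shape: /game/<platform>/<title>, optional trailing slash.
OLD_GAME_PATH = re.compile(r"^/game/(?P<platform>[^/]+)/(?P<title>[^/]+)/?$")


def convert_game_url(url: str) -> Optional[str]:
    """Return the new critic-reviews URL, or None if `url` is not an
    old-style Metacritic /game/<platform>/<title> link."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in ("metacritic.com", "www.metacritic.com"):
        return None
    match = OLD_GAME_PATH.match(parts.path)
    if match is None or match.group("title") in NEW_SUBPAGES:
        return None
    platform = match.group("platform").lower()
    platform = PLATFORM_RENAMES.get(platform, platform)
    title = match.group("title")
    # The result is always https, matching the "conversion to https" fix above.
    return (
        f"https://www.metacritic.com/game/{title}"
        f"/critic-reviews/?platform={platform}"
    )
```

Returning None for anything that does not match lets a caller fall back to an archive URL or {{dead link}} rather than guessing, which is how the film, book and music links discussed above would be treated by this sketch.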

rpgfan.com

Some time ago, they moved their game reviews and soundtrack reviews to a new link structure, and their old game previews seem to be gone for good. Any link ending with .html doesn't work.

Game reviews: Old [1] New [2]

Soundtrack reviews: Old [3] New [4]

Previews: Old [5] QuietCicada - Talk 11:57, 4 November 2023 (UTC)

QuietCicada: These are difficult to impossible to determine. For example http://www.rpgfan.com/reviews/romancingsagaminstrel/index.html -> http://www.rpgfan.com/reviews/romancing-saga-minstrel/ but there is no way to determine where to put the "-". Some of them already have dashes and it's a simple matter of removing the index.html .. there are about 1000 links in this domain and it's possible with a lot of time and effort I could save a few hundred. I think the best solution is to let IABot add archive URLs when the link is dead. Also, if you want to create a map of old->new URLs on an individual basis, my bot can use that map to update the citations. -- GreenC 15:52, 6 November 2023 (UTC)
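
The two repairable cases described above (a hand-built old->new map, and slugs that already contain dashes) could be combined along these lines. This is a sketch under assumptions: `OLD_TO_NEW` stands in for the map offered above and contains only the one example from this reply, `repair_rpgfan_url` is a hypothetical name, and treating any dashed slug as safe to convert is a heuristic that has not been checked against the live site.

```python
from typing import Optional

# Hand-built map of old -> new URLs. The single entry is the example
# given in this thread; a real map would be filled in case by case.
OLD_TO_NEW = {
    "http://www.rpgfan.com/reviews/romancingsagaminstrel/index.html":
        "http://www.rpgfan.com/reviews/romancing-saga-minstrel/",
}

SUFFIX = "/index.html"


def repair_rpgfan_url(url: str) -> Optional[str]:
    """Return a repaired URL, or None when the new location cannot be
    determined (the caller would then fall back to an archive URL)."""
    # Case 1: an explicit mapping always wins.
    if url in OLD_TO_NEW:
        return OLD_TO_NEW[url]
    # Case 2: the slug already has dashes, so only index.html is dropped.
    if url.endswith(SUFFIX):
        base = url[: -len(SUFFIX)]
        slug = base.rsplit("/", 1)[-1]
        if "-" in slug:
            return base + "/"
    # Undashed slugs: there is no way to know where the dashes belong.
    return None
```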

apnews.com

Per request at Wikipedia:Administrators'_noticeboard#Major_source_problem_with_Associated_Press. -- GreenC 21:49, 30 October 2023 (UTC)

I created a list of pages with broken links at User:Bri/AP fixup pages. ☆ Bri (talk) 21:57, 31 October 2023 (UTC)
Per the above link, I'll hold off a while longer to see if AP fixes itself. They acknowledged the problem. Thanks for your search recipe results. -- GreenC 04:09, 2 November 2023 (UTC)

Looking at [6] (May 15, 2000) from ...Baby One More Time (album) the link remains dead. -- GreenC 17:29, 13 November 2023 (UTC)

reports.iihf.hockey

The "reports.iihf.hockey" website is not responding. Apparently, all those references will need to be rewritten to "stats.iihf.com". Maiō T. (talk) 13:45, 5 November 2023 (UTC)

Maiō T.: the bot edited 1,706 pages. It modified 8,795 URLs. Example: Special:Diff/1090651163/1184693466. Plus other misc. It found a few dozen dead links in 2009 IIHF Inline Hockey World Championship / Special:Diff/1178262869/1184678508. Maybe the URLs have a syntax error? -- GreenC 01:33, 12 November 2023 (UTC)

Thank you very much GreenC!
As for those wrong URLs, the word "inline" is missing there. Correct URL looks like this: https://stats.iihf.com/Hydra/inline/137/IHM137A04_74_5_0.pdf.
Maiō T. (talk) 10:56, 12 November 2023 (UTC)
It's OK. This task took the better part of a day because my boilerplate code wasn't up to the task due to the way the URLs were used in the article, took a while to figure out, so I was able to make improvements to the boilerplate for generalized future use. The missing inline is also fixed: Special:Diff/1184690551/1184846336 -- GreenC 00:21, 13 November 2023 (UTC)

metrolyrics.com

first reported at meta:User_talk:InternetArchiveBot#metrolyrics.com

Domain is dead and has template exposure. Reported by User:Billinghurst. -- GreenC 17:35, 7 November 2023 (UTC)

User:Billinghurst: the bot found about 40 pages that needed updating, and I couldn't find anything in template namespace. -- GreenC 02:44, 12 November 2023 (UTC)
@GreenC: There are three templates showing at wikidata, though only one has broad usage (Template:MetroLyrics song (Q13256314)), which was what I intended to mention at metawiki. I will drop a note on all the pertinent talk pages where it will go unseen, <shrug> — billinghurst sDrewth 07:31, 12 November 2023 (UTC)
User:Billinghurst: It was deleted Wikipedia:Templates_for_discussion/Log/2021_November_20#Template:MetroLyrics_song on enwiki. Unfortunately, my bot can't fix templates on other wikis, and I am not aware of a bot that can, because each wiki requires applying for and getting bot permissions. I mean, maybe it could, if I masqueraded as IABot, one of a handful of bots with pre-set global bot perms. It would require bespoke code though, a time consuming project. I'll think about it. -- GreenC 00:10, 13 November 2023 (UTC)

biblioteca.sernageomin.cl

This domain, linked multiple times mainly in citations, seems to frequently break. Is it possible to do a mass archive addition to its uses, 'specially in citations? Jo-Jo Eumerus (talk) 15:48, 11 November 2023 (UTC)

OK. Will do. -- GreenC 16:13, 11 November 2023 (UTC)

Jo-Jo Eumerus, the bot ran on 161 pages containing *.sernageomin.cl with the following results:

  • 180 links were found to be dead and an archive URL was added
  • 17 links were dead but no archive URLs are available and they are now marked {{dead link}}
  • 9 citations changed |url-status=live to dead
  • 10 links still work
  • IABot database updated for 300+ wikis where the links might exist

-- GreenC 23:58, 12 November 2023 (UTC)

Thanks. I do suspect that some of these "dead links" can be replaced by other links, though - is there a list somewhere? Jo-Jo Eumerus (talk) 09:20, 13 November 2023 (UTC)
17 dead links

Jo-Jo Eumerus: 4 of these links like in Jorquera (caldera) might be false positives ie. the link doesn't exist. -- GreenC 15:19, 13 November 2023 (UTC)

Yeah, seems like for some of them a replacement with https://catalogobiblioteca.sernageomin.cl/Archivos/ would work. I'll work on it. Jo-Jo Eumerus (talk) 17:23, 13 November 2023 (UTC)

webrecorder.io

Convert webrecorder.io archive URLs, like this example Special:Diff/1184954276/1184954464. -- GreenC 17:25, 13 November 2023 (UTC)

Done. -- GreenC 18:27, 13 November 2023 (UTC)

bookcritics.org

Domain has many soft-404. -- GreenC 00:59, 14 November 2023 (UTC)

Done. Edited 190 pages and fixed around 220 citations, most soft404. Sample -- GreenC 05:16, 15 November 2023 (UTC)

top10cinema.com

Although this says it is reachable, this link redirects elsewhere. Looks like a case of usurpation. Wonder how many such links there are. Kailash29792 (talk) 04:39, 15 November 2023 (UTC)

137. I will usurp them, but it might take a while because it will be part of the next WP:JUDI batch, which will take a while to find 30 or 40 domains to fill the next batch. Unless there is an urgent request. Added. -- GreenC 05:22, 15 November 2023 (UTC)

washingtonindependent.com

See Wikipedia:Reliable sources/Noticeboard#Washington Independent.
We should probably get rid of any link to the live domain (which is garbage and may be considered for blacklisting) and only archive.org snapshots from before 2015 should be used. When there is no old snapshot, the link/reference should be removed entirely. Is this possible?
Snapshot from September 2014 where the latest headlines are from January 2012. Snapshots of the homepage strongly suggest the site was down from 2015 to 2019. In 2020 the domain expired, notice from godaddy saying "This domain name expired on 6/22/2020".Alexis Jazz (talk or ping me) 14:27, 15 November 2023 (UTC)

User:Alexis Jazz: I have a setup for this kind of thing and have done it before, including deleting cites without an archive URL. Some of the articles look pretty legit like here. This article was published in 2009 but the live link says 2020. I think you're right with this solution. Change the status to usurped, since the domain was taken over by an unknown party who made incorrect modifications. The only thing is if you blacklist I would not be able to help because the blacklist would block my bot from making changes. 122 pages. -- GreenC 15:45, 15 November 2023 (UTC)
GreenC, yes, they republished some of the original articles which makes it confusing. If you go over the archives you'll notice that while the original was written by Spencer Ackerman the author in 2021 was Ceri Sinclair. In the current version no author is named at all. These republished versions, even if the text is identical, shouldn't be trusted either as the author and date are unreliable and it's unlikely they have a license to publish those articles. So there should be an archived link and it should be old. Reliable newspapers don't write articles titled "The 5 best online casinos in the USA right now".
I'd only request blacklisting after all existing links have been usurped. Btw, bots have the sboverride right so even blacklisting shouldn't be an issue for that? (besides, the bot would be removing the URL so the edit should never be blocked?)Alexis Jazz (talk or ping me) 16:48, 15 November 2023 (UTC)
User:Alexis Jazz, Tracked at WP:JUDI where it is now in queue. It's not a JUDI case, but in effect the same thing (usurped domain) from the bot's perspective. How soon do you want this done? I normally run them in batches of 30 or 40 domains because it's easier, but if you want to get it blacklisted I could push it through sooner; there are currently only 3 domains in the queue. I hope sboverride is working now. -- GreenC 04:52, 16 November 2023 (UTC)
GreenC, thanks! There's no hurry, it's not high volume.Alexis Jazz (talk or ping me) 05:26, 16 November 2023 (UTC)
sboverride should work in theory. I see it on the official list at Special:ListGroupRights. –Novem Linguae (talk) 05:28, 16 November 2023 (UTC)

chartattack.com

@GreenC, perhaps this one also fits JUDI? Chart Attack used to be a paper magazine which was probably reliable at least for simple statements. Sorry, wrong link, it was (until May 2018) generally reliable per Wikipedia:WikiProject Albums/Sources / WP:RSN. Now it's just garbage. 2023: https://www.chartattack.com/best-crypto-investment/.
Was it garbage in 2020? [7]: "not many people know how to play online pokies and earn money."
Was it garbage in 2019? Probably, but not quite as obvious.
In 2018 the site looked rather different and actually had a focus on music. I don't think a cutoff date is needed for this one. Better safe than sorry: cutoff date May 24, 2018. Possibly bad links seem limited in numbers, they can probably all be found within these 38 results. (which aren't even all bad and few enough to comb through by hand) Should be usurped though.Alexis Jazz (talk or ping me) 10:28, 18 November 2023 (UTC)

There are 1,300 pages. Yeah there was a major change in 2018 or 2019 to the site's focus. Not sure what to call this kind of content, Google search traps? Like they check for popular Google searches, then write a think piece about that topic to capture the search traffic, and monetize it with ads. The content could be generated semi-automatically via AI so it's low cost.
The old site "About" page says (April 2018): "Chart Attack is a guide to indie and alternative music, based out of Toronto, Canada, online since 1996. We're dedicated to showcasing great music that pushes expectations of genre". The name fits. They have an editor in chief and freelance authors. No problem. The new site has nothing to do with this. It is a usurpation. It's a lot of pages to usurp but I think you are right. -- GreenC 23:20, 18 November 2023 (UTC)
Wayback shows the domain was abandoned around May 24, 2018. Another site andpop.com had it within the next month into 2019. Then the reseller worldclassnames got it, and sold it to the current owners in April 2019. -- GreenC 23:36, 18 November 2023 (UTC)
Oh Chart Attack has more info. -- GreenC 23:40, 18 November 2023 (UTC)
GreenC, ah, okay there's your cutoff date then, May 24 2018.
In some cases articles from the original site are reproduced here as well, e.g. [8] vs. [9]. Note the change in author name, just like Washington Independent. Reproducing the original content is probably just SEO.Alexis Jazz (talk or ping me) 00:00, 19 November 2023 (UTC)
Added to JUDI's queue. I might get to it sooner than later due to the number of pages. -- GreenC 00:47, 19 November 2023 (UTC)
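
For usurped domains like this one, one plausible way to apply the cutoff is to keep only archive snapshots captured no later than the cutoff date and take the newest of those. The sketch below assumes the snapshot timestamps have already been collected as 14-digit Wayback Machine strings (YYYYMMDDhhmmss). The function name and the sample timestamps in the usage note are made up for illustration, and the end-of-day cutoff is an assumption, since the thread only says "around May 24, 2018".

```python
from datetime import datetime
from typing import Iterable, Optional

# Wayback Machine snapshot timestamps use the form YYYYMMDDhhmmss.
WAYBACK_FORMAT = "%Y%m%d%H%M%S"

# Cutoff discussed above for chartattack.com; end of day is an assumption.
CHARTATTACK_CUTOFF = datetime(2018, 5, 24, 23, 59, 59)


def latest_snapshot_before(
    timestamps: Iterable[str], cutoff: datetime
) -> Optional[str]:
    """Return the newest snapshot timestamp at or before `cutoff`,
    or None if every snapshot is newer than the cutoff."""
    eligible = [
        ts for ts in timestamps
        if datetime.strptime(ts, WAYBACK_FORMAT) <= cutoff
    ]
    return max(
        eligible,
        key=lambda ts: datetime.strptime(ts, WAYBACK_FORMAT),
        default=None,
    )
```

With made-up snapshots from 2017, May 2018 and 2020, the function picks the May 2018 one. A None result corresponds to the case discussed for washingtonindependent.com, where a citation with no old enough snapshot has no trustworthy archive to point at.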

nationalgeographic.com

Many soft-404s. -- GreenC 01:41, 18 November 2023 (UTC)

..also *.natgeotv.(com|org).* eg. www.natgeotv.com.au GreenC 16:54, 20 November 2023 (UTC)

Results for nationalgeographic.(com|org):

  • Articles checked: 9,495
  • Articles edited: 7,732
  • Add new archive URL: 8,213
  • Switch |url-status=live to dead: 1,165
  • IABot database updated for 300+ wikis

-- GreenC 05:12, 24 November 2023 (UTC)

For natgeotv: checked 266 articles, edited 190 articles, added 153 archive URLs, changed 19 |url-status= values -- GreenC 19:50, 25 November 2023 (UTC)