Wikipedia:Reference desk/Archives/Computing/2023 May 10

From Wikipedia, the free encyclopedia
Computing desk
Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is a transcluded archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


May 10

Is there a way to detect that a news website is a "News" website?

Does the HTML source code of news websites include any special or distinctive HTML tag or HTML tag attribute that indicates something like "Hey there! This website is primarily about news and is part of the mass media!"?

Thanks. 2A10:8012:17:CDC6:D066:AC47:1958:9121 (talk) 20:16, 10 May 2023 (UTC)

They will typically have meta tags for description and keywords identifying themselves as news sources. For example, for The New York Times, their website www.nytimes.com has (slightly simplified), <meta name="description" content="Live news, investigations, opinion, photos and video by the journalists of The New York Times from more than 150 countries around the world. Subscribe for coverage of U.S. and international news, politics, business, technology, science, health, arts, sports and more."/> and <meta name="keywords" content="news, live updates, latest news, breaking news, local news, current events, top stories, livestream, live video, world news, us news"/>. For the BBC at www.bbc.com we find, more concisely, <meta name="description" content="Breaking news, sport, TV, radio and a whole lot more. The BBC informs, educates and entertains - wherever you are, whatever your age."/> and <meta name="keywords" content="BBC, bbc.co.uk, bbc.com, Search, British Broadcasting Corporation, BBC iPlayer, BBCi"/>. None of this is standardized, and nothing prevents Joe Shmoe from Podunk from setting up a website advertising itself as the go-to place for in-depth reporting of the latest news from all over the world. --Lambiam 12:01, 11 May 2023 (UTC)
Thanks. 2A10:8012:17:CDC6:79FB:4D40:EB68:7253 (talk) 00:11, 12 May 2023 (UTC)
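For anyone who wants to check the description and keywords meta tags quoted above programmatically, here is a minimal sketch using only Python's standard library. The names MetaTagParser, NEWS_HINTS and meta_news_hints are made up for this illustration, and the hint words are an arbitrary assumption rather than any standard vocabulary. Since these tags are self-declared, a match only shows what the site says about itself.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        content = attrs.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


# Hypothetical hint words chosen for this example; not a standard list.
NEWS_HINTS = ("news", "breaking", "current events", "headlines")


def meta_news_hints(html):
    """Return the hint words found in the description/keywords meta tags."""
    parser = MetaTagParser()
    parser.feed(html)
    text = " ".join(
        parser.meta.get(key, "") for key in ("description", "keywords")
    ).lower()
    return [hint for hint in NEWS_HINTS if hint in text]
```

Calling meta_news_hints() on the downloaded source of a page returns the list of hint words it found; an empty list means neither tag mentioned any of them.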
Detecting whether a website is a "News" website based solely on its HTML source code is difficult and not always reliable. While some news websites include specific HTML tags or attributes indicating their nature, there is no standardized or universal tag that all news websites must use.
However, you can look for certain elements in the HTML source code that might suggest a website is focused on news. Here are a few common indicators:
Meta Tags: As noted above, news websites often include meta tags for description and keywords that identify themselves as news sources. These tags may contain keywords like "news," "breaking news," "current events," etc.
Structured Data Markup: Some news websites implement structured data markup, such as schema.org's NewsArticle markup, to provide structured information about their articles. This markup can include properties like headline, date published, author, and more.
RSS Feeds: Many news websites offer RSS or Atom feeds that allow users to subscribe to their content. Look for <link> tags with type="application/rss+xml" or type="application/atom+xml" attributes, which indicate the presence of an RSS or Atom feed respectively. Blogs and many other kinds of sites publish feeds too, so this is a weak signal on its own.
URL Structure: News websites often have URLs that reflect their news sections or categories. For example, a URL like "news.example.com" or "example.com/news" may suggest a news-oriented website.
Content Markup: News articles on reputable news websites often follow a specific content structure. Look for HTML tags commonly used in news articles, such as <h1> for headlines, <p> for paragraphs, <time> for publication dates, and <cite> for the titles of cited works. Of these, <h1> and <p> appear on almost every web page, so <time> is the more informative signal.
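Some of the indicators listed above can be checked mechanically. The following is a minimal sketch using only Python's standard library. It assumes the structured data is embedded as JSON-LD in a <script type="application/ld+json"> element, which is one common way of publishing schema.org markup but not the only one. The names NewsSignalParser, FEED_TYPES, NEWS_TYPES and news_signals are invented for this example.

```python
import json
from html.parser import HTMLParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}
# schema.org types treated as news signals here (an illustrative choice).
NEWS_TYPES = {"NewsArticle", "NewsMediaOrganization"}


class NewsSignalParser(HTMLParser):
    """Collects feed links, JSON-LD @type values and <time> usage."""

    def __init__(self):
        super().__init__()
        self.feed_links = []
        self.jsonld_types = []
        self.has_time_tag = False
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        mime = (attrs.get("type") or "").lower()
        if tag == "link" and mime in FEED_TYPES:
            self.feed_links.append(attrs.get("href"))
        elif tag == "script" and mime == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []
        elif tag == "time":
            self.has_time_tag = True

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self._collect_types(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Malformed JSON-LD is ignored rather than treated as fatal.

    def _collect_types(self, node):
        # Walk nested dicts and lists, recording every string @type value.
        if isinstance(node, dict):
            value = node.get("@type")
            if isinstance(value, str):
                self.jsonld_types.append(value)
            elif isinstance(value, list):
                self.jsonld_types.extend(v for v in value if isinstance(v, str))
            for child in node.values():
                self._collect_types(child)
        elif isinstance(node, list):
            for item in node:
                self._collect_types(item)


def news_signals(html):
    """Summarize the news-related signals found in an HTML string."""
    parser = NewsSignalParser()
    parser.feed(html)
    parser.close()
    return {
        "feeds": parser.feed_links,
        "jsonld_types": parser.jsonld_types,
        "has_news_article": any(t in NEWS_TYPES for t in parser.jsonld_types),
        "has_time_tag": parser.has_time_tag,
    }
```

The function reports each signal separately instead of returning a single yes/no answer, because none of them is conclusive by itself.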
These indicators are not foolproof and vary from website to website. Some websites have no clear indicators at all, or use only generic tags that are not specific to news. It is therefore advisable to consider multiple factors, including the website's branding, content, and reputation, when determining whether a website is a reliable news source. DSamuel088 (talk) 09:07, 17 May 2023 (UTC)