Talk:Languages used on the Internet


Internet High‑importance

This article is within the scope of WikiProject Internet, a collaborative effort to improve the coverage of the Internet on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as High-importance on the project's importance scale.

Linguistics High‑importance

This article is within the scope of WikiProject Linguistics, a collaborative effort to improve the coverage of linguistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as High-importance on the project's importance scale.

Question about quality and reliability of sources

I agree that Internet World Stats does not look like reliable data. Also the first chart that appears in the entry, with the footnote [1] leading to Internet World Stats... the page does not even support that chart. Is the chart from a different source, that got misattributed somewhere along the way? — Preceding unsigned comment added by 149.166.223.182 (talk) 13:20, 15 September 2014 (UTC)[reply]

The data comes from a website called [Internet World Stats | http://www.internetworldstats.com]. This site is published by the [Miniwatts Marketing Group | http://www.miniwatts.com] in Colombia. The website looks very unfinished. It contains - literally - a lot of "Lorem ipsum". It looks like someone did a very quick job of getting up the website and did not fill in all the fields in the template.

Can someone please say something more about the source? I am interested in using the data, but I feel uncertain about it at this stage. —Preceding unsigned comment added by Dnordfors (talk • contribs) 19:49, 28 November 2008 (UTC)[reply]

Internet World Stats looks like a marketing company, reads like a marketing company, and feels like a marketing company. I think it's irresponsible to be using them as a source. What would better serve people, however, is a quick look at the Open Net Initiative. More specifically, the vastly more reputable ITU statistics which are reviewed and actually legitimate: [1] —Preceding unsigned comment added by 128.113.106.71 (talk) 17:23, 19 January 2010 (UTC)[reply]

I have exactly the same feeling about this "Miniwatts Marketing Group". There is no way to contact them except by mail, and no information about how they derive the numbers they present... I will remove them and the source. If new evidence of their reliability arises, we could suggest a rollback. G.Dupont (talk) 14:51, 30 July 2010 (UTC)

After some more browsing, it appears that even the other sources (one of which is now down) do not state clearly how they compiled their numbers. This seems strange to me. How can we claim this on Wikipedia without clear validation of the data? Counting the number of Internet users in a country is, in my view, a very complex problem that perhaps governments and/or Internet providers (if they all worked together) could solve. Compiling such numbers for the whole world would be an enormous amount of work... or a very nice fake. What do you think? If there are no answers within a few days, I will suggest deleting this article. G.Dupont (talk) 15:22, 30 July 2010 (UTC)

Move?

When I saw the title, I thought it would be about a government-sponsored plan to give everyone on the planet Internet access, if at a low bandwidth. However, it's a list of what languages people on the Internet speak. Move this to Internet user statistics or Internet user attributes or something similar. HereToHelp (talk) 02:03, 25 October 2005 (UTC)

In response to the above comment, the page was moved from Global internet access to Global internet usage.

Weird number

According to this organization, in 2006 there were 28 million French-speaking users. However, French statistics companies state that in 2005 there were 26 million French Internet users and that the number is growing fast. [2] Since there are a number of other French-speaking countries or subnational entities out there, it seems to me that the number cited in this article is an underestimate. David.Monniaux 06:57, 28 April 2006 (UTC)

We should split North and South Korea. The majority of the Internet users are from the South, but the population is combined in the table; the South alone is less than 50M. —Preceding unsigned comment added by Dean2026 (talk • contribs) 19:39, 19 August 2009 (UTC)

Duplicate articles?

It seems that this article and Languages on the Internet may be duplicates. They should probably be merged, preferably Languages on the Internet into this article (Global internet usage). Also, shouldn't internet be capitalized? It is the network as a whole, after all. - Rudykog 13:37, 17 June 2006 (UTC)

Merge OK

I think the Languages on the Internet article should stay where it is. That title is how I got here. The article had what I was searching for, so I think the title is pretty descriptive. But I have no objections if it is merged with the other article, Global internet usage. That second article's title is a bit misleading, since I think it should cover more than just which languages are spoken: statistics on how many people in different countries use the Internet, browsers, and other system statistics.

Average number of users on a single day

I suppose that with "Total number of Internet users" one means the total number of people that use internet with a given frequency, or that have used it at least once (which one, by the way?). Does anybody know if there are sources on the average number of people who connect to internet on a single day? And does "Total number of Internet users" as used in this article mean the same as "Internet users" in alexa.com?

Languages table

I think it should be deleted or fully rewritten, because it doesn't make any sense or contradicts other wikipedia pages. e.g.

1) this table says that there are 874 Chinese speakers, wikipedia page on Chinese says that Mandarin Chinese has at least 850 speakers[3].

2) languages don't have GDP and even if it shows GDP of countries, China doesn't have GDP per capita of $7,200. DVoit 14:18, 25 September 2007 (UTC)[reply]

What are you talking about? The number is in million. 874 million or 850 million, who knows that exactly? And about GDP, I don't know how the page looked like in 2007, but it doesn't matter. However, one thing is clear. You are a sinophobe and this is racist! This is not exaggerated. You don't agree with 874 million, so you looked for another source (Wikipedia can't be a good source for Wikipedia) showing a smaller number. Then you are not happy with China's high GDP, so you argue with per capita. --2.245.197.242 (talk) 10:08, 6 May 2015 (UTC)[reply]

Web or Internet?

Is this article about web pages or total internet usage? In other words, does it include figures for what languages are used in e-mail messages, IRC chat, instant messaging conversations etc, or is it just about web pages published and web pages accessed? I think it is the latter, and so should be renamed appropriately, asap. --Nigelj (talk) 19:42, 10 August 2008 (UTC)

Websites or users?

I wonder whether the 'estimated online population' adequately reflects the proportions of languages actually found on the internet. I would surmise that, although China has a substantial online population, there are far fewer websites written in Chinese than in German. Does anybody know of statistical data which could corroborate or refute this?--84.190.54.122 (talk) 18:34, 12 November 2008 (UTC)[reply]

Dutch

The total population of Dutch speakers is way too low, and I think the number of Dutch users on the Internet is also too low. —Preceding unsigned comment added by 85.144.100.44 (talk) 11:45, 23 November 2008 (UTC)

23 m speakers of Dutch. see: http://nl.wikipedia.org/wiki/Nederlands

You are right. If you look at http://en.wikipedia.org/wiki/List_of_countries_by_number_of_Internet_users you see that there are almost 14 million Internet users in the Netherlands and 7 million in Belgium. If we take 60% of the users in Belgium (the share is probably higher because Flanders has a higher Internet usage rate than Wallonia), I get 4.8 million + 13.8 million = 18.6 million (excluding Suriname, the Netherlands Antilles and Aruba). And the number of speakers should be 27 million (first- plus second-language speakers), as is done for the other languages. —Preceding unsigned comment added by MaxvJ (talk • contribs) 14:50, 20 October 2009 (UTC)

Stats outdated

The stats of Internet users are outdated, if someone wants to take the time to update them : http://www.internetworldstats.com/stats7.htm/stats7.htm —Preceding unsigned comment added by GRAND OUTCAST (talk • contribs) 07:19, 29 May 2009 (UTC)[reply]

As stated earlier, numbers from this source are open to question. It's a rather obscure source which does not present its methodology well. I suggest that we neither use it nor update the article based on it. G.Dupont (talk) 21:08, 17 January 2011 (UTC)

Number of Pages

I would like to see some statistics on the number of web pages in different languages.

Also, I wonder to what extent these statistics account for non-native speakers of a language. German is a very popular second language in a number of countries, esp. in Central Europe and the major English-speaking countries. That would almost double its numbers. Bostoner (talk) 02:46, 13 July 2009 (UTC)

Languages used on the Internet

This article used terms like 'languages used on the Internet' without specifying what that means. The 'Internet' is just a network. If it means the content published in various languages, it should state so. But most likely it refers to the number of users that speak various languages, and I rephrased some of the article's statements accordingly, as the sources seem to indicate this is what was measured. Kbrose (talk) 20:28, 28 September 2009 (UTC)[reply]

The NiteCo Survey is here to stay!

Hello,

I've noticed that the Average Age of Internet Users survey conducted by NiteCo has been wrongly removed, with bogus and incoherent reasons cited.

For those of you who are unfamiliar with it, here is a summary: it's a user-participation-based survey that collects input from surfers who visit the link. Upon visiting, the average age is displayed, followed by a question about the visitor's age.

Some people have suggested that this survey is inaccurate, citing the following irrelevant claims:

1) the survey is not random

I guess that's the most absurd claim ever. Obviously, the fact that it's a public survey open to any surfer implicitly means that IT IS random by nature.

2) the survey is self-selected

Well, that's true, but the point is that this is totally irrelevant, as there is no evidence that self-selection is harmful in the sense that it may corrupt the results.

For instance, there is no data to support the hypothesis that older or younger people tend to report their age more or less often than the other age group.

In a nutshell, self-selection does not necessarily lead to selection bias, at least not in this case. In other words, the burden of proof lies on the critics! Cowmadness (talk)

I'm afraid you are wrong on every count here. I refer to this reversion of yours. First, you need to look at things like Sampling (statistics) to see that there is far more to it than you seem to realise. Then look at the survey page in question[4] to see that it is just a webpage buried on a single, non-notable website that is only going to be seen by people visiting that site, or whatever links exist to it (like this one). Next, look at the heading, and build up you give it: 'Average age of Internet users' and 'the average Internet user is 28.3037 years old', then check Internet and WWW to see that this is a web site, not representative of the whole web, let alone the whole internet, and that there is a difference. Finally, have a look at WP:BURDEN where it says, "The burden of evidence lies with the editor who adds or restores material". The second reference you give, apart from the one that proves the survey web-form exists, is to a digg page that no longer mentions the site or the survey. What we need is a third-party WP:RS reliable source that says that this survey is accepted by serious academics (or other people of similar standing) as being a recognised statistical survey that is widely regarded as representative. In the meantime, what you have there looks mostly like WP:LINKSPAM and should be removed in toto without further warning. --Nigelj (talk) 17:13, 22 February 2010 (UTC)[reply]
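To illustrate the sampling point in the reply above, here is a minimal Python sketch of how a self-selected web survey can misstate an average age. Everything in it is an assumption made for illustration: the uniform age range, the response probabilities, and the function names are hypothetical and are not taken from the NiteCo survey or from any real data.

```python
import random
from statistics import mean


def response_probability(age: int) -> float:
    """Hypothetical chance that a person of this age finds and answers the survey.

    Assumption for illustration only: people under 30 are far more likely
    to reach the page and respond than older people.
    """
    return 0.9 if age < 30 else 0.1


def simulate_survey(population_size: int = 100_000, seed: int = 0) -> tuple[float, float]:
    """Return (true mean age, mean age among self-selected respondents)."""
    rng = random.Random(seed)
    # Hypothetical population: ages spread uniformly between 10 and 80.
    ages = [rng.randint(10, 80) for _ in range(population_size)]
    # Each person decides independently whether to respond.
    respondents = [age for age in ages if rng.random() < response_probability(age)]
    return mean(ages), mean(respondents)


if __name__ == "__main__":
    true_mean, survey_mean = simulate_survey()
    print(f"true mean age:   {true_mean:.1f}")
    print(f"survey mean age: {survey_mean:.1f}")
```

Under these made-up assumptions the survey average lands well below the population average, even though anyone was free to answer. Being open to every visitor does not by itself make a sample random.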

Usage per capita

(Usage percentage : number of speakers) would make an illustrating extra statistic. (It's trivial to calculate, but including this in the table would make sorting possible.) --Trɔpʏliʊm • blah 16:38, 12 February 2010 (UTC)[reply]
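One possible reading of the suggestion above is a ratio of Internet users to total speakers for each language, added as a sortable column. The Python sketch below shows that calculation under that assumption. The row names and figures are placeholders, not data from the article or its sources, and `LanguageRow` and `sort_by_usage` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LanguageRow:
    name: str
    internet_users_millions: float
    speakers_millions: float

    @property
    def users_per_speaker(self) -> float:
        # Share of a language's speakers who are counted as Internet users.
        return self.internet_users_millions / self.speakers_millions


# Placeholder figures for illustration only.
ROWS = [
    LanguageRow("Language A", 500.0, 1000.0),
    LanguageRow("Language B", 60.0, 80.0),
    LanguageRow("Language C", 20.0, 200.0),
]


def sort_by_usage(rows: list[LanguageRow]) -> list[LanguageRow]:
    """Order rows from highest to lowest users-per-speaker ratio."""
    return sorted(rows, key=lambda row: row.users_per_speaker, reverse=True)


if __name__ == "__main__":
    for row in sort_by_usage(ROWS):
        print(f"{row.name}: {row.users_per_speaker:.0%}")
```

Sorting on this ratio rather than on raw user counts ranks languages by how widely the Internet is used among their speakers, which is a different ordering from the one the table gives by default.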

Outdated statistics

The stats are out of date again. Can someone update them? http://www.internetworldstats.com/stats7.htm

Ouyuecheng (talk) 09:44, 18 February 2010 (UTC)[reply]

Outdated figure

The study done by W3Techs mentioned in the article is from December 2011. There is already a newer version as of 2012, which states English is used by 54.9% of websites and not 56%. http://w3techs.com/technologies/overview/content_language/all Many Thanks, Zalunardo8 (talk) 15:08, 12 December 2012 (UTC)[reply]

Outdated information

Hello, I believe the last paragraph of the section 'Languages Used' should be updated with more current information. The research shown dates from 2007. The same information about the percentage of content per language is shown below in a graph, so I think we should either remove the paragraph or use the same numbers. Cheers, Zalunardo8 (talk) 16:13, 17 December 2012 (UTC)

Merge "Foreign language internet" into this article?

A merge template was added in April 2013 suggesting that the article Foreign language internet should be merged into this article, but I can find no discussion of the proposal anywhere. So, here is a place to hold that discussion. Would such a merge be a good idea? --Jeff Ogden (W163) (talk) 12:26, 2 July 2013 (UTC)[reply]

Yes, definitely. Please. --Atlasowa (talk) 14:35, 2 July 2013 (UTC)[reply]

On 13 August 2013 the article Foreign language internet was nominated for deletion. The result of the deletion discussion was to replace that article with a redirect to this article. That was done on 20 August 2013. The consensus was that there was very little or nothing of value to be merged from the Foreign language internet article. --Jeff Ogden (W163) (talk) 00:40, 21 August 2013 (UTC)[reply]

Web sites with most languages

I'm adding the text from the subsection that was deleted and is the subject of Jeffro's comment below:

Most languages on One Web Site

Wikipedia has more languages than any other site on the internet. There are presently 285 languages which have at least one article.^[1] Jehovah's Witnesses official website follows close behind with articles in 274 languages. ^[2]

--Jeff Ogden (W163) (talk) 01:03, 21 August 2013 (UTC)[reply]

I have removed the subsection about websites with the 'most languages' for the following reasons:

Though it may be true, there is no indication that Wikipedia has "more languages than any other site". It only indicates that Wikipedia has articles in a lot of languages.
There is no indication that the JW site has the second highest number of languages.

Please do not restore unless there is a reliable source indicating that these are the highest.--Jeffro77 (talk) 02:12, 10 August 2013 (UTC)[reply]

Please see the article "Global Recordings Network".

—Wavelength (talk) 16:41, 10 August 2013 (UTC)[reply]

Pie charts vs. bar graphs?

I've copied this discussion over from my talk page on Wikipedia Commons since more folks with an interest in this are likely to see it here. --Jeff Ogden (W163) (talk) 18:26, 15 March 2014 (UTC)[reply]

Just wanted to let you know I've replaced the pie charts on Languages used on the Internet with bar charts as part of a group effort to introduce more perceptually accurate charts. See Save the Pies for Dessert, among others. The first table in particular is ill-suited for a pie chart because some sites use multiple languages and so the percents sum to more than 100%. I also updated the data in the first table. I will update other occurrences of those charts if you have no objections. Daggerbox (talk) 02:29, 13 March 2014 (UTC)

I'm fine with switching from pie charts to bar graphs.

I'll note that both pie charts showed percentages, but with the new bar graphs, one shows percentages and one shows millions of users. Is there a reason for the switch? Shouldn't they both be based on the same thing? Or perhaps we need four graphs?

I've thought for some time that it would be good if this article could be based on figures from the first of a month, since there is some history associated with the first-of-the-month figures and there is no history to use for verification for the other days of the month. Of course, while I thought about this, I never did anything about it.

In the upper bar graph a percentage figure is given for English, but no similar figures are given for other languages. Why is that? Seems like we should give the percentage for all languages or none.

Captions and other labels in English are included as part of the graphs. It might be better to minimize the use of English in the graphs to facilitate the use of the graphs in other language versions of Wikipedia.

--Jeff Ogden, W163 (talk) 03:29, 13 March 2014 (UTC)[reply]

Good to see your comments, Jeff. I'll address each one.

I prefer to include actual numbers with real life units where possible, relying on the graphic elements to provide the feel for the relative values. The first data source only provides percentages, of course. The graphs do not have to agree, especially since they don't necessarily appear together, but I can see value both in using a percent scale and in having the charts agree. I'll update if you have a preference since you're closer to the subject matter. Maybe a count axis with percentage labels would work best if both are useful to the message.

Good idea on first-of-the-month data. I was a little concerned that since the data is updated daily, the graph and table will never really keep up. Would 1-Jan be even better? Looks like it's present in the monthly historical trend page at W3Techs.

In general, I don't like to label every graph element. Somehow it feels like the graph is not doing its job if you have to repeat the table text in the graph. I labeled the English bar to highlight the strongest point from the bar chart, which is that English is used by over half of the sites, and because that bar is pretty far removed from the axes. The other bars are roughly labeled by the reference line.

I agree with your sentiment about localization, but I'm not sure what the remedy might be. Is there a way to support localization in SVG, for instance? Looks like it is by using the systemLanguage attribute -- is that what you meant? Or maybe you mean to leave out the axis titles and leave the language names in English. "Language" is certainly not adding anything.

Daggerbox (talk) 23:27, 13 March 2014 (UTC)[reply]

I would go with the percentage based chart for both. That is what the pie charts that are being replaced were doing. I think mixing percentages and counts in one graph is likely to be confusing.

Using January 1st data would be fine. Or July 1st.

I'd omit the percentage label. The fact that English dominates comes across OK without it.

I'd omit as many of the labels from the chart itself as you can. I understand that it isn't possible to omit the language names. Much or even all of the other stuff can be left to the caption that is added by the articles themselves.

--Jeff Ogden, W163 (talk) 02:28, 14 March 2014 (UTC).[reply]

OK, I'll make another pass this week-end. Daggerbox (talk) 02:20, 15 March 2014 (UTC)[reply]

Done, except I forgot to update the data. Not a bad idea to start a new topic for that, anyway. Daggerbox (talk) 21:42, 15 March 2014 (UTC)[reply]

Frequency of data updates

The source data used by the table and graph for "Content languages for websites" is updated daily. How often should this page's reflection of that data be updated? Lately it's been valid as of the day of the most recent update. In the discussion above (pie charts vs. bar graphs), W163 and I thought a more regular date would be best, such as the most recent January 1 or July 1. — Preceding unsigned comment added by Daggerbox (talk • contribs) 21:48, 15 March 2014 (UTC)

I would like to propose Top Language List

I would like to propose a top-language list by number of 4K or 8K TV station broadcasters and 4K Blu-ray movie publications.

There could also be a list by number of bytes of books, or by total running time of music published on CD.

But I humbly think 4K/8K content will be much easier to measure.

Maybe we can ask YouTube about top languages by time, but it can be hard because content is mislabeled by default settings. — Preceding unsigned comment added by 62.21.42.220 (talk) 20:07, 7 March 2018 (UTC)

You are suggesting a new article separate from "Languages used on the Internet", right? --Jeff Ogden (W163) (talk) 20:45, 15 April 2018 (UTC)[reply]

Maybe it could be done, but focussing on 4k or 8k content sounds like it would be extremely biased towards languages prevalent in 'wealthy' countries. Provided that is made clear in the new article to be created, I wouldn't mind. It mustn't be presented as a surrogate measure of 'global language use'. Also, as mentioned by Jeff Ogden (W163), it has nothing to do with internet pages. —DIV (120.17.128.128 (talk) 03:06, 3 July 2018 (UTC))[reply]

Print books could maybe be done through the OCLC's WorldCat, although mislabelling (or lack of labels) can also occur there. —DIV (120.17.128.128 (talk) 04:12, 3 July 2018 (UTC))[reply]

Lead is awful

Been a long time since I looked at this article, and the lead is now a mess. I removed the first extremely awful paragraph as it asserted a 'controversy' before considering any information or providing a basic introduction to the context of the article. Whether the use of English on the Internet is 'controversial' belongs in the body of the article, not the first sentence. But the rest of the lead is also awful, and much of it is the type of content that would belong in the body. But it's not good content either. For example, the source for the statement "the main and most reliable source for persons connected to the Internet by country is the ITU" is the ITU. Surely no POV problem there?? The ITU may well be an authority on the matter, but the source should be used to support what the ITU actually states about languages used on the Internet, not statements about its own credibility.--Jeffro77 (talk) 00:45, 19 May 2018 (UTC)[reply]

I have restored the lead to an earlier version and removed references to FUNREDES/MAAYA, as the editor who added the material, Danielpimienta (talk · contribs), has a blatant conflict of interest. The web page for FUNREDES[5] states: "Two people have supported the project since its inception: Daniel Pimienta, President of FUNREDES and member of the Executive Board of MAAYA since 2009".--Jeffro77 (talk) 01:00, 19 May 2018 (UTC)

Whilst it may be suitable to cite the FUNREDES/MAAYA Observatory, it should not be done by someone closely associated with that organisation. An editor associated with that organisation also should not assert in the article that that organisation's methodology is superior. The article also should not unnecessarily use inflammatory terms like "dominates" or "provokes", which have also been removed.--Jeffro77 (talk) 01:39, 19 May 2018 (UTC)

By reverting my contribution, the editor Jeffro77 (talk · contribs) is, first, taking the article back many years, erasing attention to new information on a fast-evolving subject, and, second, leaving the article standing on only two biased sources (W3Techs and InternetWorldStats), with the implicit message that they are totally reliable. Measuring languages on the Internet is a subject with a very limited number of experts, and it is the duty of experts in that field to contribute to, and keep up to date, what is presented in Wikipedia as a meta-source of information. This obviously has to be done with respect for all existing sources of information, even if, as is the case here, they do not offer the same figures, and while resisting any temptation to promote one's own work or person. The statement "An editor associated with that organization also should not assert in the article that that organisation's methodology is superior" is a serious accusation which is not documented. In no way has my contribution ranked the different sources by value; it has simply presented the existing sources, leaving interested readers to form their own opinion by reading the materials. It has also refrained from presenting the results of the research in terms of language rankings, and left the table of figures from the other sources in place as a form of respect. If this editor maintains that accusation, he will need to support it by pointing to exactly where in the article it was stated that the new methodology was superior. As a matter of fact, if it is a conflict of interest to be an expert in a specific field and to bring new research to light in Wikipedia as a contributor, then Wikipedia will have to prevent all experts from contributing in their field of expertise, which is absurd, especially when the fields of expertise are very narrow and the number of experts is limited.
I suggest that this editor first read the sources he has removed and discuss them with others before making such a drastic and unproductive decision unilaterally. The removed references (such as "An alternative approach to produce indicators of languages in the Internet", Daniel Pimienta, June 2017) discuss, with solid data, the different biases which occur when offering figures for languages on the Internet, and this is what is really at stake and should be discussed and presented in this article, with due respect to all sources. As for "the lead is awful ... as it asserted a 'controversy'", this is a value judgment; the controversy about the real place of English on the Web is historical and should be an explicit part of this article. The unilateral decision made by this editor is indeed, maybe unconsciously, taking a position in that very controversy, in favour of the misinformation that English represents more than 50% of Web content, by erasing the sources which challenge that figure! Finally, that the ITU is the only reliable source for figures on the number of Internet users by country is something that professionals in that field know well, and it deserved to be mentioned. Danielpimienta (talk) 23:57, 5 July 2018 (UTC)

A person closely related to the subject of the article should not be making assertions that the other sources are biased. Citing yourself in your comments above also does nothing to alleviate concerns about potential for conflict of interest.--Jeffro77 (talk) 05:28, 7 July 2018 (UTC)[reply]

The editor Jeffro77 (talk · contribs) keeps talking about me instead of the content of the article (and without documenting his previous accusations). I invite him to focus on content and avoid ad-hominem attacks, as per the Wikipedia rules.

What is "awful" indeed, to borrow the name of the thread, is the fact that an article classified as high importance in WikiProject Internet, and Start-class on the quality scale, has been stuck unchanged for many years now, promoting two business-related sources without expressing any critical perspective, all in a context of fast evolution.

Biases are an inherent part of any production of indicators; the point is to identify them and weigh their impact on the results, and it was the duty of Wikipedia editors to perform that task. When the biases are significant they deserve to be highlighted (as an example, W3Techs gives a figure of less than 2% of web content in Chinese, while the percentage of Chinese-speaking Internet users is close to 20%: this is absolutely not credible).

After a period in which many sources existed (1988-2007), the fact is that for the last several years W3Techs and InternetWorldStats have been the only existing sources for, respectively, web content per language and the languages of Internet users. There is no doubt that they deserve credit for that and should be mentioned in this article; yet their biases also need to be described, so as to warn readers about the use of those figures.

Since 2017, there has been a third source of figures, which covers both indicators. The source is not linked to any business interest and comes from a party with a long history of producing such figures. Furthermore, this new source is totally transparent about its methodology and discusses the biases of all methods (including its own). This article needs to integrate that source and open the discussion about biases.

The editor in question seems to consider that controversies are inflammatory matters that have to be hidden. On the contrary, Wikipedia articles should make controversies explicit, presenting the different positions in a neutral fashion. The controversy about the real dominance of the English language on the Internet is as old as the Web, and it is the duty of this article to reflect it. As a matter of fact, the only inflammatory matter so far has been this editor's personal attacks, made to avoid focusing on the content.

I am not an expert in Wikipedia editing and I do not intend to invest time in becoming one. I would rather see the article updated and enhanced by the group of current editors. So I invite all editors of this article to pay attention to the new source about languages on the Internet and update this article accordingly, taking the opportunity to open the discussion about biases and about the controversy over English dominance (which, by the way, is extensively discussed in the past references of this article).

If this is done, I will commit to refraining from further edits and will only use the talk page if I feel a strong need to comment. If it is not done after some acceptable delay, I will probably become an editor again, restore (or enhance) my contribution and, if it becomes necessary, be ready to use the dispute resolution options offered by Wikipedia. Danielpimienta (talk) 15:14, 10 July 2018 (UTC)

The request to 'focus on content' is irrelevant as the issue is an obvious conflict of interest, which is inherently an editor issue.--Jeffro77 (talk) 15:59, 10 July 2018 (UTC)[reply]

If I kept following up on this discussion, I would be wasting my time with an editor who acts as if he is the sole owner of the article and disregards all of Wikipedia's written editing rules (collaboration, focus on content, avoid personal attacks, etc.). I do hope other editors have been following this discussion and will take appropriate action on what really matters: content. Meanwhile I will inform myself and take advice on dispute resolution, as it has become clear that no discussion of content is possible in this thread. Danielpimienta (talk) 17:20, 11 July 2018 (UTC)

I noticed the formal mediation request. In case that is rejected because other venues have not been tried enough, I then recommend WP:COIN or WP:NPOVN. —Paleo Neonate – 20:18, 19 July 2018 (UTC)[reply]

Thanks to PaleoNeonate (talk · contribs) for that information, which does make sense. I am clearly not an experienced Wikipedian, and it is not easy for newcomers to find their way to dispute resolution or to grasp the culture of the community well enough to respect it. I appreciate any help of that sort. Danielpimienta (talk) 20:54, 19 July 2018 (UTC)[reply]

I don't currently have time for an extended debate about this. There is obviously a conflict of interest involved. Other editors may like to add their thoughts.--Jeffro77 (talk) 23:24, 19 July 2018 (UTC)[reply]

This is literally what WP:COIN says: The COI guideline does not absolutely prohibit people with a connection to a subject from editing articles on that subject. Editors who have such a connection can still comply with the COI guideline by discussing proposed article changes first, or by making uncontroversial edits. COI allegations should not be used as a "trump card" in disputes over article content. (bolding added by the writer). I suggest that Jeffro77 (talk · contribs) think about it, stop playing gatekeeper and accept the proposed mediation, to avoid going further in dispute resolution. I also suggest reading Why is Wikipedia losing contributors - Thinking about remedies (Deletionists often appear to be more interested removing content rather than fixing it). This will be my last attempt to resolve this dispute on this page or through the mediation procedure. Danielpimienta (talk) 13:15, 20 July 2018 (UTC)[reply]

Referring to 'rival' sources as 'biased' is not uncontroversial.--Jeffro77 (talk) 14:23, 20 July 2018 (UTC)[reply]

The requested mediation has been rejected. Jeffro77 (talk · contribs) was supposed to answer "agree" or "disagree". Instead, the answer was: "Maybe in a few weeks. I don't currently have time to get involved in a lengthy discussion. The editor's conflict of interest is evident and their claims about bias of other sources is entirely inappropriate. Any claims of bias should be attributed to the source (who is also the editor), not asserted in 'Wikipedia's voice'." This confirms again how much respect is shown towards Wikipedia procedures and rules (the request was not a place for arguments or maybes, but for saying agree or disagree).

As for the reference to biases, everything the deleted contribution said about biases was: "FUNREDES/MAAYA observatory argues that using Alexa ranking for the 10 millions sample of websites on which W3Tech applies a language recognition algorithm provokes a huge under-estimation of many Asiatic languages, primarily Chinese and languages from India. In the referenced paper and associated presentations arguments are developed and warnings are made on the importance of biases in the measure of languages on the Internet." So it is absolutely false to state that the claim of bias was asserted in "Wikipedia's voice" and not attributed to the source. At this stage I will ask a neutral third party to state whether I have a conflict of interest or have made controversial edits in relation to this article. If not, WP:COIN states that editors should refrain from further accusing that editor of having a conflict of interest, and I hope that this time this editor will respect the rule in good faith. Danielpimienta (talk) 19:09, 20 July 2018 (UTC)[reply]

I didn't 'disagree' with mediation, nor do I presently have time for it. I therefore provided a suitable response.--Jeffro77 (talk) 23:53, 20 July 2018 (UTC)[reply]

As the editor using Wikipedia to assert elements of your own work as significant, the point remains valid. Your edits also asserted a 'controversy' in the lead sentence, which is inherently controversial. The conflict of interest remains.--Jeffro77 (talk) 23:57, 20 July 2018 (UTC)[reply]

This editor was caught lying and obliged to acknowledge it implicitly. Do you think he will therefore retract his defaming statements and apologize? No. Instead, he now pretends that "using Wikipedia to assert elements of your own work as significant" is equivalent to "assert in the article that that organisation's methodology is superior" (his first false and defaming statement) and to "Any claims of bias should be attributed to the source (who is also the editor), not asserted in 'Wikipedia's voice'" (his second false and defaming statement).

Twisting the facts therefore seems to be a habit here, and to conclude we are given the most inept twisted statement of that sort: "Your edits also asserted a 'controversy' in the lead sentence, which is inherently controversial." Indeed, a spectacular and quite laughable fallacy of relevance, which a professor could use to explain the concept to students!

So far I have assumed I was dealing with a person acting and expressing himself in good faith; I am sorry, but it has become extremely difficult for me to keep assuming that.

Yet we have a point of agreement : "Other editors may like to add their thoughts". Please do!!!!

If it helps, the subject of the discussion can easily be read thanks to the Wayback Machine of archive.org at: https://web.archive.org/web/20170926152702/https://en.wikipedia.org/wiki/Languages_used_on_the_Internet The contribution I made, which was entirely erased by this editor, is only the full introduction, from the first sentence to the table of contents. Danielpimienta (talk) 17:09, 21 July 2018 (UTC)[reply]

The claim that I was 'caught lying' is false. The claim of defamation is bizarre, since even if I were incorrect, it would be odd for you not to believe your methodology to be superior. You seem to be trying to take advantage of the fact that I have limited time available at the moment for responding, and it was hasty of me to rely only on my memory of your edits at the time. Though I decided the specifics were not as important in my comments at the COI page, it was indeed you who also stated in the article that your own sources are authoritative (even though that may be the case), using an argument from authority with reference to the UN.[6] I should have been clearer that an editor who is also the author of the source material should not be adding this material at all without first discussing it at the article's Talk page, and I apologise if that was not articulated clearly enough. It is also true that you asserted that a controversy exists,[7] which is not appropriate for an editor who is also a substantial source related to the purported controversy. Re-read the opening paragraph of this section for additional problems with your edits. It is not at all clear what motive you imagine I would have for challenging your edits other than pointing out your prima facie conflict of interest (setting aside less significant presentation issues). It is not necessary to rely on the 'wayback machine', since the article history is already available in Wikipedia.--Jeffro77 (talk) 02:50, 22 July 2018 (UTC)[reply]

Although it derived in part from a content dispute, please avoid casting aspersions on article talk pages (personal user pages or administrator noticeboards are the proper venues for that). I think that I'll initiate a conflict of interest noticeboard thread. Jeffro77: it's likely only necessary to post once there and let others assess COI and take any action if necessary. Apart from any potential COI issues, one is to determine if the source is considered reliable for what it's used, for that we have WP:RSN if in doubt. Danielpimienta: I'll post a standard notice on your page in relation to COI. —Paleo Neonate – 17:21, 21 July 2018 (UTC)[reply]

I don't see that the source itself is not usable, but its significance should not be asserted by an editor who is an author of that source, especially where the source asserts that other sources are biased and that a controversy exists in relation to the various sources. (There is also a separate matter of using the ITU as a source for itself (with a dead link) for the assertion that the ITU is a reliable source.)--Jeffro77 (talk) 04:20, 22 July 2018 (UTC)[reply]

PaleoNeonate: I saw the reference to COI on my talk page. I read COI attentively the first time you mentioned it, and, before your message on my talk page, I also opened a thread (Wikipedia:Conflict_of_interest/Noticeboard#Languages_used_on_the_Internet) on the COI noticeboard, where Jeffro77 had already contributed in spite of his lack of time :-). Maybe I did not do it the right way; if so, please feel free to open another one and ask me to cancel the one I opened. If not, please join.

Jeffro77: If you do not have enough time to read the source of the conflict (the Wayback Machine link was offered because it is more straightforward and easier to access than the wiki history, at least for me :-)), please refrain from making accusations based on that content which are easy to contradict and can legitimately be taken as defamation by the author.

Other interested editors: I do understand that a talk war offers no incentive to discuss content. I will therefore make a proposition to get past this situation and try to bring all of us back to what I understand to be the editing spirit of Wikipedia, based on horizontal collaboration.

My proposition follows.

CONSIDERING

1) I am an ethically minded person, believe it or not. In my past editing I did my best to solve an issue in a neutral and transparent manner, avoiding self-promotion as much as possible. When I discovered that it had been deleted entirely, I legitimately reacted to unfounded accusations of unethical behavior, ultimately losing track of what matters to me: the content of the article.

2) I never started a revert war and tried to convey my arguments on the talk page, even committing to refrain from further editing if the limitations I pointed out in the article were correctly addressed by existing editors.

3) However, if at the time of my edit I had known what I have learned in this heated discussion (and especially if I had read the COI page at that time), I would certainly have proceeded differently: a) making explicit my status as an expert in that field with institutions and projects at stake (signing with my name was implicit information, but that does not properly satisfy the COI rules); b) proposing my edit on the talk page instead of editing directly. I am still convinced that my edit did fit within the rules as uncontroversial, and I hope the COI noticeboard will reach that conclusion. However, regarding the last claim of "using Wikipedia to assert elements of your own work as significant": although I can say it was not my intention, and although the fact that my name already appeared as author or co-author in several references of this article before my edit may explain it, I cannot deny that this was an unavoidable by-product of my editing.

PROPOSITION

Then I propose:

1) to close this thread,

2) to leave the related COI noticeboard thread open, as this subject goes beyond this article: it relates to Wikipedia's wish to see more academics contribute, and it may help clarify how they should deal with COI and how "long time wikipedians" should handle their proper inclusion, avoiding the use of COI allegations as "trump cards" to avoid addressing limitations in content.

3) I will open a new thread on this page where I will express my concerns about the limitations of this article and propose solutions, after having clearly identified my personal and institutional links to the subject.

Finally, in this new thread my lack of experience as a Wikipedian may show up as a problem at any time (understanding the editing rules is not a straightforward process; as I have learned, that is the cost of aiming for horizontal and open collaboration). Please bear with me and offer advice instead of accusations. Danielpimienta (talk) 15:04, 22 July 2018 (UTC)[reply]

Vice article

Maybe useful content or links at https://motherboard.vice.com/en_us/article/ezvx9e/the-internet-is-killing-most-languages
—DIV (120.17.128.128 (talk) 03:00, 3 July 2018 (UTC))[reply]

The article needs a serious rework

The article has remained practically unchanged in recent years, while the demographics of the Internet have been changing drastically, with Africa, the CIS, the Arab states and Asia showing growth rates in Internet users between 2010 and 2017 of 327%, 202%, 195% and 179% respectively, while the Americas and Europe were at 135% and 120% respectively (computed from ITU figures; see also a related visual animation). This evolution, which by 2016 had put the share of Internet users in China and India combined above one third of the total, has obvious repercussions for demo-linguistic data and makes it hard to trust figures such as only 2% of content for all the languages spoken in those two countries.

Furthermore, the article fails to address several important points, which I will try to set out briefly after introducing myself as a person with a connection to the subject.

My connection to the subject.

My name is Daniel Pimienta. I am a civil society actor and academic, and the former head of the Networks & Development Foundation (Funredes), an NGO that pioneered work on the digital divide from 1988. In 1998 Funredes created the Observatory of languages and cultures in the Internet, which has been kept alive in spite of the closure of Funredes in 2017. In collaboration with Union Latine, a former organization of member states whose activities are now suspended, I worked in the field of languages on the Internet, producing the corresponding indicators (between 1998 and 2007 we were one of the main sources of data, together with the Language Observatory Project). I have published a number of scientific articles in that field in English, French or Spanish.

Most of the corresponding activities have been funded by the Organisation Internationale de la Francophonie, Union Latine or UNESCO (the United Nations organization most closely connected to this theme). Funredes is also a founding member of the World Network for Linguistic Diversity (MAAYA), a non-profit organization formed during the World Summit on the Information Society, and I am a former Executive Secretary and still a member of the Board of MAAYA.

MAAYA has served as a home for some activities related to the subject (for instance, we have been collaborating with the LOP, another MAAYA member). After 2007, the evolution of search engines made our method of producing data obsolete, and at the same time the LOP ceased its production activities. For the past years, that has left W3Techs and InternetWorldStats as the only sources of data available on languages on the Internet (the ITU does produce yearly data on connected users per country, but not directly on languages, and we have been producing data in the past years, but exclusively for French, Spanish and the languages of France).

In 2017, we launched a new method that makes it possible to produce data for both indicators for the 140 languages with more than 5 million speakers. The method relies mainly on ITU data, on a large set of sources of Internet-related data per language or per country (including Alexa.com) and on specific statistical computations, which are documented in the reference article together with the associated biases of the method and sources (as a matter of fact, the biases of the 3 existing methods are discussed).

The points which I consider need to be addressed by the article:

- The difference between the 2 main indicators (% of Internet users per language and % of web content per language) needs to be clarified, as they are often confused by the users of those statistics.

- The methodologies of the 2 sources which are presented (W3Techs for content and InternetWorldStats for the top 10 languages of Internet users) need to be presented in some form, and their potential biases should be discussed so as to warn readers that those figures are not solid data asserted in Wikipedia's voice.

- The article needs to refer to a historical and still ongoing controversy about the real proportion of web content in English. This is important because it relates to policy against the digital divide, where all stakeholders have reached a consensus on fostering the production of local content; a persistently high figure for English on the Web, in spite of the Web's evolution, reduces the motivation to produce local content. Note that the figure of around 50% for English content was given in 2007, and the demographic evolution of the Internet since then argues for a much lower figure today (at the very least, it has to be said that there are strong discrepancies between experts on that figure).

- Finally, the article should add a third source to the picture, dating from 2017, which covers both indicators (without being limited to the top 10 for the languages of users) and presents an extensive discussion of the biases of all methods, starting with its own. The source (already presented in "my connection to the subject") comes from a former producer of data for the period 1998-2007 and the author or co-author of several references which are already mentioned in this article.

- When discussing the biases, it is important to underline the difference between mother tongue (also noted L1) and second language (also noted L2). Generally, statistics such as the percentage of users of a given language are given for L1+L2, and it is important to note that, while discrepancies exist around L1 figures, they are much smaller than those around L2 figures (for example, the L1 figure for English is around 350 million, but L1+L2 figures for English may vary from 500 million to 1.5 billion, depending on the source).

To conclude, I would appreciate it if willing editors tried to improve this article, which touches on a sensitive yet important subject related to the past, present and future history of the Internet. I will refrain from editing it myself, since I am one of the potential sources, and I will keep contributing on the talk page, either on request or if I feel a strong need to react to some edit.

Sorry for the length of the post; the subject is complex and its complexity deserves to be reflected somehow. Danielpimienta (talk) 14:48, 24 July 2018 (UTC)[reply]

"Norwegian Bokmål" versus "Norwegian"

"Norwegian Bokmål" and "Norwegian" are both found in the "Content languages for websites" list, even though the former is a subset of the latter. The other written variety of Norwegian - "Nynorsk" - is not in the list. This is not surprising, considering that Nynorsk is considerably less frequently used. My opinion is that "Norwegian Bokmål" and "Norwegian" should be merged into a single entry called "Norwegian", or perhaps "Norwegian (Bokmål and Nynorsk)". If possible, numbers for Nynorsk should also be included in the total.

The original article treats "Norwegian Bokmål", "Norwegian" and "Norwegian Nynorsk" as three distinct things. It doesn't explain what "Norwegian" is supposed to be, but I would say it is probably an average of Bokmål and Nynorsk numbers. Kiwi Rex (talk) 17:06, 29 April 2020 (UTC)[reply]

W3Techs Methodology

I'm surprised to see that there's more content in Turkish than in Chinese, Arabic, and Portuguese combined given that these 3 languages have 18 times more total speakers than Turkish.

Is it due to W3Techs' methodology? The Wikipedia article says that: "language is identified using only the home page of the sites in most cases (e.g., all of Wikipedia is based on the language detection of http://www.wikipedia.org).[6] As a consequence, the figures show a significantly higher percentage for many languages (especially for English) as compared to the figures for all websites.". However on their website they say (here and there):

"We investigate technologies of websites, not of individual web pages. If we find a technology on any of the pages, it is considered to be used by the website."
"We may not detect technologies if they are used only on some pages of a websites, as we do not analyze each page of a website."

So it's unclear whether W3Techs focuses on the homepage or not. If they only consider the homepage, then this table is misleading and should be removed from Wikipedia. Indeed, the most used Western platforms are translated and available in most European languages (especially Spanish, French, and German), but only English would count, whereas some countries have popular equivalents that are not used elsewhere (this is especially the case for Russia), with a homepage in a non-English language. This artificially reduces the count for "Western" languages and artificially increases it for the languages of "isolated" countries such as Russia, Iran, and Turkey. The numbers for Chinese are also especially low; I think that Alexa (W3Techs' source) isn't as accurate for China.

So should we keep this table? A455bcd9 (talk) 11:09, 7 March 2021 (UTC)[reply]
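
The concern above can be illustrated with a toy sketch. It is not W3Techs' actual method: the site names, page languages and both counting functions are hypothetical, made up only to show how classifying a multilingual site by its homepage alone can shift language shares compared with counting every page.

```python
from collections import Counter

# Hypothetical toy data, not real measurements: each site maps to the
# language of each of its pages; the first entry stands in for the homepage.
SITES = {
    "global-platform.example": ["en", "es", "fr", "de"],
    "local-portal.example": ["tr", "tr"],
    "news-site.example": ["en", "en"],
}

def share_by_homepage(sites):
    """Assign each site a single language, taken from its homepage only."""
    counts = Counter(pages[0] for pages in sites.values())
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}

def share_by_pages(sites):
    """Count the language of every page of every site."""
    counts = Counter(lang for pages in sites.values() for lang in pages)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}

homepage = share_by_homepage(SITES)
per_page = share_by_pages(SITES)

for lang in sorted(per_page):
    print(lang, round(homepage.get(lang, 0.0), 3), round(per_page[lang], 3))
```

In this made-up sample, the translated Spanish, French and German pages of the multilingual site disappear entirely under homepage-only counting, while English and Turkish get a larger share than under per-page counting. Real data may of course behave differently.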

I am so glad that an editor has finally pointed out the confusing situation of this article, a consequence of promoting one source without the necessary verification of its biases and comparison with other sources. This results in Wikipedia being a vector of disinformation (the article has improved in signalling that there is a debate, but it starts with the crude statement "Slightly over half of the homepages of the most visited websites on the World Wide Web are in English", which some acknowledged researchers in that field consider false, and not slightly).

I am Daniel Pimienta, head of the Observatory of Linguistic & Cultural Diversity on the Internet (by the way, all links from the Wikipedia page to funredes.org go through the Wayback Machine although the page is online!), and one of the (very few) researchers in that field. Some time ago I tried to put that article in order and finally gave up; the old discussion is part of this talk page.

Every day, W3Techs applies a language recognition algorithm to the home pages of the 10 million websites ranked top by Alexa. Targeting home pages, and not paying due attention to multilingualism, produces huge errors. We have just published version 2 of our study, which produces indicators of languages on the Internet and transparently sets out the method, results and biases, at http://funredes.org/lc2021. According to our study, English on the Web is today around 25% (versus 30% in 2017... and versus 62% quoted by W3Techs, quite a difference!), Chinese 15% (versus the absurd W3Techs figure of 1.3%) and so on.

I understand the editors may not have the time and the will to read dense technical reports; but at least check the following presentation: "An alternative approach for linguistic indicators in cyberspace, V2. Part 1: English on the Web: the end of a myth? Part 2: the language globalization of the Net is on the way…", D. Pimienta & G. Müller de Oliveira, in Emerging Technologies and Changing Dynamics of Information [ETCDI], organized by ICEIE, under the auspices of UNESCO/IFAP, September 7-9, 2021.

Is it the role of Wikipedia to support historical disinformation about the dominance of English on the Web? I hope this article will eventually correct its long-standing one-sided perspective or, at least, present with due caution the one side it appears to promote. Danielpimienta (talk) 13:47, 8 September 2021 (UTC)[reply]

UNESCO report "The languages in cyberspace" (2021)

Editors interested in this page may find it useful to know that there is a recent article here:
The languages in cyberspace https://en.unesco.org/courier/2021-2/languages-cyberspace
It is based on the May 2021 UNESCO World report of Languages. 202.166.205.112 (talk) 03:21, 13 November 2022 (UTC)[reply]

Where is Hindi?

One thing that strikes me the most is the absence of Hindi among the 38 most used languages on the Internet shown in the chart. After all, it is the fourth most spoken language in the world with about 600 million L1+L2 speakers. It cannot have a lesser Internet presence than Lithuanian, Latvian or Slovenian. Come on! Any research that says so must be terribly biased. — Preceding unsigned comment added by 212.79.109.244 (talk) 19:37, 16 August 2024 (UTC)[reply]

[1] List of Wikipedias

[2] JW.org