Talk:Usage share of web browsers/Archive 5

From Wikipedia, the free encyclopedia

Clickz

There are some stats from ClickZ at http://web.archive.org/web/20090711201800/http://www.clickz.com/stats/stats_toolbox .Smallman12q (talk) 00:28, 1 March 2012 (UTC)

Wikimedia percentages

In this edit I have made what I hope are some improvements to the new, simpler summary tables. It was good to see Wikimedia data back represented, but it appeared to be utterly at odds with the other figures, whereas in the past, Wikimedia usually provided the majority of the median figures - i.e. it was often right in the middle of the spread. I looked into it, and the reason was the separation of mobile and non-mobile data. When the Wikimedia stats page said 29.2% for MSIE, it meant 29.2% out of the total of 87.5% of non-mobile visits. No wonder it wasn't comparable! The simple arithmetic required is perfectly allowed by WP:CALC. I copy-and-pasted the Wikimedia table into a spreadsheet and added a column based on =B2/B$26*100 to produce true percentages of the non-mobile visitor figure (which happened to be in cell B26). This was so easy that I did the same for the mobile table below it, and added these figures too. I found 'Other' figures in both cases by adding up the figures used (after rounding to 1 D.P.) and subtracting the totals in each case from 100. This is all simple, accurate and useful, and hopefully will not present any problem to maintain. As for the Wikimedia section table in the main body of the article, I have already complained about the complexity of this here, and now do not really know what to do with it. --Nigelj (talk) 20:19, 17 February 2012 (UTC)
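The spreadsheet arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the rescaling and of the 'Other' calculation, not the actual spreadsheet: the MSIE figure (29.2) and the non-mobile total (87.5) come from the comment above, while the Firefox and Chrome figures and the function names are hypothetical placeholders.

```python
def rescale_shares(raw, total, ndigits=1):
    """Re-express each raw share as a percentage of `total`, rounded.

    Mirrors the spreadsheet formula =B2/B$26*100, where B$26 holds the total.
    """
    return {name: round(value / total * 100, ndigits) for name, value in raw.items()}


def other_share(scaled, ndigits=1):
    """'Other' is whatever is left after summing the already-rounded figures."""
    return round(100 - sum(scaled.values()), ndigits)


# 87.5 is the share of all visits that were non-mobile (from the comment above).
non_mobile_total = 87.5

# Only the MSIE figure is from the comment; Firefox and Chrome are
# hypothetical values used purely to make the example run.
raw = {"MSIE": 29.2, "Firefox": 21.0, "Chrome": 20.0}

scaled = rescale_shares(raw, non_mobile_total)
other = other_share(scaled)

print(scaled)
print("Other:", other)
```

With these inputs, 29.2 out of 87.5 comes to roughly 33.4% of non-mobile visits, which is the kind of figure that can be compared with the other sources.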

Psdie (talk) 15:52, 9 March 2012 (UTC): I think the whole decision to use the Wikimedia stats for the "headline" usage chart is suspect - they serve to heavily under-represent Internet Explorer usage. I smell an anti-IE agenda (popular amongst tech-savvy users, but does no favours when trying to objectively monitor real-world IE market share). Reasons for under-representation:
Who has an anti-IE agenda? That seems like pure nonsense. Ad-based stats like Net Applications do no favors when trying to objectively monitor real-world ad-blocking browser usage share. This is a real reason for under-representation in the non-Wikimedia stats you seem to favor.
  1. By counting based on page views instead of unique users, the Wikimedia stats over-represent page-refresh-intensive users of the Wikimedia sites, i.e., Wikipedia editors. Thus the browsers used by Wikipedia editors will be over-estimated in the Wikimedia stats. I suggest that editors are likely to be more technically savvy than "typical" visitors, so are more likely to have an alternative browser installed - i.e., non-IE (standard browser with the most popular desktop OS, MS Windows).
There is no evidence that IE users are more or less refresh-intensive than any other users. Your suggestions are pure guesswork.
  2. The Wikimedia stats combine desktop and mobile stats. IE has no mobile presence, so its share will be significantly diluted in stats that merge mobile usage (currently ~13%). It's not necessarily unreasonable to present combined mobile/desktop usage as the headline figure, particularly given the rising importance of mobile, but this should be made clearer in the labelling.
Net Applications also combines desktop and mobile stats, so I don't really see your point. This article is about browsers, not operating systems. As mobile browsers are also browsers, they belong in the stats.
Personally I believe an aggregate stat (median wasn't too bad, traffic weighted mean would surely be better) as the headline chart would present a more realistic picture. If that's prevented by WP:SYN (and not exempted by WP:CALC) then perhaps omitting a headline figure altogether is the fairest approach - otherwise Wikimedia's stats are being presented as more authoritative and accurate than other sources, which I'd dispute based on #1 above.
I agree. We should weigh in adblock downloads in the stats to get a fairer representation. As Wikimedia's stats are based on more traffic than the other stats, they should be weighted higher than the others. Unfortunately we do not have stats from equally or more trafficked sites like Facebook and Google.
--Psdie (talk) 15:52, 9 March 2012 (UTC)
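To make the difference between the two aggregates mentioned above concrete, here is a minimal Python sketch comparing a median with a traffic-weighted mean. Everything in it is hypothetical: the source names, the share figures and the traffic weights are invented for illustration and do not come from any stats provider.

```python
from statistics import median


def traffic_weighted_mean(shares, traffic):
    """Average the shares, weighting each source by its traffic volume."""
    total_traffic = sum(traffic[source] for source in shares)
    weighted_sum = sum(shares[source] * traffic[source] for source in shares)
    return weighted_sum / total_traffic


# Hypothetical share (in %) of one browser as reported by three sources.
ie_share = {"source_a": 30.0, "source_b": 40.0, "source_c": 50.0}

# Hypothetical relative traffic volumes behind each source.
traffic = {"source_a": 6.0, "source_b": 3.0, "source_c": 1.0}

median_share = median(ie_share.values())
weighted_share = traffic_weighted_mean(ie_share, traffic)

print("median:", median_share)
print("traffic-weighted mean:", weighted_share)
```

The median ignores how much traffic stands behind each source, whereas the weighted mean is pulled towards the busiest one; which of the two is appropriate for the article is exactly the WP:SYN/WP:CALC question raised above.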

Protected

The article has been fully protected for two weeks due to the edit war. A WP:Request for comment is one way to get consensus on what belongs in the article. Since this is now the third time the article has gone under full protection, it may be reasonable to use blocks to deal with any warring that continues after expiry. Protection can be lifted if consensus is reached on talk. EdJohnston (talk) 16:47, 17 March 2012 (UTC)

Is it a rule

Is it a rule to update the world map at the start of each month? Why don't we just update automatically when the leadership in a country changes? Thank you all--88.240.39.174 (talk) 16:16, 6 April 2012 (UTC)


Can we have updated stats again please?

As long as text on the interpretation of the numbers is emphasized, and the difficulty in measuring the stats is treated at a place that draws attention, I see no problem with the issues anyone here talks about. So can we please have a Wikipedia article that summarizes global stats again?

Especially now, when IE8 and IE7 use is dwindling, people will want to know how many people use HTML5-compatible browsers...

Can't we present all perspectives, and emphasize the fact that there are perspectives?

We could for instance cluster the stats based on unique visitors in one category and hits in another...

Pretty please? Cause this is an awesome article...

80.112.133.70 (talk) 08:34, 25 April 2012 (UTC)

Wikimedia (April 2009 to present) - chart

Isn't Android the operating system and not the browser? 193.170.74.203 (talk) 09:17, 25 April 2012 (UTC)

I think that the browser on Android devices is special and unique to Android, so is normally referred to simply as 'the Android browser'.

Wikimedia server logs

Generally not acceptable

I just want to remind everybody that graphics of the Wikimedia server logs, like the one here are not acceptable, for a variety of reasons:

If anywhere, they could be used in the Wikipedia or Wikimedia articles, they certainly would be somewhat relevant, but the issues of WP:OR and WP:SYNTH would still remain if the information is not discussed in reliable sources. --SF007 (talk) 23:02, 13 March 2012 (UTC)

The only concern that can be considered at least marginally valid is that of WP:UNDUE, though each of the stats providers has known biases. There is nothing even close to WP:SYNTH, WP:OR and self-reference. — Dmitrij D. Czarkoff (talk) 23:16, 13 March 2012 (UTC)
I dare to say it is much more than "marginally", since this is not discussed in any reliable source whatsoever. And while this might technically not violate WP:SYNTH or WP:OR, from my own POV, it certainly violates the "spirit" or "principle" of those policies. It is arguably a self reference: while it does not mention "Wikipedia", it mentions the "parent", Wikimedia. Why should we present the stats from Wikimedia? Are they representative in any way of market share? Why not just choose the stats from any other random website? Simply because Wikimedia websites are popular? Because Wikimedia runs Wikipedia? The answer to those questions should have already come from reliable sources... sadly, it is hard to justify the inclusion of such information. --SF007 (talk) 00:08, 14 March 2012 (UTC)
Even if the stats were based on accessing this image it wouldn't be self-referencing for a pretty evident reason: it doesn't reference content at all. It is not WP:SYNTH or WP:OR at all, neither in spirit nor in fact: the data is referenced. And we all probably are well aware that squid data is itself a pretty reliable source. At least more reliable than known unreliable sources like all those you left intact in the article. That's it: Wikipedia is the 3rd most visited site itself, so Wikimedia projects altogether are at least that much used (not to mention the fact that Wikimedia Commons' content is used throughout the web). If we are talking about the spirit of core content policies, then Wikimedia stats were the only reliable data in the article, as Wikimedia projects are known to have the widest possible audience in contrast to the rest of the sources, and thus the trustworthiness of these stats is out of question. The data in question is collected in the most neutral way possible and is verified in the most objective way – automatically; its sources are easily traceable and can be re-examined; the chance that these statistics get purposely misinterpreted in favour of one's commercial interest is negligible... It is the ideal source for the purpose of all the policies you name. — Dmitrij D. Czarkoff (talk) 00:33, 14 March 2012 (UTC)
I don't think you really address the issue raised by SF007 at all. The problem is not whether you or any other editor considers squid data reliable. When we use raw data to produce a graph we implicitly validate and assign credence to the data. The fundamental problem here is that no reliable source has discussed these numbers, and thus it *is* WP:OR. No reliable source has taken a critical view on the data and opened up for quoting. Thus this is in violation of the goal of Wikipedia. Put another way, if you consider these data reliable and relevant, what source can you quote that these are reliable numbers? What source can you quote that these are relevant? What source can you quote that these numbers are representative for some population? --Useerup (talk) 15:40, 14 March 2012 (UTC)
The reliable source that produces these numbers is the reference given. These are the stats for over 150 billion web requests in a single month, across over a dozen of the busiest websites on the internet. The figures are worldwide and have been made by web users with every conceivable interest. Have you got any source that says this is not a reliable source? WP:OR - reproducing results published by a reliable source is not OR. WP:SYN - we do not combine these figures with any others, no synthesis of multiple sources takes place. WP:UNDUE - this is a very large sample, and so is significant. WP:SELF - we do not assume that the reader is reading Wikipedia and we don't refer to this or any article on Wikipedia in any special way. Therefore these figures and their refs make perfect sense on any mirror server. Wikimedia is an important part of the web. I see that SF007 (talk · contribs) has gone ahead and unilaterally deleted all that material from the article regardless of this discussion. I shall reinstate it per WP:BRD and it should now stay in the article until this discussion has reached a consensus. --Nigelj (talk) 00:21, 15 March 2012 (UTC)
The burden of evidence lies with the editor who adds or restores material. WP:BRD is not a policy and cannot be invoked as a reason for undoing an edit you disagree with. As for the points:
And if someone thinks that this has not been fulfilled, that has to be argued for and/or proven too. Just removing material without proper warning and/or discussion is not allowed.
  • The Wikimedia server logs are WP:PRIMARY. That does not rule out using them, but they should be used with care. They have not been used with care here.
This is a valid point, but applies to all other data used in this article. For example Net Applications use some undisclosed weighting of their data.
  • You state that "These are the stats for over 150 billion web requests in a single month". This number is meaningless unless put into context. You need an RS which says something about how representative this source is, or for which demographic it is representative. You can have 150 trillion web requests, if they are all sampling the same demographic it is not more useful than this number. Sheer volume is meaningless unless put into perspective. By a reliable source, please.
That would be true in the article, but this is a talk page. There are plenty of sources that clarify the things you are asking about, and they are probably useful in the article. But on the talk page, lack of references cannot be used as an argument.
  • You state that "The figures are worldwide and have been made by web users with every conceivable interest.". Got any RS for that? If so then please put it in the article. If not, your point is moot. Editors don't get to make such assertions.
As above, this is a talk page and not an article. Arguments on the talk page are not "moot" without sources on the talk page.
  • You ask "have you got any source that says this is not a reliable source?". You are seriously misguided as to what Wikipedia is. I or anyone else do not need to provide any source for removing unsourced or improperly sourced material (this being a case of the latter). It is you who need to provide a WP:RS which verifies why this stat is significant, prominent and relevant. Read WP:BURDEN.
Again this is not an article, but the discussion about the quality of an article. Asking for evidence that something is unsourced or improperly sourced goes here.
  • Regarding WP:SYN, agree, there is not WP:SYN as far as I can see. That is not the main problem.
  • You state that "WP:UNDUE - this is a very large sample, and so is significant.". No. It is WP:UNDUE because it is given a more prominent position in this article than what has been discussed by reliable sources. Keep in mind that, in determining proper weight, we consider a viewpoint's prevalence in reliable sources, not its prevalence among Wikipedia editors or the general public. Read WP:UNDUE again.
Why? It is still the most significant statistic referenced in the article. If it is undue, so is everything else. There is not a single reliable source that verifies any of the statistics in the article. Wikipedia stats are the least undue because here we have raw data; that's more than we have from Net Applications. I bet you can't find a single reliable source that validates Net Applications data.

--Useerup (talk) 08:21, 15 March 2012 (UTC)

In the case of each source, the reliable source itself is the source of stats. None of the figures are discussed, for none of them is the population or relevance to any population discussed, and all of them are reliable sources on their own. WP:OR requires that we use reliable sources for content, not that we support reliable sources with other reliable sources. WP:RS and WP:V also don't request that the sources we use should be discussed in other sources. Please just don't start another lame war with no proper grounds – this article is already damaged severely enough. – Dmitrij D. Czarkoff (talk) 09:01, 15 March 2012 (UTC)

In support of the reservations about highlighting Wikimedia stats over others (given bias created by its counting by page views, which are skewed by high admin activity), see my comment under Wikimedia_percentages above. If Wikimedia stats were based on uniques, I'd be more open to highlighting them as typical (which they aren't at present). --Psdie (talk) 15:21, 15 March 2012 (UTC)
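The page-views-versus-uniques point can be illustrated with a small Python sketch. The log below is entirely hypothetical (visitor IDs, browser names and hit counts are invented), and it assumes each visitor uses a single browser; it only shows how one heavy user shifts a hit-based share while leaving a unique-visitor share untouched.

```python
from collections import Counter


def share_by_hits(log):
    """Browser share (in %) where every logged request counts once."""
    counts = Counter(browser for _, browser in log)
    total = sum(counts.values())
    return {browser: 100 * n / total for browser, n in counts.items()}


def share_by_uniques(log):
    """Browser share (in %) where every visitor counts once.

    Assumes one browser per visitor; dict() keeps the last one seen.
    """
    browser_of = dict(log)
    counts = Counter(browser_of.values())
    total = sum(counts.values())
    return {browser: 100 * n / total for browser, n in counts.items()}


# Hypothetical log of (visitor_id, browser) pairs: one heavy editor
# making eight requests, and two casual readers making one each.
log = [("u1", "Firefox")] * 8 + [("u2", "IE"), ("u3", "IE")]

hits = share_by_hits(log)
uniques = share_by_uniques(log)

print("by hits:", hits)
print("by unique visitors:", uniques)
```

In this made-up log Firefox leads when counting hits but IE leads when counting visitors. Whether anything like this actually happens in the Wikimedia logs is the open question in this thread; the sketch does not settle it.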

@Useerup, I am very familiar with WP:BURDEN, thank you. It says, "You may remove any material lacking an inline citation", which does not apply here. I won't repeat what Czarkoff just said; it seems obvious to me. Perhaps you should look at WP:EDITWAR, which says, "A potentially controversial change may be made to find out whether it is opposed. Another editor may revert it. This is known as the bold, revert, discuss (BRD) cycle. An edit war only arises if the situation develops into a series of back-and-forth reverts", which is what you just did. That is from WP:V, which is core policy. --Nigelj (talk) 21:41, 15 March 2012 (UTC)
@Psdie, your original point was about the use of a piechart of Wikimedia stats for the "headline" usage chart, was it not? That is something I'd gladly throw into the negotiation pot if everyone was willing to discuss and negotiate rather than delete and edit war. It's interesting that you see these stats as part of a pro/anti Microsoft stance. Did you know that there have been allegations in the past of people being paid specifically by Microsoft to edit Wikipedia?[1] We never find out who may have been paid to come here and add/remove content, but it's always something to be mindful of, within the context of WP:AGF. --Nigelj (talk) 21:41, 15 March 2012 (UTC)
The thrust of this objection (please correct me if I'm wrong) is that the Wikimedia stats are not discussed in other references, and so we are only dependent on a primary source for all of them. Is that correct? In that case, we are also going to have to delete the Statcounter figures, as they are only referenced to statcounter.com, and we don't have any references to other WP:RSs discussing them, their sample size, their methodology, or their reliability. Oh, the same is true for Clicky - totally sourced to getclicky.com. Same for W3Counter. Net Applications seems to call itself Net Market Share these days, and the same is true there. StatOwl.com is the same. It looks like there won't be much left. Which one of you would like to do the deletions? There'll have to be a new explanation written to take their place, as there won't be much left of the article. If these deletions don't go ahead, I'll assume that there was a mistake somewhere in the logic and replace the long-standing Wikimedia stats for our readers' benefit soon. --Nigelj (talk) 23:09, 16 March 2012 (UTC)
Wikipedia probably is not representative of the population due to all us open-source fans. I would vote "no". — Preceding unsigned comment added by 2.80.217.197 (talkcontribs) 04:53, 17 March 2012
Guys, stop edit warring. I've requested that this page be protected for that.Jasper Deng (talk) 04:57, 17 March 2012 (UTC)
So you think that Wikimedia stats are less reliable due to the higher load by users of open source OSs/browsers? Why do you think it is the case at all? Why do you think that StatOwl counting visitors of several Windows-related forums doesn't suffer from the similar issues? Do you know what issues do other figures suffer from? — Dmitrij D. Czarkoff (talk) 07:33, 17 March 2012 (UTC)
Use a source which has been reported by reputable mainstream media then. That's a reliable source. What is your problem with that? The Wikimedia server logs may be accurate, but they are raw data and certainly a primary source. As a primary source it is unacceptable that it is given WP:UNDUE weight over proper secondary sources. As I also cannot find any mainstream or acceptable tech media which report on statowl, that source should also not receive undue weight considering that we have netmarketshare which is widely reported on in the media. We have to observe WP:DUE and not give undue weight to certain sources because WP editors believe that they are accurate. I have no problem with including the table with proper disclaimer about demographics (other than a bit of unease about WP:NOTSTATSBOOK), but giving it prominence in the form of lead graphics is seriously WP:UNDUE considering that it is a primary source. It means nothing what you or any other WP editor thinks or believes about the sources and possible "issues". What matters is what reliable sources think about the primary source. --Useerup (talk) 10:30, 17 March 2012 (UTC)
@Useerup, you seem to have missed my point above: we have nothing in the article about what any secondary sources think about any of the primary source statistics. They should all go, by your logic. --Nigelj (talk) 20:11, 17 March 2012 (UTC)
Don't try to put words in my mouth, please. Netmarketshare seems to be quoted a lot in the media. Just follow WP:DUE and use that as the lede. Do not give a primary source with multiple potential issues a more prominent position than the sources which are usually quoted by reputable secondary sources. Simple. --Useerup (talk) 21:43, 17 March 2012 (UTC)
Just to be clear, are you arguing against the appearance of a Wikimedia pie chart in the lede, or are you arguing in favour of deleting all Wikimedia tables and removing all Wikimedia statistics from the article? It's important to be clear. --Nigelj (talk) 22:35, 18 March 2012 (UTC)
I am against using Wikimedia as a representative graphic in the lede. I believe that with proper caution (based on raw data with possibly skewed demographics) the stats from Wikimedia do have a place. I just don't think they should be given more weight than, say, Netmarketshare. --Useerup (talk) 00:23, 19 March 2012 (UTC)
Oh. It's just that in this edit you removed the Wikimedia statistics from the lede graphics, the summary table, and also removed all the historic stats and even the whole section about them from the body of the article. Perhaps you could make your present position on their legitimate use in the article clearer in the RfC below? --Nigelj (talk) 14:57, 31 March 2012 (UTC)

RFC

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


Do Wikimedia's server logs constitute original research? If yes, should it be kept? Is the current use of them due or undue weight?Jasper Deng (talk) 01:34, 18 March 2012 (UTC)

  • Keep the material under debate until a clear, material, cogent criterion for its exclusion is established. The working definition for OR is by now so muddy that searching literature or news for a particular quote, or finding a new means of representing data a la Edward Tufte or combining points of fact from different publicly available sources, or paraphrasing or summarising publicly available views or data, get pilloried as OR whenever it suits partisan editors. OR is the most convenient mud to fling. Accordingly, like patriotism, it is the first resort of the scoundrel who finds truths and logic inconvenient. The fact that the OR-rules (and patriotism -- and morality and...) are rooted in good intentions does not detract from our responsibility to examine them with great care and due cynicism whenever they are presented as justification for prohibitions. The Book of Words is for good sense and guidance, not for pettifogging, not even Wikipettifogging, and we should be on our guard against such.
Consider for example the fact that a given graphic representation of particular data includes data concerning that very graphic representation; is that self-reference? Certainly. The fact that a given argument about argument in general by definition deals with itself and is self-referential, is beyond question; it has been a cliche for a long time. But that does not mean that either of these examples is in itself unacceptable or even undesirable. They may be in any given case, but it is necessary to consult good sense, good conscience, good consequences, and a lot of other goods before we invoke hysterical subjunctives and Cretan liars for every text we disapprove of or disagree with. An alarmingly large number of such arguments in WP are settled by exhaustion or appeal to authority. This is unhealthy. (Now, there is a bit of OR, and make the most of it!) Similar principles apply to all the other holy Wikipillars.
Now, then. Truth and reason above all. I hold no brief for either side in the article under discussion, but I vote for the fair, good-faith, good-sense and constructive use of any representation, even though I have some very snotty views on snappy pie charts. (Edward Tufte had some really good points!) If anyone has a better presentation, bless him and go for it, say I. But if the best he can come up with is lawyering about data that might refer to WP among other subjects, or that unearthing publicly available data or data that can be displayed publicly in an illustration, but does not already appear in other textbooks counts as OR, then go away and explain yourself elsewhere. I have seen nothing in the arguments so far that moves me to forbid the material. JonRichfield (talk) 07:17, 18 March 2012 (UTC)
  • Keep: original research is a contribution that is primarily based on contributors' own experience and/or knowledge. The rendered Wikimedia usage stats are published independently of all the Wikipedia contributors and constitute a valid secondary source (with the primary source being Wikimedia's logs). For the purpose of WP:OR they fall under the same category as all the other sources of statistics, though they are less affected by known biases due to well-defined methodology and population. The removal rationale is specifically flawed, as it is based on the assumption that these stats as source should also be a subject of coverage in reliable sources; this in fact means that the reliability of a source is assumed to depend on publisher's notability, which is not the requirement on Wikipedia. — Dmitrij D. Czarkoff (talk) 20:16, 18 March 2012 (UTC)
    • I missed the "due/undue" thing. Each stats item on this page references exactly one source. That is: StatOwl references StatOwl, Wikimedia references Wikimedia, Net Applications references Net Applications, etc. Consequently all the stats have equal weight in sense of WP:DUE policy. — Dmitrij D. Czarkoff (talk) 22:30, 18 March 2012 (UTC)
    • Equal weight does not mean that all sources should be given the *same* weight; rather it means that viewpoints (stats) should reflect the weight given to them by secondary sources. I have not seen Wikimedia visitor stats used by any secondary RS. On the other hand I often see Netmarketshare used. This means that Netmarketshare should be given more weight than Wikimedia and certainly not the opposite. --Useerup (talk) 00:30, 19 March 2012 (UTC)
  • Keep - The discussion seems silly. Did someone invent the numbers? No, they are cold hard facts. Daniel.Cardenas (talk) 22:06, 18 March 2012 (UTC)
  • Keep - I think the debate above is mostly founded on a confusion over Wikipedia, Wikimedia, and individual Wikipedia editors. If an individual editor, or some group of them, set out to trawl through some Wikipedia pages and thereby produce some statistics about the web in general in order to add some point to an article, then that would fail WP:OR. If when they added the point they said in the article, "We found this out by searching other Wikipedia pages", then that would fail WP:SELF too. This case is quite different: the figures were being published by Wikimedia long before they were added to this article; Wikimedia is an established and very significant web publisher worldwide; and references to Wikimedia as one among many independent sources of significant web visitor statistics are nothing like a problematic self ref. There is no requirement imposed on any of the other sources of stats that they have been discussed or validated by any tertiary source, so the only reason such a requirement is being suggested for these seems to be due to a misunderstanding regarding these preceding points. The limitations of any individual set of web usage statistics are well discussed in the article. In the days when we used to add median figures to the summary, the Wikimedia figures often supplied a significant number of the median figures (or were part of the pair that did). This shows that they are not outlying or surprising in any way - they are another solid source of valid figures, close to the middle of the spread seen from the various other sources each month. --Nigelj (talk) 22:29, 18 March 2012 (UTC)
    • Wikipedia is about verifiability, not truth. It doesn't matter at all whether you believe the numbers are in line with other statistics. What matters is whether you can find a reliable source which has dealt with that issue and has a viewpoint on it. If you believe that Wikimedia statistics are widely held as representative of the web population in general, then you should have no problem finding a source which supports that assertion. Useerup (talk) 00:37, 19 March 2012 (UTC)
      • On the other hand, which reliable source said the numbers are flawed?Jasper Deng (talk) 00:40, 19 March 2012 (UTC)
        • None. But reliable sources routinely use netmarketshare. My problem is with the WP:UNDUE weight given to these numbers. Someone likes to play statistician and make nice graphs out of the numbers. Given that they are numbers from primary sources and there are legitimate concerns about how representative they are, Wikimedia server logs or any "illustrations" based on them should not be presented as more prominent than numbers for which there actually *are* sources which use them. Remember WP:BURDEN? Useerup (talk) 00:51, 19 March 2012 (UTC)
          • No, WMF server logs aren't primary sources because we didn't make any browsers. It's clearly verifiable because we aren't claiming that the logs are an absolute count, only that they are our count. BURDEN does not apply to DUE.Jasper Deng (talk) 00:53, 19 March 2012 (UTC)
          • I don't think WP:BURDEN applies here, it does appear to be reliably sourced. A source need not be independent to be reliable, and I believe WP:SELFSOURCE applies here as well, the numbers don't claim to be representative of the internet as a whole, but specifically of readers of Wikipedia. These numbers therefore seem pretty relevant to the interests of, well, readers of Wikipedia. - SudoGhost 00:58, 19 March 2012 (UTC)
  • Comment: I need comments from uninvolved editors for this RFC to be useful.Jasper Deng (talk) 00:40, 19 March 2012 (UTC)
    • Uninvolved editor here, and I would Keep the server logs. There is no official standard for web browser usage statistics, and independent third-party sources are, as a practical matter, unable to do fact checking on this kind of data. The only places where one can hope to find third-party fact checking of statistics are national voting, global warming, and dissertations, and then only if there is a strong communal suspicion of wrongdoing. The best we can do here is apply common sense, watch out for fringe, and do proper attribution. Belorn (talk) 13:42, 20 March 2012 (UTC)
  • Keep - Wikimedia statistics are no different to statcounter, netmarketshare and other statistics. Wikiolap (talk) 04:54, 20 March 2012 (UTC)
  • Remove - (via RfC) - This seems very much like a self-reference to avoid as it is an unnecessary reference to Wikipedia's projects and website. It also risks violating WP:UNDUE as it places Wikimedia on equal billing with statistical sources which may (or at least should) represent much larger portions of the web spread across more than a single website (or a single set of websites). IMHO the article should only include statistics which represent usage of substantial proportions of the web. It's difficult to tell which sources fulfil this definition as the article gives few clues of what certain data sources represent. No information is given on what kinds of information are represented by Clicky, StatOwl.com, OneStat.com, ADTECH, WebSideStory, the GVU WWW user survey or any of those listed after that one.
On a completely tangential line that I felt I should also say:
  • The article seems to be littered with external links.
  • Information on old data sources, like TheCounter.com, is written in the present tense.
  • It's taken for granted that we understand the difference between mobile and desktop browsers. (Are mobile browsers just phones or does it include laptops?) I'm guessing it should be "smart phones/tablets" v. "personal computers". — Blue-Haired Lawyer t 01:21, 31 March 2012 (UTC)
Which statistical sources do represent larger portions of the web spread across more than a single website? Belorn (talk) 07:56, 31 March 2012 (UTC)
  • Remove Including the Wikimedia statistics is the worst kind of data cherry-picking. This article should only use data from highly-regarded Web analytics vendors with a wide reach (i.e., inclusion of many sites versus a single site/family of sites) and publicly-available methodology. This isn't a knock at Wikimedia, or of their data collection methodology, or any such thing; it's just that looking at any single site or family of sites is going to be misleading, at best.
Here's a non-Wikimedia example of what I mean:[ds 1]
Desktop browser share, February 2012
Browser   World-wide   Ars Technica sites
IE        52.84        12.31
Firefox   20.92        28.81
Chrome    18.90        34.05
Safari     5.24        19.17
Opera      1.71         1.93
Other      0.39         3.73
Now, those numbers might be interesting in the context of how AT readers compare to the rest of the Web, but they're meaningless if you're trying to actually learn something about, oh, the overall usage share of web browsers. Another example of this is the statistics from W3Schools: their numbers only apply to their sites, and so are not representative of the Web as a whole. Consequently, their numbers aren't used as representative data; instead, they're in the external links section. The Wikimedia numbers suffer from the exact same problem.
If we look at where the news media get their data, the field narrows down pretty quickly to two candidates: Net Applications and StatCounter.[ds 2][ds 3][ds 4][ds 5] Wikipedia should simply follow the lead of the reliable sources; no more, and no less. DoriTalkContribs 03:13, 2 April 2012 (UTC)
  1. ^ Bright, Peter (1 March 2012). "Browsing behavior in February: Internet Explorer and Chrome down, Firefox up". Ars Technica. Condé Nast Publications. Retrieved 1 April 2012.
  2. ^ Dingman, Shane (20 December 2011). "Internet Explorer 8 no longer world's most popular web browser: report". The Globe and Mail. Retrieved 1 April 2012.
  3. ^ Leonhard, Woody (1 November 2011). "Worldwide browser share numbers show Chrome way up". InfoWorld. Retrieved 1 April 2012.
  4. ^ Albanesius, Chloe (1 December 2011). "Chrome Overtakes Firefox in Global Browser Share ... Or Does It?". PC Magazine. Retrieved 1 April 2012.
  5. ^ Capriotti, Roger (18 March 2012). "Understanding Browser Usage Share Data". The Windows Team Blog. Retrieved 1 April 2012.
http://www.netmarketshare.com/?source=NASite looks good, and it has a Usage Policy that looks compatible with WP's license. http://statcounter.com/ has a default copyright notice, saying all rights reserved. To use the data here on WP, we need the data to be under a compatible license. So as a closing question: in your opinion, do you think we can/should use the one source (netmarketshare.com) and remove all other statistics, and if so, would using a single site be compatible with WP:weight? Belorn (talk) 09:03, 2 April 2012 (UTC)
I knew this, but I guess it's worth pointing out: NetMarketShare is Net Applications (note the copyright at the bottom of their pages)—so everything I said about NA also applies to NMS. So far as StatCounter goes, so long as we don't copy and paste chunks of their reports, I think quoting them is the same as quoting any other WP:RS. WP is fine so long as it's properly attributed. DoriTalkContribs 19:54, 2 April 2012 (UTC)
Copyright on data points is a tricky matter, and I would be cautious with it. It should be safe to write in our own words a summary of statcounter, but any direct copy of their data into a table (IE X%, firefox Y%, Chrome Z%, ...) should, I think, be avoided. In a book/news article, small snippets of text can be cited, but statistics are not usable with just snippets of data. Netmarketshare is thus far better, as we can freely use their data so long as it is attributed. Belorn (talk) 22:16, 2 April 2012 (UTC)
A couple of points:
  1. Unlike W3Schools and all the sites monitored by other sources, Wikimedia monitors a site that receives hits from nearly all human internet users.
  2. Like the rest of the sources, Wikimedia tracks more than one site: the media from Commons is used in multiple locations. Though Wikipedia generates the overwhelming majority of hits, some hits from people who don't use Wikipedia (if there are any) also get recorded in Wikimedia stats. — Dmitrij D. Czarkoff (talk) 21:13, 2 April 2012 (UTC)
What I'm hearing you say isn't what I think you mean to say…
  1. W3Schools monitors the sites they run; Ars Technica monitors the sites they run, and Wikimedia monitors the sites they run. How are these different? In all of these cases, you're getting a self-chosen slice of Web visitors. Browser usage stats are only meaningful when you're looking at data from a wide variety of different sites around the world.
  2. I don't understand what you mean here—are you saying that Wikimedia monitors non-Wikimedia sites?{{cn}} But honestly: Wikimedia monitors the Wikimedia family of sites and only the Wikimedia family of sites. And that is why their data aren't meaningful. DoriTalkContribs 00:13, 3 April 2012 (UTC)
I think you misinterpret the whole issue:
  1. The diversity of monitored sites is one possible approach to neutralizing stats, though it has its flaws. Using one (but nearly the most used) site is another approach to neutralizing stats, which also has its drawbacks. The assumption that multiple sources are better is simply false, as e.g. StatOwl is known for a significant share of sites dominated by corporate users who are using the browsers imposed on them by corporate policy, thus introducing a strong bias. Similar concerns apply to other similar sources.
  2. Wikimedia monitors Wikimedia sites including Commons. Commons' images are linked from many parts of the web (example), so Wikimedia ends up monitoring quite a few sites. — Dmitrij D. Czarkoff (talk) 05:14, 3 April 2012 (UTC)
  • Keep. It is published statistics by Wikimedia about the browser usage of its users. It's no different from using similar statistics if they were published by Google. Charwinger21 (talk) 07:37, 2 April 2012 (UTC)
Google doesn't release their data, but if they did, it would still be useless in this regard. Google's stats, just like Wikimedia's, may be large in number, but they are not representative of the entire Web. DoriTalkContribs 00:13, 3 April 2012 (UTC)
But they would represent a greater and more diverse portion of the web than all of the sources in the article. With Wikimedia omitted, Google stats' population would be even greater than all of these combined, which makes it effectively less prone to specific biases. — Dmitrij D. Czarkoff (talk) 05:19, 3 April 2012 (UTC)
Honestly… just because a vendor has a larger sample size of self-selected people doesn't mean that group is any less self-selecting. Is it possible that Google's IE numbers might be under-representative because MS might be sending people to Bing instead? Or that mobile Safari's numbers might be low because iPhone 4S owners are using Siri? And do you really think that Google would do a better job of reporting Chinese browser usage stats than Baidu? Single source is single source is meaningless outside of that particular context. DoriTalkContribs 23:51, 3 April 2012 (UTC)
Multi-site sources have the same type of self-selection as a single source, just with a different group of people. Customers of Netmarketshare are grouped in the same way users of Google are. Maybe bloggers prefer one type of website statistics tool, web shops a second type, and government a third. The statistics will always have some form of bias, so the goal should be to primarily use those that have a reputation for openness and correctness. Belorn (talk) 07:17, 4 April 2012 (UTC)
Exactly, all sources have biases; thus using sources with known and easy to describe/understand biases is clearly beneficial over using sources that don't give information on their flaws. — Dmitrij D. Czarkoff (talk) 08:31, 4 April 2012 (UTC)
No, again—that's called WP:OR. What WP is supposed to do is follow the lead of WP:RSs, and as I showed above, they use NA and SC. Find a solid cite where WM stats are used as or as part of an example of overall browser usage, and then you have something worth including in the article. This article should be based on secondary sources, not cherry-picked data. DoriTalkContribs 05:36, 5 April 2012 (UTC)
It's neither WP:OR, nor is it "cherry picked". What WP:OR says and how you're using it are two entirely different things. Independent is not synonymous with reliable. I have no doubt about the reliability of the server data for reporting the data it's being used to support in the article. It is not original research to use Wikimedia's server data, as long as it's clear that it's nothing more than Wikimedia's server data. - SudoGhost 05:46, 5 April 2012 (UTC)
In this context sources are the sources of statistics. Please point me to the policy or guideline that says that we can only rely on reliable sources that other reliable sources rely upon, or just stop this. — Dmitrij D. Czarkoff (talk) 07:17, 5 April 2012 (UTC)
You must also consider WP:WEIGHT. Practically no other source discusses WM counters while NA and SC are often cited as references for this very topic (browser usage share). Given that NO sources discuss WM counters they arguably do not belong here. Under no circumstances can WM stats be allowed to take a more prominent position than NA or SC. --Useerup (talk) 15:56, 5 April 2012 (UTC)
As I wrote above, all the numbers have equal WP:WEIGHT as long as there are no two sources reporting identical numbers. The matter is further complicated by the fact that commercial statistical services send press releases with "breaking news" stats changes to media houses free of charge, which is a promotional action that is supposed to trigger interest in buying their paid services. As Wikimedia stats provide the full data free of charge, they just don't have their place in this game, thus the method of determining reliability of stats by mass media citations just can't give adequate results. Furthermore, even if we forget about the whole press releases thing, the media you propose to rely on is IT-related media, which is competent in IT, but isn't in statistics; in fact the weight they lend to these or those stats is not related to WP:WEIGHT, which is supposed to exhibit the expert acceptance of views. As the user agent stats are not discussed in professional statistical publications by independent authors, we just don't have the grounds to determine the proper weight of the sources, and thus we have to fall back to other relevant policies: WP:V and WP:OR. As there is no issue with those, we end up with the logical conclusion – unless we have a tool for selecting the appropriate sources, we should report all the sources as having equal weight, unless we have documented proof of the reasons we should exclude a particular source (e.g. as in the case of AT Internet). — Dmitrij D. Czarkoff (talk) 18:36, 5 April 2012 (UTC)
  • Keep - Per my comment here. Being "representative of the entire web" is not the purpose of these statistics, nor is that what the data tries to suggest. It is not original research to include the data, it is verified by a reliable source (although perhaps not independent). - SudoGhost 05:51, 5 April 2012 (UTC)
  • Weak remove - I am not comfortable with WM stats being cited where they are not reported by any RS. I am not dead set against keeping them here, but they cannot be allowed to take a more prominent position than the sources which have actually been cited by RS. Hence, they should not be quoted in the lede and should not form the basis of a graph where NA or SC could be used. --Useerup (talk) 15:56, 5 April 2012 (UTC)
  • Keep - I believe that the results should be included in the article, BUT, Wikipedia should change its policy on disclaimers so that you can include a disclaimer stating that the statistics are only the results of wikipedia's site usage, and may or may not be true for everyone using the web. Without that disclaimer, I vote remove.Thepoodlechef (talk) 17:30, 9 April 2012 (UTC)
  • Keep The removal argument seems to be based on an over-zealous reading of certain policies. There is no research being published for the very first time here-- it's produced elsewhere and made available freely. WP:SYNTH would be violated only if any conclusions were specifically drawn that were unsupported; extrapolating to global usage would be such a conclusion. Since that is rather easy to avoid, however, I don't see the problem there. As for WP:UNDUE, the only source of this type that would not be unduly weighting some segment of the internet would be some record of the entire internet browser usage, which obviously does not exist. If there is a serious concern about undue weight, include more charts, because no single one, generated anywhere, will satisfy. IMO, including wikimedia browsing data like this displays a certain honesty on the part of wikipedia, since it is an acknowledgement that the project does not exist in some sort of pure information cyberspace, but rather on the actual web, hosted on actual computers, and being browsed by actual people with actual software. siafu (talk) 04:38, 28 April 2012 (UTC)
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Comments

For those who like netapps and statcounter a few points:

  1. What do you think about stats where you have to pay to be counted?
  2. What do you think about being able to see raw stats versus manipulated numbers?
  3. Where do you think you will find iPad usage on statcounter? Hint: you have to pay extra to see it.

Daniel.Cardenas (talk) 15:07, 2 April 2012 (UTC)

The article makes it crystal clear that the usage statistics are ESTIMATES and change regularly. I'm watching the Wikipedia editors who like to start flame wars and try their best to remove (or carefully reword) any edits which go against their favourite software company. How do you think I found this debate? I'm not stupid; I know why certain articles that mention a giant software company have a few Wikipedia editors fighting tooth and nail to protect the public from reading the true facts. TurboForce (talk) 12:49, 9 April 2012 (UTC)

google analytics ?

google analytics stats anyone ? --Johnny Bin (talk) 06:49, 26 April 2012 (UTC)

How? Daniel.Cardenas (talk) 17:53, 29 April 2012 (UTC)

Figures for Wikimedia pie chart?

Much as I love it, where do the figures for this chart come from? In the diagram we see, for IE, Chrome, Firefox, Safari, Opera, Android and Other respectively, 25.93%, 24.99%, 21.79%, 14.09%, 5.04%, 3.18% and 4.98%. From the source[2] for 'All requests' we see 25.36%, 24.99%, 21.77%, 5.82%, 3.71%, 2.99% and therefore 15.36%. For 'Html pages' we see 26.58%, 20.90%, 20.92%, 4.81%, 2.30%, 2.77% and therefore 21.72%. There is no source on the image page, none in the caption, and no hint of what calculations, if any, are being put into this diagram. I would be much happier with an SVG image that anyone could update and edit, displaying the actual figures we can all clearly see in the source. --Nigelj (talk) 17:40, 9 May 2012 (UTC)

OK. Now I've taken the trouble to bring all the figures together, I can see what we're being shown:
'All requests'
non mobile tablets other mobile Total
IE 25.36 0.55 0.02 25.93
Chrome 24.99 24.99
Firefox 21.77 0.02 21.79
Safari 5.82 2.65 5.62 14.09
Opera 3.71 1.33 5.04
Android 0.19 2.99 3.18
Other 4.98
Total 100.00

The problem was that none of this was obvious - to me anyway. Per WP:V, this should be made clear somewhere. --Nigelj (talk) 18:02, 9 May 2012 (UTC)

I asked the creator on the talk page about this and was told, for example, that the I.E. figure also includes the tablet and mobile numbers. Not sure what the solution is to the confusion. Perhaps expand this article's table to do the same? What do you think? Thanks! Daniel.Cardenas (talk) 18:05, 9 May 2012 (UTC)
Thanks Daniel. Having gone to the trouble of creating it, I copied the table above onto the graphic's Commons page. I think that covers it. Every figure was, in fact, perfect. --Nigelj (talk) 18:24, 9 May 2012 (UTC)
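
The reconciliation in the table above can be re-checked mechanically. The following is only a sketch of that arithmetic: it adds the per-browser components exactly as quoted in the 'All requests' table (without assuming which column an unlabelled figure belongs to) and derives 'Other' as the remainder from 100. The variable names are illustrative and do not come from the Wikimedia source.

```python
from decimal import Decimal

# Per-browser components as quoted in the 'All requests' table above
# (non-mobile, tablet and other-mobile shares; empty cells are omitted).
components = {
    "IE": ["25.36", "0.55", "0.02"],
    "Chrome": ["24.99"],
    "Firefox": ["21.77", "0.02"],
    "Safari": ["5.82", "2.65", "5.62"],
    "Opera": ["3.71", "1.33"],
    "Android": ["0.19", "2.99"],
}

# Decimal avoids binary floating-point noise when adding two-decimal figures.
totals = {name: sum(map(Decimal, parts)) for name, parts in components.items()}

# 'Other' is whatever remains once the named browsers are subtracted from 100.
totals["Other"] = Decimal("100") - sum(totals.values())

for name, share in totals.items():
    print(f"{name:8} {share}%")
```

Run as written, the totals should match the figures shown in the pie chart, which is consistent with the remark that every figure was in fact correct.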

Google Chrome Now the No. 1 Browser in the World

Chrome is now #1. If someone can please update the article. source. Joseph507357 (talk) 16:05, 21 May 2012 (UTC)

Sample sizes

I just reverted some large scale changes made by Mwarren us (talk · contribs). The main reason that I followed WP:BRD here was that at least some of the new figures that were prominent were clearly grossly in error. Mwarren us's version stated that the Wikimedia stats were based on '1' website, whereas, from the article's own section on the figures, it says, "These server logs cover requests to all the Wikimedia Foundation projects, including Wikipedia, Wikimedia Commons, Wiktionary, Wikibooks, Wikiquote, Wikisource, Wikinews, Wikiversity and others[21]", in every language. It also stated that these figures were based on a 'Pageviews' sample of 15,722. A glance at the cited source shows the sample size to be 15,722,000,000 HTML page squids, where squids are defined by 1:1000-sampled server logs. In other words, the full sample was 15,722,000,000,000 HTML pages served, equivalent to a request count of 128,552,000,000,000. Secondly, the link given regarding arithmetic means appeared to be to a discussion section that closed an RFC. In fact it was to a comment by Useerup (talk · contribs), who, I'm sure, won't mind being described as a participant in the RFC. I can't find the actual RFC at the moment, or remember who formally closed it, but it is clear that the link given was not to the official closing comments. Some other aspects of the series of edits may have been valid, but I did not feel that it was right to leave these errors on display. Please discuss changes you would like to make here, one at a time, so that we can all agree on their value. --Nigelj (talk) 21:51, 21 May 2012 (UTC)

I don't understand why some people get hung up on the sample sizes. For the stats listed in this article, the sample sizes are large enough to drive the variance of the percentages to a very small value. The reason that the stats vary from source to source is that they are sampling from different populations. -- Schapel (talk) 22:07, 21 May 2012 (UTC)
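
The point about variance can be illustrated with the usual standard-error formula for a sample proportion, sqrt(p(1 - p)/n). This is a rough sketch only: it assumes simple random sampling with independent page views, which real server logs do not strictly satisfy, and the 25% share is a hypothetical value chosen for illustration. Only the sampled page count is taken from the discussion above.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n)


sampled_pages = 15_722_000_000  # sampled HTML page count quoted above
share = 0.25                    # hypothetical browser share, illustration only

se = standard_error(share, sampled_pages)
print(f"standard error: {se * 100:.6f} percentage points")
```

With a sample this large the sampling error is far smaller than the differences between sources, which fits the argument that the sources disagree because they sample different populations, not because their samples are too small.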

Medians in Usage share of web browsers

Since there is really no consensus above and everyone involved can agree on nothing, I ask for outside comment on whether the medians should be included.Jasper Deng (talk) 00:12, 5 January 2012 (UTC)

  • This is an oppose or support situation. I oppose the median.Jasper Deng (talk) 01:13, 5 January 2012 (UTC)
  • Oppose using a median value in the table. This is my first time viewing this particular article and commenting on the talk page, so forgive me if I am missing something that was already gone over above. I'm not seeing why a median figure is necessary when there are only (currently) five figures for each browser in that table. The problems with using a median value seem to outweigh any benefits, in my opinion. - SudoGhost 08:41, 5 January 2012 (UTC)
    What problem do you see with using a median value? Daniel.Cardenas (talk) 21:41, 6 January 2012 (UTC)
    I would rather ask what problems would be caused by not having a median value. There's no point in having it in the article, and the lack of the median value would not be a detriment to the article in the slightest. It is not a reliably sourced aspect, and while I have no opinion on whether or not it is WP:OR, I don't see any good arguments for inclusion of an unsourced median value that shows the median value of only five figures. - SudoGhost 00:07, 7 January 2012 (UTC)
  • Oppose for the reasons I gave above - the figures being operated on are grossly incompatible. Also the results don't give a percentage of the result. Plus I think any graph done of the figures can be done better otherwise without the synthesis. If somebody outside wikipedia wants to do this we can report on their results no matter that they are silly; for us to do it ourselves is just wrong, and why are we making up things that nobody outside of Wikipedia can be bothered making up and writing about? Dmcq (talk) 18:23, 6 January 2012 (UTC)
    • That said, there are basically two ways of making statistics accessible by readers:
      1. choose among sources of statistic (rather tricky, as involves evaluating biases, and is evident WP:OR) or
      2. collect whatever is available (and passes WP:V bar) and summarize (using median or any other tool agreed upon).
      So the question basically is: what is better? Regardless of this and other discussions the maintenance of this article will lead to one of these options. — Dmitrij D. Czarkoff (talk) 00:56, 7 January 2012 (UTC)
  • Support for the reasons I gave above and in all the preceding discussions on this topic. — Dmitrij D. Czarkoff (talk) 18:44, 6 January 2012 (UTC)
  • Support, what? Again a poll? Come on, either close the poll as no consensus or use the small majority as yes or no. The world will also be destroyed (so said the Mayas), so who cares? mabdul 18:50, 6 January 2012 (UTC)
  • Support the medians. Are we just going to keep repeating the poll until people get bored and someone wins by default? I think this is getting disruptive, as the summary tables have not been updated since September last year. This, if I recall correctly, was when significant regular contributors were driven off the article by these interminable arguments. Months and months of a few people holding the summarisation of the article to ransom. Puh. --Nigelj (talk) 19:56, 6 January 2012 (UTC)
  • Comment My initial thoughts are that not only is this original research, it's faulty original research. But I'll withhold judgement for now. I've asked a few questions at WP:ORN and await responses.[3] A Quest For Knowledge (talk) 20:02, 6 January 2012 (UTC)
  • Support It's helpful. Yes, helpful is not an excuse for putting something in, but wikipedia exists because it is helpful. Correct, it is not necessary, but is wikipedia necessary? Daniel.Cardenas (talk) 21:41, 6 January 2012 (UTC)
  • Oppose because
    1. The median may be a simple calculation but its applicability (as required under WP:CALC) is anything but simple in this case. The sources use different observations and different methodology. The sources sample different demographics/populations. Some sources try to account for their recognized bias by "correcting" using CIA numbers about Internet use in each country. The end result is a mess of incomparable sources being treated with equal weight even though some of them sample only a small fraction compared to others.
    2. The median is calculated across multiple sources which are selected by WP editors. Thus, the median is controlled by WP editors and not supported by any one source. No source is cited which directly supports such a calculation or the chosen selection. This is improper synthesis.
    3. The numbers in the table over which the median is calculated have been "corrected" by WP editors. Because not all sources break out the observations in the same way (some don't report "mobile"), editors have found it necessary to "correct" those sources using the total/mean of all the stat counters. Thus, those "corrected" numbers are not supported by any source! This alone is a violation of WP:SYN, but is necessary because editors want to calculate the median.
    4. The median numbers are useless for comparisons. Because the median of each column is calculated in isolation, the medians do not come to 100%. Indeed, the current numbers add up to 102.2%. So the medians tell us that more than 100% of browsers are being used?
    --Useerup (talk) 00:29, 7 January 2012 (UTC)
  • Oppose Medians should not be used, because the sources are not the same size and their usage is not comparable. For example, 80% of Canadians support the Queen as head of state, therefore the median level of support for constitutional monarchy in North America is 40% (assuming it is zero in the U.S.). It is unusual anyway to apply a median to percentages. The most appropriate comparison would be to provide a total for all the sources then provide an average of users for each browser. But then we might want to provide weightings for each of the sources. That however is something that we would want to find in a source, not conduct ourselves, per WP:OR. TFD (talk) 00:57, 7 January 2012 (UTC)
  • Medians are a fine calculation but oppose WP:OR of the sources of data to apply the medians to; they should be removed from this article. Nobody Ent 04:09, 7 January 2012 (UTC)
  • Note Since the local community cannot seem to come to a consensus, I have posted this RfC at Centralized discussion. This should attract many more editors to help determine a solution. Hasteur (talk) 05:37, 7 January 2012 (UTC)
  • Oppose If you add up the median values in the Sept 11 table you get 102.2%. It's mathematical nonsense. --Salix (talk): 20:10, 7 January 2012 (UTC)
    • Sorry, Salix, but You have just shown why the RFC for this topic was a very bad and damaging idea. The median values are not supposed to add up to sums of the source values. This is not an issue here, and this is in no way connected to the concerns this RFC is associated with. If You still want to participate in this discussion, You might want to glance at the discussions above this section. — Dmitrij D. Czarkoff (talk) 20:45, 7 January 2012 (UTC)
      • He is entitled to his own opinion, let's respect that. The whole point here is to solicit outside comments, and it doesn't have to be about the OR of this.Jasper Deng (talk) 20:46, 7 January 2012 (UTC)
      • Sorry, Dmitrij D. Czarkoff but Salix has a perfectly valid point: Each browser median will be used to compare it against the other browsers' medians. And when reported as a percentage readers will expect the numbers reported to be "fractions of 100". The fact that the sum of the medians can exceed or fall well short of 100% (not due to rounding errors) illustrates how useless they are, apart from being WP:OR. The editors have even avoided illustrating the "median shares" in a pie chart because they ran into this very problem. So rather than realizing that the medians are wrong, they swept the problem under the rug by using a bar chart instead. --Useerup (talk) 21:01, 7 January 2012 (UTC)
        • @Jasper Deng: Sorry, I know that none of us are experts and that might be good - non-experts writing an encyclopedia for non-experts - but this isn't even statistics - that is 9th-grade math (or so) - and having a !vote based on a wrong memory (in the case that he/she learned it at some point)?
        • @Useerup: Please check our last archive (or the one before) for why we are using a bar chart. (the short answer is: because pie charts are evil) mabdul 21:07, 7 January 2012 (UTC)
        • That just doesn't make sense to me. If it were true, these talk pages would be flooded and the tables would be in the middle of a constant war. Effectively, the fact that the situation is the opposite only shows that the median does its job: most readers just seek a summary, and most of the rest understand the use of the median. The questions appear when RFCs or other discussions draw public attention to the line. — Dmitrij D. Czarkoff (talk) 21:57, 7 January 2012 (UTC)
          • This is assuming that people that do not understand medians will know how to use a discussion page. Lack of something is not proof of the opposite. - SudoGhost 22:12, 7 January 2012 (UTC)
            • Yes - and it also assumes that they are not all just "going away" happy with the "answer" (or laughing at us, or confused). If they are, and the "answer" is sub par, then we have failed. Not everyone complains or comments. Begoontalk 07:55, 8 January 2012 (UTC)
              • Which answer would not be "sub par"? Our problem is not with identification of issues, but with addressing them. So if You know the better way to help the readers understand the data, could You please share Your thoughts on it? — Dmitrij D. Czarkoff (talk) 09:28, 8 January 2012 (UTC)
                • No, I don't know a better way to present it, sorry, but more importantly I really don't think it's the sort of aggregation, interpretation and analysis of sources we should be doing. Sorry. Begoontalk 13:09, 8 January 2012 (UTC)
                  • So what is your vision of the right way to cover the subject? The only goal of the whole discussion is to find a viable solution that can be accepted as a consensus. Eg. Useerup stated that a table of data needs no summary, VsevolodKrolikov suggested to represent summary with a chart of a most cited source. I say that only a numerical summary can help. Do you share any of these opinions? Any other idea? Or what is Your input to the consensus building? — Dmitrij D. Czarkoff (talk) 14:02, 8 January 2012 (UTC)
                    • Ok, framed thus as a question - no summary. I'm not comfortable with us deciding how to combine and summarise data from disparate, dissimilar sources and constructing any analysis of that, even an average, because that is research we should not be doing. That's the best I can do to frame my opinion in your terms. Begoontalk 14:49, 8 January 2012 (UTC)
      • The problems with the medians are a classic case of why we have WP:NOR. Statistics done right is the selection of meaningful data, and a lot of work in stats departments goes into deducing what is meaningful. Here we are attempting to do a Meta-analysis of multiple studies but not using an established technique, if we are using a median of percentages, not something I've seen before. A weighted mean would have a little more statistical pedigree. The technique clearly has flaws, not adding up to 100% is just one. What the data is really telling us is that sampling effects are strong when measuring web-browser usage, for example wikimedia is clearly not a representative sample of web users. As some of the data ranges are from 35.1% to 50.9%, it's questionable if we should be reporting that many decimal places, indicating false confidence in the data. If we want to report this data faithfully we should really show error bars letting the user know how much trust to put into the data.--Salix (talk): 08:55, 8 January 2012 (UTC)
        • You address it as if we were doing a statistical study. Our data is actually known to have not only different samples, but also a different population; our sources are known to have biases, but not disclose them. I think the weighted mean wouldn't be any more accurate. And for wikimedia specifically: why do You think other sources to be more credible? — Dmitrij D. Czarkoff (talk) 09:15, 8 January 2012 (UTC)
  • Oppose So we self select 5 sources of data, in many ways different samples (Wikimedia sample leaps out in this sense as limited), then we present the median (3rd largest) as somehow significant or useful? That's how it appears from looking at the article, and I'm trying to do just that to give an opinion unbiased from the reams of discussion above. If that's correct about what we are doing here - I oppose. We shouldn't be doing this research. The median (3rd ranked) of 5 figures garnered in this fashion? I just can't see where or how that is useful. Couple that with the problem that they look like percentages to the casual reader, who would therefore expect them to sum to 100. If my take on this isn't correct, please say so and I'll reconsider. Begoontalk 07:50, 8 January 2012 (UTC)
    • AFAIK, the number of sources isn't a matter of selection. Sources that are found to pass Wikipedia policy for sources are included. E.g., the sister article about OSs has twice as many sources. I strongly agree that we should not select sources, but we need a way to represent the stats to the reader. You oppose median; which form of summary do You propose to replace the median? — Dmitrij D. Czarkoff (talk) 09:15, 8 January 2012 (UTC)
      • I honestly don't know. I think the problem I really have is that it seems to me that once one decides to aggregate and interpret data like this, from disparate sources, what one has, in effect, done, is produce one's own "Survey of Surveys" or "Poll of Polls". If that is done somewhere else, with published methods and rationales as to choices of source, summary methods etc..., we might be able to use it as a source - but if it's actually our research creating this analysis, well, I guess you see where I'm going with that. Begoontalk 09:23, 8 January 2012 (UTC)
        • I see no conclusion. You think we should just report the data? Or to avoid data completely? — Dmitrij D. Czarkoff (talk) 14:02, 8 January 2012 (UTC)
          • Report the data by all means. Just don't provide a number calculated as an average of 5 dissimilar sources as though they were perfectly comparable. And in the event you still do, despite it being wrong to do so at all in my opinion, don't use a median. People understand means, and that's what they expect to see, generally - anything else is likely to confuse. If all that means you can't summarise at all, then don't summarise at all. That's really about as far as I can go to match my opinion to your question. Begoontalk 15:08, 8 January 2012 (UTC)
            • The problem with the mean is that it would make it even more obvious that there is a very serious problem with weight in this table. Should the sources have the same weight (obviously, no) or should we compensate/guess using some other source (more WP:OR)? The basic problem is that the sources - despite all reporting browser usage shares - are not compatible at all and we should not be doing any type of calculation which assumes that.--Useerup (talk) 15:57, 8 January 2012 (UTC)
  • Comment. Each median value is a percentage, and it is comparable with 100 in the same way as any other percentage. It says, "Of the N most reliable figures Wikipedia can find for this browser for this month, the median usage figure for browser X is A%". And so on for browser Y, Z etc. If a browser's median usage figure creeps over or under 50% for example, that is significant, whatever the row of medians adds up to. That's the only thing that makes no sense - adding up the row of medians. Just don't do this, as it gets you nowhere. That's one of the reasons why we dropped the pie chart - in effect it adds up the row of medians, which is a mad thing to do. The fact that you can't add them up does not make each of them invalid in its own right. --Nigelj (talk) 12:19, 8 January 2012 (UTC)
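A minimal sketch of the point about adding up the row of medians, assuming five made-up providers whose figures are purely illustrative (they are not taken from the article or from any real source). Each provider's row sums to 100, yet the per-browser medians do not:

```python
from statistics import median

# Hypothetical usage shares (percent) from five invented stat providers.
# Every row sums to 100; none of these numbers come from a real source.
shares = {
    "provider_a": {"IE": 40, "Firefox": 25, "Chrome": 25, "Other": 10},
    "provider_b": {"IE": 35, "Firefox": 30, "Chrome": 20, "Other": 15},
    "provider_c": {"IE": 30, "Firefox": 20, "Chrome": 35, "Other": 15},
    "provider_d": {"IE": 45, "Firefox": 25, "Chrome": 20, "Other": 10},
    "provider_e": {"IE": 30, "Firefox": 30, "Chrome": 30, "Other": 10},
}
browsers = ["IE", "Firefox", "Chrome", "Other"]

# Take the median separately for each browser, across providers.
medians = {b: median(row[b] for row in shares.values()) for b in browsers}

# The medians are each valid percentages, but their sum need not be 100.
total = sum(medians.values())
print(medians)
print(total)
```

Each median can still be read against 100% on its own; only the sum of the row carries no meaning.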
Yes, but, notwithstanding my basic objection that this is analysis/research we shouldn't be doing, isn't this true?
  • One of the main arguments for using a median is to reduce the influence of "big outliers" in a large sample.
  • It is, here, being applied to a sample of 5.
  • The sample data for the median is percentages.
  • By definition, percentages are confined to a range of 0-100, somewhat reducing the likelihood of "big outliers".
And, if we are honest, isn't there, anyway, a tiny hint here that we are using median as something that might avoid WP:CALC, because we really, deep down know that we're crossing, or over, the line of doing our own research here? (yes, I read the rest of the page, now).
Apologies if my maths/statistics knowledge isn't fully up to speed, I'm largely basing my supposition on medians and their usefulness from a discussion I had with a real estate agent, explaining that it helped to exclude massively overpriced palaces from local property price averages. Begoontalk 12:43, 8 January 2012 (UTC)
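The outlier argument can be illustrated with a small sketch. The five figures below are hypothetical, chosen only to show how one extreme value pulls the mean but leaves the median alone, even though all values stay within the 0-100 range:

```python
from statistics import mean, median

# Hypothetical share figures (percent) for one browser from five sources.
# Four agree closely; the fifth is an extreme, but still valid, percentage.
figures = [30, 31, 32, 33, 90]

avg = mean(figures)    # pulled upward by the single outlier
mid = median(figures)  # the 3rd-ranked value, unaffected by the outlier

print(avg)
print(mid)
```

With only five inputs the median is simply the third-ranked figure, which is the reading of it given in the discussion above.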
@NigelJ: And yet the medians are plotted in a graph directly encouraging comparison of the medians, omitting the fact that readers should actually re-scale the medians if they want to compare them. Of course, comparing the medians would be wrong since they are created from sources which don't even claim to state the same kind of numbers. Some sources try to extrapolate to global usage shares; other sources report their raw usage shares. Doing any type of summary on such numbers is just flat out wrong. It's apples compared to slivers of orange peel.--Useerup (talk) 15:40, 8 January 2012 (UTC)
"the fact that readers should actually re-scale the medians if they want to compare them" is actually wrong. Each median is a percentage and so is comparable with 100%, and therefore is comparable with other percentages, and medians of percentages. All you cannot do is add them up and expect to see 100%. It is perfectly valid to say, "Based on the most reliable figures Wikipedia has been able to identify, the median usage of A just went above 50%", "Based on the most reliable figures Wikipedia has been able to identify, the median usage of A is now two percentage points greater than the median usage of B", and "The usage shares reported by statistics provider P are usually within 5% of the medians based on all the most reliable figures Wikipedia has been able to identify". --Nigelj (talk) 16:12, 8 January 2012 (UTC)
  • Oppose. I agree with the arguments already made against the median. In our graphs we can choose a single source; I propose StatCounter, already used in some. The only valid "pro" of the median is the synthesis, but with so few sources it is, in my opinion, useless. Subver (talk) 13:54, 8 January 2012 (UTC)
    • As all sources have biases, using one of them as a source for a plot will constitute a plain violation of WP:WEIGHT with no benefits. Having a fueled debate to avoid something the minority of editors regard as violation of policy to replace it with something that is plain violation of policy is... strange (wording optimized per WP:NPA). — Dmitrij D. Czarkoff (talk) 17:53, 10 January 2012 (UTC)
  • Oppose. A median is a meaningless number when the inputs are not comparable. kop (talk) 06:16, 12 January 2012 (UTC)
    • But the input is perfectly comparable. It only differs in biases — that exact thing median is supposed to fix. — Dmitrij D. Czarkoff (talk) 11:12, 12 January 2012 (UTC)
      • The sources sample different populations and they may very well sample different behavioral patterns (unique users versus page impressions). The populations they sample are of very different sizes. One of the sources tries to extrapolate to global usage shares; others don't. They are not comparable. Yet, in a median (or mean) calculation they are given equal weight, the result (global usage share???) is not clearly defined, and if you compare percentage points you err because they are not scaled to 100%. If the sum of the medians hits 110 (which is possible, although right now it "only" sums to 102%), then by comparing percentage points and concluding that browser A has 2 percentage points more usage than browser B you would err by about 10%. --Useerup (talk) 14:53, 12 January 2012 (UTC)
      • How can you say they're comparable when they're not reproducible, not verifiable, and, pointedly, are computed based on populations that are not randomly selected and which therefore represent nothing but themselves? The meaning of each metric is therefore questionable, and entirely unknown with respect to global browser share, which is what the median is supposed to pertain to. Further, as you note, arguments which pertain to the median also pertain to the mean. Yet nobody is arguing that the mean is meaningful -- it's obvious that the mean is not meaningful because it can't be weighted when sample size is unknown. It should be equally clear that when you take a median you must know what you're taking the median of, and nobody knows how to compare the different surveys' sample populations. kop (talk) 08:15, 15 January 2012 (UTC)
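The "err by about 10%" figure in the rescaling argument above can be checked with a short sketch. The three medians here are hypothetical, picked so that they sum to 110, and the rescaling step is itself only an assumption about what "scaled to 100%" would mean:

```python
# Hypothetical per-browser medians (percent) that sum to 110, not 100.
medians = {"A": 46.0, "B": 44.0, "C": 20.0}
total = sum(medians.values())

# Assumed rescaling: divide each median by the total so the row sums to 100.
rescaled = {name: value * 100 / total for name, value in medians.items()}

gap_raw = medians["A"] - medians["B"]          # gap read off the raw medians
gap_rescaled = rescaled["A"] - rescaled["B"]   # gap after rescaling

# Relative amount by which the raw gap overstates the rescaled gap.
overstatement = (gap_raw - gap_rescaled) / gap_rescaled
print(gap_raw, round(gap_rescaled, 2), round(overstatement, 2))
```

Whether such a rescaling is legitimate at all is exactly what the thread disputes; the sketch only shows the arithmetic behind the 10% claim.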

Note: this RFC was supposed to help build consensus. Therefore it's not enough to say whether you support or oppose the median. Please also state your view on how the user agent statistics should be presented. E.g., a table with raw data, a table and a plot (which plot?), a table with a weighted mean line, a table with a median line, just a text that such studies are performed, or any other way. Please make sure you not only criticize, but also suggest something. Otherwise your effort will actually turn out to further fuel the dispute. — Dmitrij D. Czarkoff (talk) 14:09, 8 January 2012 (UTC)


Note: Unlike what Dmitrij D. Czarkoff claims above, you are not obliged to present any alternative way to present the data. However, if you support or oppose please state your main reasons for doing so. If the median and "correcting" calculations are found to be original research it is simply deleted. If there is no clear consensus either for or against, it is simply deleted (WP:CALC requires consensus for a calculation to add it or keep it in).--Useerup (talk) 15:04, 8 January 2012 (UTC)

How many new polls have there been on these medians, here, at Talk:Usage share of operating systems and elsewhere in recent months? --Nigelj (talk) 15:28, 8 January 2012 (UTC)

Ending the RFC

The conclusion is that there is no consensus on whether the median is an appropriate calculation. According to WP:CALC there must exist consensus for keeping the median; otherwise it must be removed. The median is already removed through other changes and there seems (absence of edits) to be consensus that the changes are appropriate (good work!). I have removed the RFC tag. --Useerup (talk) 19:36, 27 January 2012 (UTC)

Shouldn't the beginning of the Usage share of operating systems page be updated to remove the "A discussion is being conducted..." text, then? 195.23.92.74 (talk) 19:34, 19 March 2012 (UTC)
Removed the averages per the no consensus result of the above RFC discussion. Please continue the discussion here before changing that edit. Thanks! sn‾uǝɹɹɐʍɯ (talk) 03:36, 24 May 2012 (UTC)

Mobile vs desktop

Because of the tag, the article needs to define the concepts of "desktop" and "mobile". For me it is clear: "mobile" includes smartphones and tablets and their corresponding operating systems (Android, iOS, etc.); "desktop" includes desktops proper and laptops and their corresponding operating systems (Windows, Mac OS, etc.). This is how the sites that record browser share group them. — Preceding unsigned comment added by Palacesblowlittle (talkcontribs) 15:05, 17 July 2012 (UTC)

StatOwl

This site has two serious problems. One: since May 2012 it doesn't show valid stats anymore, so it has to be moved to the older reports section. Two: it has only stats for the USA, so it can't sit together with the global stats; it must be kept apart.

Yes, I don't see any recent data from StatOwl, so I think we should move it to the Older Reports section. I don't know about moving it "apart" otherwise, though, because I don't see any statement or implication that StatOwl's data are representative of global usage share. -- Schapel (talk) 22:30, 1 August 2012 (UTC)

An Animadversion

As a web developer, I certainly root for Chrome and FF over IE. But, as a scientist, I know there's often a big difference between what we want and the reality. The article expresses a bias and with much more confidence than warranted.

On the browser wars, here's a dissenting view, "Internet Explorer market share surges, as IE 9 wins hearts and mind", that gives IE more market share than FF and Chrome in March 2012, an idea that seems to be supported by the page you link at the bottom of the article, Browser News > Stats.

I'm just thinking that, even with all the cautions noted in the article, three (or four!) digits of precision is misleading, and in general, the stats should be put forward much more tentatively than they are and contrary positions given some space. JKeck (talk) 15:36, 22 August 2012 (UTC)

I think you've hit upon a basic misunderstanding that bites many people when they discuss usage share. We cannot know the actual "global" usage share. All we can know is, for a given set of sites, what the usage share of each browser is. Each stats company can measure to a high degree of precision the usage share of browsers for the set of sites they monitor, although it doesn't make much sense to give more than one or two decimal places in the percentages because the second or third decimal place can change on a daily or weekly basis. Each stats company uses a different set of sites, and none of them use an unbiased sample. Some stats companies use stats primarily from websites in a particular country, and some use stats primarily from larger companies' websites. The best we can do is take each of these data points as a very good educated guess, and average them together to get the wisdom of the crowd, which is a best guess at global usage share. -- Schapel (talk) 18:26, 22 August 2012 (UTC)

iPad is mobile?

The mobile stats break down Safari into iPhone and iPod. What about iPad? Seems like an important omission, or is it just included in one of those categories? Spot (talk) 01:30, 4 September 2012 (UTC)

I had an email exchange with StatCounter roughly 6 months ago and they said they categorize the iPad as a console, so it is in neither desktop nor mobile statistics.   :(   Daniel.Cardenas (talk) 03:11, 4 September 2012 (UTC)

Restore logical order to stat providers

Further to removal of NetApps, just noticed DC also reordered the Historical Usage Share section in Nov 10 to place StatCounter at the top for no apparent reason. Previously the providers were listed in order of how long they've been operating - i.e., Net Apps, W3Counter, StatCounter, Wikimedia, Clicky. See long-term contributor Schapel's confirmation of this in the "Restore Net Apps stats" section above.

I propose this order is restored rather than the random order they've been shuffled into. Further, the Summary Tables were also in this age order, now StatCounter is randomly at the top - suggest restore. If there's a decent logic behind the current order, fair enough - let's hear it. — Preceding unsigned comment added by Psdie (talkcontribs) 02:50, 14 September 2012 (UTC)