Wikipedia talk:Wikipedia Signpost/2014-09-17/News and notes
Discuss this story
Possible typo in image caption: "new manage" --> "new manager" ? --Hispalois (talk) 05:45, 18 September 2014 (UTC)
- @Hispalois: I've fixed it. Feel free to make this kind of correction yourself in future - as Wikipedia:Wikipedia Signpost/About says, "post-publication edits such as grammatical and spelling corrections to articles are welcome". -- John of Reading (talk) 06:39, 18 September 2014 (UTC)
- We need to solve this pageview issue. When we at WPMED do outreach to media they want to know what impact we are having in the developing world. Currently we do not really know.
- With respect to templates, the inconsistency across Wikipedia is a huge issue we are facing with the translation project. A basic set of templates is needed. This is something I would love to see the WMF work on. Doc James (talk · contribs · email) (if I write on your page reply on mine) 06:06, 18 September 2014 (UTC)
- Article sounds like it blames Henrik. WMF has had years to build a real system to replace stats.grok.se. --LA2 (talk) 11:23, 18 September 2014 (UTC)
- Er. Replacing stats.grok.se would not be the solution here; the files SGS is relying on lack the records </pedantry> Ironholds (talk) 12:24, 18 September 2014 (UTC)
- Correct. I've done some minimal re-wording to fix that in the lede. Hope that this is okay per Signpost editing norms. West.andrew.g (talk) 13:20, 18 September 2014 (UTC)
- When I read the title, I thought that page views had fallen by 1/3rd. "Off" implies something different than the context suggests, in this case. Maury Markowitz (talk) 10:48, 18 September 2014 (UTC)
- I regularly present to health organizations about how I share the information in their fields of expertise on Wikipedia. All of them are surprised to hear about the number of pageviews that health articles on Wikipedia get, and become more interested in Wikipedia because of the audience it has. Having accurate information about the number of pageviews that Wikipedia articles get contributes to a persuasive argument that experts should contribute to Wikipedia if they want to reach the audience seeking information in their field. I appreciate all efforts to better describe Wikipedia's audience because I think good descriptions of Wikipedia's audience are necessary to attract more expert contributors to our community. Blue Rasberry (talk) 13:36, 18 September 2014 (UTC)
- Great work, guys. Accurately counting your users, patrons, customers, etc. is pretty basic stuff. My non-tech guy impression of this, coupled with the Media Viewer debacle, is that the WMF is not up to the technological challenges facing Wikipedia. I have no idea if this is a fair assessment or not, but I'd love to see you follow up on this by getting the Foundation's perspective on these matters. Gamaliel (talk) 15:06, 18 September 2014 (UTC)
- Alas no scoop here. Andrew West didn't 'discover' that mobile traffic per article is not counted. I told him on several occasions. At least as early as Aug 20 2014, and project WikiMedicine at least a year earlier. Many WMF reports (on page views per title) contain a very clear notice in the introduction (in red, as shown here) that "Please be aware that pageviews per article are not yet captured for Wikipedia's mobile site. Average underreporting will be 15-20%, but may be much higher for languages mostly spoken in the Global South, where a larger share of web access happens via mobile phones." This category of reports has contained the notice for about 18 months [1]. The following report, dated July 2013, on health articles in wp:en, with the same unmissable notice, was prepared specifically for requests from project WikiMedicine and was sent to some key members (still) of the project and 'WikiMedicine discussion' list. As for the defect itself, the Analytics and Engineering Teams discussed how to fix this on several occasions and finally decided not to repair legacy software (which would have been either very time consuming or even undoable given our infrastructure, as one aborted attempt some 2 years ago revealed). It was not in any way swept under the carpet. Personally I'm taking pride in being very transparent to the community about what we can deliver and what not (yet). And I'm sure the same holds for most colleagues. Erik Zachte (talk) 16:01, 18 September 2014 (UTC)
- Could you please find someone willing to take pride in producing accurate statistics? 72.130.129.212 (talk) 18:19, 18 September 2014 (UTC)
- Would you have the guts to sign your comment? Erik Zachte (talk) 18:34, 18 September 2014 (UTC)
- This is a big deficiency. But it's incredibly unfair to portray Erik or anyone else working on this as not taking pride in accurate statistics. It's a very hard problem with a lot of moving parts. Transparency about my motivations and expertise - I'm one of the people actually working on the problem. I'd love to see that transparency reflected in everyone else's commentary, too (Gamaliel wins points for starting from the premise of 'this is just my opinion as an observer', even if I disagree with that opinion to some degree). Ironholds (talk) 19:42, 18 September 2014 (UTC)
- I appreciate Erik's work and we've had a productive back-and-forth on this. I think the talk page at [2] makes my first-person story clear. On Aug. 9, I told a user that "yes! we are counting mobile views" (paraphrased) based on my flawed assumption that since mobile requests are hitting the same content as desktop views, that presumably they would be counted in the totals. When shortly thereafter I learned that this was not the case, I again posted to that talk page. I never claimed to 'discover' anything, rather it "came to my attention", and tried to place no blame: "This very well could be an artifact of an earlier system that was not prepared to handle mobile views. I am in no position to comment ..." I posted this where my stakeholders and consumers of the data could find it. I accuse no-one of non-transparency. It is not a triumph of my own to make this revelation -- quite the opposite -- it means I have to tell those I collaborate with that I was distributing incomplete numbers and the flawed assumption that could cause them to be mis-interpreted. West.andrew.g (talk) 21:48, 18 September 2014 (UTC)
- Totally; this wasn't a critique of you or your actions, here. Ironholds (talk) 22:34, 18 September 2014 (UTC)
- Question: Does this underreporting affect our estimates of total Wikipedia usage? --j⚛e deckertalk 19:45, 18 September 2014 (UTC)
- It shouldn't. So, we actually have two streams - one is per-page impression data (basically, URL aggregation with some amount of filtering and parameter-stripping) and global PV data. They're processed using different filters (for obvious reasons. Pageview != impression). The global pageviews count includes mobile data. This shouldn't be having an impact on the overall tracking. Erik, obviously correct me if I'm getting this wrong Ironholds (talk) 19:48, 18 September 2014 (UTC)
- We were both answering at same time, here is my version of a similar answer:
- No, or hardly. Webstatscollector writes two files per hour: one tiny file called projectcounts with total views per wiki, mobile and non-mobile as separate counts, and one huge file called pagecounts with views per page title. That is not to say that numbers in projectcounts are perfect; for one, we still need to filter bot requests. Erik Zachte (talk) 19:58, 18 September 2014 (UTC)
- Cool, this doesn't mitigate the "Oh s**t" curve, then. Good to know, even if it's not an answer we enjoy. --j⚛e deckertalk 20:00, 18 September 2014 (UTC)
- I thought that was the editor retention graph? We need more expletives or less bad news, one of the two ;p. Ironholds (talk) 20:08, 18 September 2014 (UTC)
- Ah, yes. The global south is mentioned. Killiondude (talk) 20:35, 18 September 2014 (UTC)
- @Erik Zachte: Do you know if there is any project in place to start building a stats.grok.se replacement (perhaps in Wikimedia Labs)? There is a clear desire all round to see page view data and it's not suitable to have to rely on an external website to do this. We link to this website on every page yet it carries the foreboding warning: "This is very much a beta service and may disappear or change at any time". It's apparent that a lot of problems would be caused for Wikipedia should it go down. SFB 13:40, 20 September 2014 (UTC)
- @Sillyfolkboy, no clear-cut plans. An earlier initiative to build an API on top of the current data stream was abended. The present team took that in as confirmation that perfect is the enemy of good, and also we better take things one step at a time. Current focus is on stabilizing, extending, and tuning the raw data feeds. Colleagues at Analytics Engineering are working as we speak on porting the current webstatscollector tool to the new hadoop environment. Webstatscollector is at the basis of all external reporting on traffic, not only at stats.grok.se, and at much of WMF reporting. Once migration has been done, hopefully both mobile stats and binary file requests can be added. I'm saying hopefully, as capacity is always an issue. The new infrastructure has way more capacity (and flexibility) than the old, but as always it takes just a few overoptimistic choices to again fill any scaled up system up till strangulation point. So one step at a time. Implementing hadoop and all of the software stack that comes with it is a serious undertaking for a small team, making it robust and tuning it takes time, and the team has many tasks on their plate. But we are looking right now into the finer details of how to extend the data feed without losing downward compatibility. Once that data feed is more reliable, complete and unambiguous (e.g. bots counted separately on the wiki level, and omitted on the page level), more flexible querying would be a new challenge. Most likely that would be a post-processing step and a separate data warehouse outside hadoop (which as I understand is more batch oriented). When, how, and by whom are currently not our focus, one step at a time. Erik Zachte (WMF) (talk) 20:32, 21 September 2014 (UTC) Erik Zachte (WMF) (talk) 14:53, 21 September 2014 (UTC)
- @Erik Zachte (WMF): Thanks for the reply. Sounds like there is a lot of ongoing change at the moment, so that's understandable that (re-)development based on the current interface is low priority. Good luck with the upcoming work. I hope the above article and comments confirm that potential usage of these stats isn't restricted to esoteric nerd research – they are integral to understanding our users! SFB 00:23, 22 September 2014 (UTC)
- Referring to somebody as "prominent" does not make that person prominent. GeorgeLouis (talk) 02:50, 22 September 2014 (UTC)
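For readers who want to see what the hourly pagecounts files mentioned earlier in this thread look like in practice, here is a minimal sketch of summing per-title views across several hours. It assumes each line has four whitespace-separated fields (project code, page title, request count, bytes transferred); the sample lines, the `CountRecord` type and the function names are made up for illustration and are not taken from webstatscollector itself. As discussed above, per-title counts of this kind did not include mobile traffic at the time, so sums like these would understate real readership.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class CountRecord(NamedTuple):
    """One line of an hourly per-page count file (assumed layout)."""
    project: str  # project code, e.g. "en"
    title: str    # page title with underscores
    views: int    # requests counted for that title in that hour
    size: int     # bytes transferred


def parse_line(line: str) -> CountRecord:
    # Assumption: exactly four whitespace-separated fields per line.
    project, title, views, size = line.split()
    return CountRecord(project, title, int(views), int(size))


def views_per_title(lines: Iterable[str], project: str) -> Counter:
    """Sum view counts per title for one project across all given lines."""
    totals: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = parse_line(line)
        if record.project == project:
            totals[record.title] += record.views
    return totals


# Hypothetical sample data for two hourly files, not real traffic figures.
hour_1 = ["en Main_Page 100 5000", "en Ebola 40 2000", "de Ebola 7 300"]
hour_2 = ["en Main_Page 90 4500", "en Ebola 60 3100"]

totals = views_per_title(hour_1 + hour_2, "en")
print(totals["Main_Page"], totals["Ebola"])
```

Keeping the parser separate from the aggregation makes it easier to swap in a different line layout if the real files differ from what is assumed here.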
I am not Pranav Curumsey
In the section on the Indian chapter, you quote my email to the Wikimediaindia-l and you inadvertently (?) call me Pranav Curumsey. I am Pradeep Mohandas and I forwarded an email of Pranav Curumsey's resignation from the Wikimedia India Chapter members list to the public Wikimediaindia-l, which you quote in your story calling me Pranav. Please read this carefully before reporting. Thanks. Thiruvathira (talk) 15:24, 23 September 2014 (UTC)