Wikipedia talk:Wikipedia Signpost/2014-08-20/Op-ed

Discuss this story

Hi Denny, good article and interesting topic.

A small note: maybe you should flip File:A_new_metric_for_Wikimedia_-_3.png horizontally, because the next graph reverses the left-right direction. That might make it easier to understand.
As to your question, "But are these really the primary metrics we should keep an eye on? Is Editor engagement the ultimate goal? I agree that these are very interesting metrics. I am not sure if they answer the question whether we are achieving our mission. But how could an alternative look like?" I think the metrics we don't have, but should keep an eye on, are:
- (A) article quality metrics that scale (are the articles reliable, readable, sourced, grammatically correct, up to date, vandalised, etc.?). We have pretty much nothing here; and
- (B) maintenance metrics: which article types need more or less maintenance than others (Animal species? Biographies of Living People? Biographies of dead people? Company articles? Sports tournaments? Rivers, mountains and other geographic features?) How can a stagnating editor base keep up with a growing mass of articles? Which tools allow us to do maintenance more effectively and push the boundaries?

I think these are far more important and pressing questions for us than "what percentage of the infinite 'all knowledge' do we cover?" (see User:Emijrp/All human knowledge). And, as you correctly pointed out: "The energy and interest of volunteers cannot be allocated arbitrarily." / "But the question that the movement as a whole faces is how to prioritize the allocation of resources that the movement can actually allocate?" --Atlasowa (talk) 11:42, 19 August 2014 (UTC)

Hi Atlasowa, thank you for the comments and your careful reading. I have changed the direction of the graph; that's a good catch. I agree that article quality and maintenance play a more important (and more immediate) role than many other metrics we are currently gathering. I took the liberty of extending my article based on your comments, and will also add quality and maintenance to the Meta project page. I would say maintenance metrics are a derivative for quality, and quality plays an important role in the availability of knowledge, but I stand by my statement that the ultimate goal is accessibility of knowledge to everyone. --denny vrandečić (talk) 15:37, 19 August 2014 (UTC)

Hi Denny, that was fast! Regarding "I would say maintenance metrics are a derivative for quality": if I look at your quality metrics at meta:Research:Measuring mission success, I find this:

- Quality
    - persistence (implied review)
      - content persistence
      - reverts
      - article survival

You're not measuring article quality at all! You're just implying that if the stuff sticks, it's good ^^ If sticking is quality and "maintenance metrics are a derivative for quality", then... you lost me... :-) Hm, I guess the bot-generated Cebuano and Waray-Waray Wikipedias have excellent quality according to these quality metrics? --Atlasowa (talk) 17:46, 19 August 2014 (UTC)

Good point and good examples! Note that the tree was drafted by Aaron, whereas I am the one answering; that is the root of this inconsistency. His tree was a first draft, put out quickly so that the conversation could start on-wiki instead of over email, and I really appreciate his fast work. But that doesn't mean we agree on every detail. I have actually extended the points you referred to, to reflect how they could start to measure quality; feel free to do so as well. --denny vrandečić (talk) 18:19, 19 August 2014 (UTC)

Hi Denny, OK, I'll try to write about quality metrics and maintenance at meta:Research talk:Measuring mission success. That you link to de:Benutzer:Atlasowa/Verständlichkeit as a "collection of works on understandability" is very flattering, but... my notes and lists are an unreadable mess (ironic, I know) and mostly in German. How about a link to http://www.readabilityofwikipedia.com ? "Calculate readability for... Enter the title of any Wikipedia article here to test its readability using the Flesch Reading Ease Score." "Results for: Tokyo Stock Exchange@English Wikipedia The Flesch reading ease score is 55, this means that 64% of the articles on Wikipedia are harder to read than this one." (The outcome of this formula can be interpreted by the following table: 0 - 30 Very difficult; 30 - 50 Difficult; 50 - 60 Fairly difficult; 60 - 70 Standard; 70 - 80 Fairly Easy; 80 - 90 Easy; 90 - 100 Very Easy. When writing for a general public, one should aim for a score between 60 and 70. Academic publications mostly score below 30. Note that the formula only works for English texts. [1]) Best, --Atlasowa (talk) 13:18, 21 August 2014 (UTC)
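For readers curious about what the Flesch Reading Ease score actually computes: the standard formula is 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word). The sketch below takes word, sentence and syllable counts as already given (reliable syllable counting is the hard part and is left out) and maps the score onto the bands quoted above. It is not the code behind readabilityofwikipedia.com, whose implementation and text extraction are unknown here; the function names are illustrative.

```python
# Sketch: Flesch Reading Ease from pre-computed counts (syllable counting not included).


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula; higher scores mean easier text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Lower bounds of the bands in the interpretation table quoted above.
# Where two ranges share a boundary (e.g. 30), this sketch assigns the
# boundary value to the easier band.
BANDS = [
    (90, "Very Easy"),
    (80, "Easy"),
    (70, "Fairly Easy"),
    (60, "Standard"),
    (50, "Fairly difficult"),
    (30, "Difficult"),
]


def interpret(score: float) -> str:
    """Map a score to its band label; anything below 30 is 'Very difficult'."""
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    return "Very difficult"


score = flesch_reading_ease(words=100, sentences=5, syllables=150)
print(round(score, 3), interpret(score))
```

With 100 words in 5 sentences and 150 syllables, the score is 206.835 − 20.3 − 126.9 = 59.635, which falls in the "Fairly difficult" band, just short of the 60 to 70 range recommended for a general audience.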

I'll link to both, Atlasowa. I didn't mean to be flattering, but it was the most comprehensible list on the issue I could think of at that moment. :) --denny vrandečić (talk) 15:10, 21 August 2014 (UTC)

You both might be interested in Wikipedia:Short popular vital articles, which was an attempt to identify the most important articles that appeared (judging by their raw size) to be lacking in detail. Presumably the baseline there can be used to measure how fast the entire cohort is growing. Novis Ordo (talk) 02:08, 24 August 2014 (UTC)

That's an excellent list, Novis Ordo, thanks! --Atlasowa (talk) 20:34, 24 August 2014 (UTC)

Fascinating big-picture questions. At some point Wikipedia's scope comes into play in its ability to spread knowledge: pressing for free mobile coverage in developing countries would help and is laudable, but so would teaching everyone English. The sharing of knowledge would be much more efficient if we all read English (bye-bye, Alemannic asteroid project), but no one would argue that should be the goal of Wikipedia. How to focus resources is a very important question, even within areas we all agree are in our core scope. Paris Hilton has many dogs (not only one, you ignoramuses!), and it's not easy to get someone interested in that subject to expand Maryam Mirzakhani. People with different knowledge sets need to be encouraged to join the project as metrics of success are defined.--Milowent • ^hasspoken 18:31, 22 August 2014 (UTC)
A bit of information that would guide my work as an editor is "which parts of our articles do people read most?" Is it mostly the lead? Mostly certain sections? I already direct my work to English Wikipedia's most-read medical articles, and would love to see this tool, Wikipedia:WikiProject_Medicine/Popular_pages, appear in other languages. Doc James (talk · contribs · email) (if I write on your page reply on mine) 23:53, 23 August 2014 (UTC)
- I believe there is some research about which parts get read most, and to what extent, as someone mentioned some figures to me once. All the best: Rich Farmbrough, 02:35, 24 August 2014 (UTC).

Hi Rich! There is some analysis of session length at meta:Research:Mobile_sessions, and amongst the results: "... we are able to identify a dataset-specific cutoff point to identify 'sessions' - 430 seconds. This provides a clean breakpoint, and is in line with existing research on session time as applied to Wikipedia." That is about 7 minutes for a Wikipedia article, which is consistent with similar analysis for blog posts (see links at meta:Research talk:Mobile sessions). --Atlasowa (talk) 20:37, 24 August 2014 (UTC)
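A 430-second cutoff of this kind is an inactivity threshold: presumably, page requests from the same reader separated by more than that gap are counted as different sessions. A minimal sketch of that kind of sessionization, assuming we already have one reader's request timestamps in seconds. The research page's actual pipeline is not described here, so the function names and the choice to keep a gap of exactly 430 seconds in the same session are assumptions.

```python
from typing import List

# Inactivity cutoff quoted from meta:Research:Mobile_sessions.
SESSION_CUTOFF_SECONDS = 430


def split_sessions(timestamps: List[float], cutoff: float = SESSION_CUTOFF_SECONDS) -> List[List[float]]:
    """Group one reader's request times (seconds) into sessions.

    A request joins the current session if it follows the previous request
    by at most `cutoff` seconds; otherwise it starts a new session.
    (Treating a gap of exactly `cutoff` as "same session" is an assumption.)
    """
    sessions: List[List[float]] = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= cutoff:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions


def session_lengths(sessions: List[List[float]]) -> List[float]:
    """Duration of each session: last request time minus first request time."""
    return [s[-1] - s[0] for s in sessions]


sessions = split_sessions([0, 60, 300, 1000, 1100])
print(sessions, session_lengths(sessions))
```

For requests at 0, 60, 300, 1000 and 1100 seconds, the 700-second gap splits them into two sessions lasting 300 and 100 seconds.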

Thanks for that Atlasowa! All the best: Rich Farmbrough, 21:00, 24 August 2014 (UTC).

I enjoyed this rather eccentric approach, and agree with the main thrust, but a serious flaw is the idea of what is "done". Actually, much of our "serious" content is very poor, and the fact that it does not change is purely due to a lack of editors. Too many Wikipedians just seem to look at the length of articles and count the number of references. Actually reading them is too often a sobering experience, even without specialist knowledge, and worse if you have it. Much of our core content is unacceptably low in quality, and it is precisely this that tends to get left, because improving it essentially means starting from scratch. Articles that are only half-bad are much more attractive targets, and more likely to attract the small remaining band of editors ready to actually write text. Johnbod (talk) 16:19, 24 August 2014 (UTC)

If it is regarded as eccentric to focus on our vision instead of other measures, then this points to the fact that this article is rather timely :) --denny vrandečić (talk) 16:48, 24 August 2014 (UTC)

It is not the focus (which has long been my own) but the approach to measuring it that I thought eccentric. Johnbod (talk) 17:03, 24 August 2014 (UTC)

Thanks for the clarification. I would very much like to hear alternatives. This proposal is really no more than a first stab. --denny vrandečić (talk) 17:16, 24 August 2014 (UTC)

I would certainly change "done" to something else. If I were doing it I would be tempted to split the black area into two parts, one representing what we have some coverage of, and the other an essentially subjective guess at what is covered with some degree of quality. Unfortunately the article assessment system is of little use for this for the vast majority of articles below Good Article. Johnbod (talk) 14:23, 25 August 2014 (UTC)

If I read the bars correctly, a person without internet access has access to none of the world's knowledge. You do realise that's nonsense, I hope? Deltahedron (talk) 16:40, 24 August 2014 (UTC)

Extend your sentence to "a person without internet access has access to none of the world's knowledge through the work of the Wikimedia movement". I did not mean to measure every possible access to knowledge, as I wouldn't know how to even start, but merely to focus on what the Wikimedia movement provides. This is a limitation of the approach, and it could (and maybe should) be extended in that sense; I do not know. If you have ideas on how to estimate access to knowledge outside Wikimedia, please add them to the Meta project page. Thanks for raising the point. --denny vrandečić (talk) 16:48, 24 August 2014 (UTC)

Fair enough, but even then it's not quite true. There are off-line readers that can be preloaded with wiki text. But they might not be common. Deltahedron (talk) 20:40, 24 August 2014 (UTC)

There are also a huge number of printed books that are made from WP content. I doubt that they penetrate the places where people have no Internet access though.

There are certainly a lot of schools in Kenya that have off-line copies of WP. All the best: Rich Farmbrough, 21:00, 24 August 2014 (UTC).

I quite agree! I also wanted to point out a new tool I've built, Quarry, which lets people explore our databases for research purposes in a web-friendly way and share the results. It might be useful for less technically minded Wikimedians who want more power than Wikimetrics :) YuviPanda (talk) 12:55, 25 August 2014 (UTC)

Sweet! Thanks, Yuvipanda! --denny vrandečić (talk) 16:01, 29 August 2014 (UTC)

One further point, related to the one Johnbod made above, is that longer does not always mean better. A short but well-written article is almost always preferable to, and more useful than, a longer but diffuse article, even if the longer version has more information. -- llywrch (talk) 17:52, 25 August 2014 (UTC)

Absolutely correct. --denny vrandečić (talk) 16:01, 29 August 2014 (UTC)

An interesting viewpoint, indeed. Sometimes I wonder what effect a geomagnetic storm would have on your graphic above, Denny. --ThurnerRupert (talk) 01:47, 27 August 2014 (UTC)

Quite devastating, I reckon. The file would probably not be accessible anymore. --denny vrandečić (talk) 16:01, 29 August 2014 (UTC)

Thank you for this; this is exactly the kind of discussion we should be having! Jan-Bart de Vreede 217.200.185.43 (talk) 09:05, 27 August 2014 (UTC)

Thanks! I hope you carry it to the relevant places, Jan-Bart! --denny vrandečić (talk) 16:01, 29 August 2014 (UTC)

Don't want to be ultra-pedantic, but you misspelled diphtheria... -- AnonMoos (talk) 04:39, 27 August 2014 (UTC)

Aww. Thanks. In my defense, smarter people than me do it too. Fixed it. --denny vrandečić (talk) 16:01, 29 August 2014 (UTC)

Guy who lost keys

Interesting proposal. Just something about the parable 'Guy who lost keys': I loved the subtlety of this version, a pub chat between Hoyle and Feynman (skip to 24:00).

Thanks for that! :) --denny vrandečić (talk) 15:47, 29 August 2014 (UTC)

Some skeptical comments

Denny, this is a great article but I am somewhat skeptical of the metric and especially of its suggested relevance for the allocation of resources. Honestly, I would be worried that funding decisions on the basis of the suggested metric would do more harm than good because they simplify our complex goals too much. To give you at least one example: it seems to me that the metric is biased against small language projects for a number of related reasons:

(a) One of the strengths of small language Wikimedia projects is that they build free knowledge communities and contribute to the preservation of linguistic and other local knowledge. I think that Wikimedia's mission should also include these aspects of knowledge preservation and global community building of people who care about free knowledge. If we do not consider this part of our mission, we could simply slash any support for languages with fewer than x million monolingual speakers and reallocate the resources to Mandarin, Hindi, etc.

(b) If I understand the metric correctly, it represents knowledge in Wikimedia projects in complete isolation from other knowledge resources that users have access to. I'm afraid that this creates biased results given that the crucial question is how people actually use knowledge from Wikipedia and how they integrate it with other knowledge resources (see also my article here). For example, information in small language projects is often so valuable because it is not readily available anywhere else. The metric does not distinguish between this genuinely novel knowledge and knowledge that is readily available through other means such as a quick Google search.

(c) There are also vexed problems with the very notion of the "sum of all knowledge" in Wikimedia's mission. For example, it seems quite intuitive that missing knowledge on the top left of your diagram (e.g. the name of Paris Hilton's dog) tends to be less important than knowledge in the bottom right (e.g. the name of the current US president). Of course, we could adjust the measure so that the name of the US president covers more space in the diagram than the name of a celebrity's dog. But then we face the problem of having to assess the weight of different pieces of information across linguistic and cultural contexts.

All three points seem to contribute to a bias against small language projects. Furthermore, similar problems will pop up in other contexts. For example, problem (b) is equally obvious in the case of long specialized articles in English, German, French, etc. I always found long specialized articles to be especially valuable as far too much information in Wikipedia is the result of a quick Google search and essentially redundant to other information resources. However, the suggested metric is completely blind to that and seems to favor superficial work that transcribes readily available knowledge to Wikipedia over the work of editors who spend a lot of time in libraries and archives to free knowledge that is actually locked away.

Don't get me wrong: I think that the metric is very interesting and helpful in focusing on some important issues. However, I do not think that one simple metric can give reliable advice on complex questions of funding etc. Cheers, David Ludwig (talk) 20:03, 27 August 2014 (UTC)

I think that is less a problem with the metrics used than it is a shortcoming of the stated goal of the Wikipedia movement. If the goal is "a world in which every single human being can freely share in the sum of all knowledge", it implicitly values knowledge presented in languages with more users, and the op-ed's metrics are largely consistent with the implications of that stated goal. If, however, the goal of the Wikimedia Foundation were instead the preservation of culturally important knowledge in all the world's languages, then we would be talking about different metrics that would more fundamentally value contributions in small language projects. But I don't think it helps to conflate the shortcomings of the Wikimedia Foundation's vision statement with any shortcomings of the metrics. Van Isaac_WS^cont 03:21, 29 August 2014 (UTC)

You're right that (a) is mostly about the mission. However, I would disagree with the metric's implicit assumption that our mission is exhausted by the "sum of all knowledge"-idea. For example, the mission statement of WMF starts with the goal "to empower and engage people around the world to collect and develop educational content." Also, even if we limit ourselves to the "sum of all knowledge"-idea, problems (b) and (c) remain. (b) It is not very plausible to measure our success in making knowledge available by looking at Wikimedia projects in complete isolation from other knowledge resources that people have access to. (c) Even if our goal is "the sum of all knowledge," it seems plausible that we should prioritize knowledge (e.g. names of presidents vs names of celebrity dogs). David Ludwig (talk) 12:29, 29 August 2014 (UTC)

Thank you, @David Ludwig:, for your thoughtful comments. I will go through your three points as you give them:

(a) yes, this is correct. As a help to decide which tasks to tackle next, the metric favors to help more people over helping less people. This obviously doesn't mean that small languages should not be supported, because we will never get close to a perfect score unless we give much more substantial support to smaller language communities as well; but if the question is prioritization, then the metric suggests that underserved large language communities should get support and attention before underserved small language communities. But there is an (intentional) balancing factor, too: the metric is logarithmic on the y-axis. This means that providing, say, the first 5000 vital articles in a small language will be more valuable than providing articles 500,000-600,000 in a large language. So once the larger underserved languages are improved, the focus will naturally shift to the smaller languages. And it might stay there for a longer time, because smaller language communities intrinsically have a harder time getting started, given the smaller pool of possible contributors. So, yes, your observation is correct for now, but it will self-correct over time. If I had to choose between two roughly similar projects that would have a similar effect on their respective language Wikipedias, I would take the number of people served by each into account. So, yes, observation (a) is correct and intended, but it is self-correcting.
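To make the logarithmic balancing concrete, here is a back-of-the-envelope sketch. It assumes (a simplification, not necessarily the op-ed's exact formula) that a language edition's height on the y-axis is log10(1 + number of articles), and that a project's value is that per-reader gain multiplied by the number of people it serves. Both function names are hypothetical.

```python
import math


def log_gain(before: int, after: int) -> float:
    """Rise on a log10 knowledge axis when an edition grows from `before`
    to `after` articles; log10(1 + n) is used so that n = 0 is defined."""
    return math.log10(1 + after) - math.log10(1 + before)


def weighted_gain(before: int, after: int, people_served: float) -> float:
    """Hypothetical project value: per-reader log gain times readers served."""
    return people_served * log_gain(before, after)


small = log_gain(0, 5_000)          # first 5,000 vital articles, small language
large = log_gain(500_000, 600_000)  # articles 500,000-600,000, large language
print(f"per reader: {small:.2f} vs {large:.3f}")
print(f"break-even audience ratio: about {small / large:.0f}x")
```

Under these assumptions, the first 5,000 articles lift each reader by about 3.7 on the axis, while articles 500,000 to 600,000 lift each reader by only about 0.08. The large-language project therefore wins only if it serves roughly 47 times as many people, which is one way to read the "self-correcting" effect described above.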

(b) this is correct. Unfortunately, I cannot read your article for free, but I think I understand your point nevertheless. It is correct that the metric only considers free knowledge provided through the Wikimedia movement, because this is currently one of the few knowledge sources in which consumers can share, i.e. become contributors. Maybe this is only my interpretation of the vision, but I regard the possibility of becoming a contributor to the knowledge being counted as integral to the vision. If this is the case with other sources too, maybe they should be counted, or, even better, they could be integrated into the Wikimedia movement. One possibility would be to capture other sources in an alternative background curve, which would show whether a proposed project covers new ground or merely duplicates existing work, but this would have several issues (like the assumed linearization of knowledge, which I guess only works well with a single source). I'd say: if there are external sources of free knowledge, integrate them, and resolve the issue that way. So, yes, observation (b) is correct and intended.

(c) this is partially correct. Adding the name of Hilton's dog to the English Wikipedia is less valuable than adding Newton's law to the Gujarati Wikipedia. But you are correct that this is not a function of the knowledge being added (i.e. Hilton's dog's name vs Newton's law) but rather of the already existing absolute size of the given language Wikipedias. This metric is incomplete in that regard, and this is an oversight on my side which results from experience: most of the actual Wikipedia work I have done seems to suggest that the communities in general know how to prioritize, and thus this seemed to be self-regulating. But you are right: it is not. Cebuano and Waray-Waray have an unreasonable number of articles on topics of no particularly high relevance to their communities, but may lack vital articles. This is a result of bots creating articles based not on the importance of their subject but on the availability of usable raw data. The metric as proposed is blind to that. This should be improved, and the most obvious way to improve it is by weighting knowledge by its relevance. I will add that to the metric discussion on Meta.

Again, thank you for your thoughts! This is one kind of discussion that I wanted to see: highlighting consequences of the proposal, and its shortcomings. --denny vrandečić (talk) 15:44, 29 August 2014 (UTC)

Hi Denny, thanks for your helpful response! While I agree that the suggested metric captures something relevant, I still think that it is too narrow to guide priorities, funding decisions, and so on. Of course, similar problems will arise with any available metric, but maybe we should not aim for a single metric that captures our mission anyway. Instead, we may come to the conclusion that our diverse goals require a “toolbox” of evaluative heuristics that include equally diverse measures and metrics. Let me illustrate this by returning to two of my original points:

(a) You write that “the metric favors to help more people over helping less people.” Sure, but this is not a controversial aspect of your metric, and I do not think that we need a metric at all to establish that it is better to help more people :-). The real issues occur when we need to prioritize projects that have different strengths and where different factors point in different directions (e.g. potential audience measured by speakers of a language, potential audience measured by literate speakers of a language with internet access, actual audience measured by page views and so on, importance of the covered topic, uniqueness of the information compared to other available online sources, second-order effects such as building of free knowledge communities, potential for translation of created knowledge, preservation of linguistic knowledge of small languages, preservation of local knowledge such as the cultural heritage of small communities, “freeing knowledge” that is hidden in archives, collections, repositories with paywalls, and so on). I still think that reliance on your metric would reduce this complexity too much and that we need a rather broad interpretation of “the mission of the Wikimedia Foundation [...] to empower and engage people around the world”. Of course, a lot depends on how we interpret this mission. I want WMF to embrace the diversity of goals and motivations in our movement. I want the mission of Wikimedia to develop at least partly “bottom-up” through the interests of active volunteers, and not to be directed entirely “top-down” by a rigid metric that can alienate volunteers who are judged to be “not important” (real-life example: “No one needs free knowledge in Esperanto”). I also think that a very narrow interpretation of our mission would hurt Wikimedia, because people with “fringe interests” that barely matter in your metric are an essential part of our community. However, you may have a different perspective on some of these issues.
It is OK to disagree here, but it is important to clarify the premises of potential metrics. And it seems to me that your metric makes pretty substantial assumptions about what our mission should be.

(b) I think that you underestimate the problem here. Let me illustrate this by sketching an alternative metric. Let us start with the “sum of all knowledge”-idea of your metric, but consider all knowledge resources that people have access to: Wikipedia, libraries, online search engines, school education, TV education, knowledgeable friends, books at home, and so on. We may visualize this by extending your one-dimensional representation of knowledge to a two-dimensional representation that involves Venn diagrams. Every circle represents a set of knowledge that is accessible through a specific resource, and the intersections of the circles represent the subsets of knowledge that are accessible through several resources. In analogy to your metric, the Venn diagrams and the intersections will vary for every individual (access to computers? access to libraries? what libraries? what school education? and so on). For every individual, this metric would represent (1) the total knowledge that is made accessible through Wikimedia (the full circle, i.e. the same as your metric) and (2) the knowledge that is uniquely made accessible through Wikimedia (the parts of the circle that have no intersection with other circles). I can think of many reasons to prefer a metric that considers both (1) and (2) over your metric that only considers (1). It models the educational contribution of Wikimedia much more realistically. It is sensitive to core contributions that “free knowledge” by bringing it from archives or paywalls to public places such as Wikimedia. It recognizes the pioneering work of Wikipedias in indigenous languages that often only have a small body of published knowledge. It recognizes the importance of academic editors who write free knowledge articles on their narrow area of research. And so on. If we completely ignore (2) and only focus on (1), we lose all of that.
Honestly, this strikes me as a strategy that may suit a company focused only on its own product, but it seems pretty ill-suited for an educational charity that is primarily interested in the positive impact of its work on people.
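The Venn-diagram alternative maps naturally onto ordinary set operations: measure (1) is the whole Wikimedia circle, and measure (2) is that circle minus the union of every other resource a given reader can reach. A minimal Python sketch for a single reader follows. The resource names and knowledge items are hypothetical placeholders; in practice, defining and counting "pieces of knowledge" is the hard, unsolved part.

```python
from typing import Dict, Set, Tuple


def wikimedia_coverage(
    resources: Dict[str, Set[str]], key: str = "wikimedia"
) -> Tuple[Set[str], Set[str]]:
    """Return (total, unique) knowledge reachable through `key` for one reader.

    total  -> measure (1): the full Wikimedia circle
    unique -> measure (2): the part of that circle no other resource covers
    """
    total = resources[key]
    others: Set[str] = set().union(
        *(items for name, items in resources.items() if name != key)
    )
    return total, total - others


# Hypothetical reader; items stand in for "pieces of knowledge".
reader = {
    "wikimedia": {"newton's laws", "us president", "local folk song", "rare archive letter"},
    "school": {"newton's laws"},
    "search engine": {"us president", "newton's laws"},
}
total, unique = wikimedia_coverage(reader)
print(len(total), sorted(unique))
```

Here measure (1) counts all four items, while measure (2) counts only the folk song and the archive letter, which no other resource in this reader's diagram covers.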

Thanks again for starting this interesting conversation. David Ludwig (talk) 16:24, 1 September 2014 (UTC)