How many women edit Wikipedia?
This section was published in error and has been removed from circulation.
The month-long "Inspire" campaign seeking ideas for new initiatives to increase gender diversity on Wikipedia recently concluded successfully, with hundreds of new ideas and over 40 proposals entering consideration for funding. During this campaign, there were a lot of questions about the empirical basis for the statement that women are underrepresented among Wikipedia editors, and in particular about the estimate given in the campaign’s invitation banners (which stated that less than 20% of contributors are female). This blog post gives an overview of the existing research on this question, and also includes new results from the most recent general Wikipedia editor survey.
The Wikimedia Foundation conducted four general user surveys that shed light on this issue, in 2008 (in partnership with academic researchers from UNU-MERIT), 2011 (twice) and 2012. These four large surveys, as well as some others mentioned below, share the same basic approach: Wikipedia editors are shown a survey invitation on the site, and volunteer to follow the link to fill out a web-based survey. This has been a successful and widely used method. But there are some general caveats about the data collected through such voluntary web surveys:
- Percentages cannot be compared due to different survey populations: The overall percentage among respondents from one survey (e.g. the frequently cited 9% from the December 2011 WMF editor survey, or the 13% from the 2008 WMF/UNU-MERIT survey) is often taken as a rough proxy for "the" gender ratio among Wikipedia contributors overall. But different surveys cover different populations, e.g. because they were not available in the same set of languages, or because the definition of who counts as an "editor" varies. This is especially relevant when trying to understand how the gender gap develops over time – e.g. we can't talk about a "drop" from 13% to 9% between the 2008 and April 2011 surveys, because their populations are not comparable. Likewise, the slightly higher overall percentage in the 2012 survey, compared to the preceding one (see below), should not be interpreted as a rise. However, comparisons are possible for comparable populations, and in this post we present such trend statements for the first time.
- Participation bias between languages: There is evidence that participation rates in such surveys vary greatly between editors of different languages. For example, in both the 2008 and the 2012 surveys, Russian-language editors were strongly overrepresented among participants relative to the number of active editors in that language.
- Women editors may be less likely to participate in surveys: A 2013 research paper by Benjamin Mako Hill and Aaron Shaw confirmed the longstanding suspicion that female Wikipedians are less likely to participate in such user surveys. They quantified this participation bias for the 2008 UNU-MERIT Wikipedia user survey, correcting the above-mentioned 13% to 16% and arriving at an estimate of 22.7% female editors in the US (more than a quarter higher than among US respondents in that survey). Hence we now know that the percentages given below are likely to be several percentage points lower than the real female ratio.
- Different definitions of "editor": Most of these surveys have focused on logged-in users, but there are also many people contributing as anonymous (IP) editors without logging into an account. What’s more, many users create accounts without ever editing (for this reason, the 2011/12 editor surveys contained a question on whether the respondent had ever edited Wikipedia, and excluded those who said "no". Without this restriction, female percentages are somewhat higher).
- Because they only reach users who visit the site during the time of the survey, these surveys target active users only. And depending on methodology, users with higher edit frequency (who, as some evidence suggests, are more likely to be male) may be more likely to participate as respondents.
- Sample size: As usual with surveys, the fact that respondents form only part of the surveyed population gives rise to a degree of statistical uncertainty about the measured percentage, which can be quantified in the form of a confidence interval (a worked sketch of this, and of the bias correction above, follows this list).
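To make the last two caveats concrete, here is a minimal Python sketch (not part of the original post): it computes a 95% Wilson confidence interval for a reported survey proportion, and applies a Hill-and-Shaw-style multiplicative correction. The 16/13 correction factor is derived from the 2008 figures above; applying it to any other survey's percentage is an illustrative assumption, not a published adjustment.

```python
import math

def wilson_ci(p_hat, n, z=1.96):
    """95% Wilson score confidence interval for a sample proportion."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - margin, center + margin

# Sample-size caveat: 9% female among n=6,503 respondents (December 2011 survey).
low, high = wilson_ci(0.09, 6503)
print(f"9% with n=6,503 -> 95% CI: {low:.1%} to {high:.1%}")

# Participation-bias caveat: Hill and Shaw corrected the 2008 survey's 13% to
# 16%, a multiplicative factor of 16/13 (about 1.23). Applying that factor to
# other surveys is an illustration only, not a published adjustment.
correction = 16 / 13
print(f"Illustrative bias-adjusted value for 9%: {0.09 * correction:.1%}")
```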
Still, these caveats do not change the fact that the results from these web-based surveys remain the best data we have on the problem. And the overall conclusion, that Wikipedia's editing community has a large gender gap, remains intact.
What follows is a list of past surveys, briefly summarizing the targeted population of each and stating the percentage of respondents who answered the question about their gender with "female". In each case, please refer to the linked documentation for further context and caveats. Keep in mind that the stated percentages have not been corrected for the aforementioned participation bias, i.e. many of them are likely several percentage points too low, per Hill and Shaw's result.
General user surveys
Editor survey (WMF, 2012)
- Population: Logged-in Wikipedia users who did not respond "no" to the question "Have you EVER edited Wikipedia?"
- Method: Banners in 17 languages, shown only once per user (October/November 2012)
- 10% female (n=8,716. 11% when including non-editors and users who took the survey on Wikimedia Commons. 14% among Commons users, with n=463)
Editor survey (WMF, December 2011)
- Population: Logged-in Wikipedia users who did not respond "no" to the question "Have you EVER edited Wikipedia?"
- Method: Banners in multiple languages, shown only once per user
- 9% female (n=6,503)
Editor survey (WMF, April 2011)
- Population: Logged-in Wikipedia users who did not say they had only made 0 edits so far
- Method: Banners in 22 languages, shown only once per user
- 9% female (n=4,930)
General user survey (WMF/UNU-MERIT, 2008)
- Population: Site visitors who described themselves as "Occasional Contributor" or "Regular Contributor"
- Method: Banners shown to both logged-in and logged-out users, in multiple languages
- 13% female (n=53,888)
Other surveys
There have also been several surveys with a more limited focus, for example:
Global South User Survey (WMF, 2014)
- Population: Site visitors in 11 countries and 16 languages who selected "Wikipedia" (along with other large websites) in response to the question "Which accounts do you most frequently use?"
- Method: Banners shown to both logged-in and logged-out users
- 20% female (n=10,061)
- Note: In this survey, in those countries for which data is available, the ratio of female editors was much higher than in the 2011 and 2012 surveys. However, it is plausible that this difference is largely attributable to the different methodologies rather than to an actual rise in female participation across the Global South.
Gender micro-survey (WMF, 2013)
- Population: Newly registered users on English Wikipedia
- Method: Overlay prompt immediately after registration
- Draft results: 22% female (n=32,199. 25% when not counting "Prefer not to say" responses)
JASIST paper on anonymity (2012)
- Population: Active editors on English Wikipedia, estimated to number 146,208 users at the time of the survey (2012)
- Method: User talk page messages sent to a random sample of 250 users
- 9% female (n=106)
- Tsikerdekis, M. (2013). "The effects of perceived anonymity and anonymity states on conformity and groupthink in online communities: A Wikipedia study". Journal of the American Society for Information Science and Technology. DOI:10.1002/asi.22795 (preprint, corresponding to the published version)
Grassroots Survey (Wikimedia Nederland, 2012)
- Population: Members of the Dutch Wikimedia chapter and logged-in users on the Dutch Wikipedia
- Method: Banner on Dutch Wikipedia, and letters mailed to chapter members
- 6% female (n=1,089 (completed))
Wikibooks survey (2009/2010)
- Population: Wikibookians in English and Arabic
- Method: Project mailing list postings and sitenotice banners
- 26% female (of 262 respondents, 88% of whom described themselves as contributors)
- Hanna, A. 2014, ‘How to motivate formal students and informal learners to participate in Open Content Educational Resources (OCER)’, International Journal of Research in Open Educational Resources, vol. 1, no. 1, pp. 1-15, PDF
Wikipedia Editor Satisfaction Survey (Wikimedia Deutschland with support from WMF, 2009)
- Population: Logged-in and anonymous editors on German and English Wikipedia
- Method: Centralnotice banner displayed after the user’s first edit on that day, for 15 minutes (all users on dewiki, 1:10 sampled on enwiki)
- 9% female (ca. 2100 respondents – ca. 1600 on dewiki, ca. 500 on enwiki)
- Merz, Manuel (2011). Understanding Editor Satisfaction and Commitment: First impressions of the Wikipedia Editor Satisfaction Survey. Wikimania 2011, Haifa, Israel, 4-7 August 2011. PDF (archived)
"What motivates Wikipedians?" (ca. 2006)
- Population: English Wikipedia editors
- Method: Emailed 370 users listed on the (hand-curated, voluntary, since deleted) "Alphabetical List of Wikipedians", inviting them to fill out a web survey
- 7.3% female (n=151)
- Nov, Oded (2007). “What Motivates Wikipedians?”. Communications of the ACM 50 (11): 60–64. DOI:10.1145/1297797.1297798, also available here
"Wikipedians, and Why They Do It" (University of Würzburg, 2005)
- Population: Contributors to the German Wikipedia
- Method: Survey invitation sent to the German Wikipedia mailing list (Wikide-l) ("The sample characteristics of the present study might be [a] limitation because participants were very involved in Wikipedia … the reported results might not be the same for occasional contributors to Wikipedia.")
- 10% female (n=106)
For further research on these and other questions, see e.g. the “Address the gender gap” FAQ on Meta-wiki, or follow our monthly newsletter about recent academic research on Wikipedia.
- This post has been condensed. The full version is available on the Wikimedia blog.
Discuss this story
Removed from circulation per below. ResMar 14:10, 8 May 2015 (UTC)
Note: this isn't the actual blog post
I feel honored to see this Signpost issue draw attention to my post; it's a pleasant surprise! However, I think it should be made clearer that this isn't actually the text that was published on the blog. Hidden in the version history of the page is Resident Mario's note that this is a "down-edited version", which at least involved the removal of several sections with important information, and possibly other meaning-changing edits (I haven't checked yet). This has already led to understandable confusion about important missing content; see Andreas' comment below.
The actual text of the blog post can be found here: https://blog.wikimedia.org/2015/04/30/how-many-women-edit-wikipedia/ . I would appreciate it if this Signpost version could a) link directly back to it (this is good practice for syndicated content in general), b) explain that the full version with additional information can be found there, and c) credit ResMar for his edits. The byline should not imply that this is the text that I wrote and published. Regards, Tbayer (WMF) (talk) 09:20, 8 May 2015 (UTC)
UNU-MERIT sample size missing
Is there a reason the sample size for the UNU-MERIT survey isn't indicated? It was well over 50,000 (or around 60,000, if ex-contributors are included, as they were in the other surveys listed). Could this info be added? --Andreas JN466 07:24, 8 May 2015 (UTC)
Women editors less active?
The following struck me as odd:
Because they only reach users who visit the site during the time of the survey, these surveys target active users only. And depending on methodology, users with higher edit frequency (who, as some evidence suggests, are more likely to be male) may be more likely to participate as respondents.
It almost seemed to suggest that we should imagine that there are lots of women editors who just didn't see the survey because they weren't active ... I guess there is a useful point in that though: a gender gap can express itself not just in the numerical difference in male/female editor counts, but also in the numerical difference between male/female edit counts.
If the numbers of male and female editors were the same, but males made 95% of the edits, that would still be an enormous gender gap. Equally so if male contributors edit every day, while female contributors only edit twice a month. Empirical data on this might be useful: if the public wants to understand who edits Wikipedia, it probably makes more sense to count edits rather than editors.
See also the preceding paragraph: many users create accounts without ever editing (for this reason, the 2011/12 editor surveys contained a question on whether the respondent had ever edited Wikipedia, and excluded those who said "no". Without this restriction, female percentages are somewhat higher). This seems to imply, then, that there are lots of women who register an account but never edit – an interesting fact in itself, but little consolation. --Andreas JN466 07:24, 8 May 2015 (UTC)
Trend statements
The article says, "comparisons are possible for comparable populations, and in this post we present such trend statements for the first time." I imagined this would look at data for a specific country or language, and describe an apparent development over time. But I've read the article twice now, and I don't see any significant statement on trends in particular populations. Were they left out? --Andreas JN466 07:24, 8 May 2015 (UTC)
what percentage of edits are by women, instead
All of the studies seem to have been trying to measure the percentage of all editors who are women, rather than the percentage of all edits that are done by women. The percentage of Wikipedia done by women is what matters, right? Wouldn't measuring the latter be more natural, and more important? And it would be relatively easy to implement in a survey that avoids most/all of the biases of volunteer web surveys: randomly sample from all edits ever, or from all edits in en.wikipedia during 2014, or whatever other defined universe of edits. Take 1,000 or some number. There will be fewer editors than edits, say 650 editors, because prolific editors will have multiple edits in the sample. Seek to determine the gender of each editor in the sample. Present results in terms of the fraction of edits by women, with +/- 90 percent certainty. Also present results for the fraction of Wikipedia impact by women, i.e. weighting by the size of each edit, also with +/- 90 percent certainty.
This general approach is more costly per datum acquired, but it requires a much smaller sample (than in web surveying) to achieve results of equivalent or significantly better accuracy. There are standard means to determine the sample sizes required for any desired degree of accuracy. All of this is routine methodology. No doubt Tbayer and others understand all of this. So, why not use this approach? Isn't it important to get at some truth on this: what percentage of Wikipedia is women-added?
Note: it is very important to try very hard to get every editor involved to share their gender information (and truthfully) for the results to be valid. So perhaps giving an incentive by paying for participation is needed, and/or setting up the procedure so editors can be confident the info will be kept confidential; and it is necessary to try hard to track down all of the editors, including those who are no longer active. Where the gender of editors for some edits nonetheless cannot be ascertained from routine efforts, some further study of the likely bias there should be done, e.g. by applying extraordinary efforts to get some info from a subsample of those difficult-to-reach editors. All of this, too, is routine methodology. (A rough sketch of such an estimator follows below.) --doncram 11:38, 8 May 2015 (UTC)
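A minimal Python sketch of the edit-sampling estimator described in the comment above, under stated assumptions: the sample records, field names, and the gender-resolution step are hypothetical placeholders, and the margin uses a simple normal approximation for the unweighted share only.

```python
import math
import random

def estimate_female_share(edits, z=1.645):
    """From a simple random sample of edits, estimate the share of edits made
    by women (unweighted) and the share of bytes changed contributed by women
    (weighted by edit size). Each record needs 'gender' in {'female', 'male',
    'unknown'} and 'size' (bytes changed). z=1.645 gives a ~90% margin."""
    known = [e for e in edits if e["gender"] != "unknown"]
    n = len(known)
    p_edits = sum(e["gender"] == "female" for e in known) / n
    total_bytes = sum(e["size"] for e in known)
    p_bytes = sum(e["size"] for e in known if e["gender"] == "female") / total_bytes
    # Normal-approximation margin for the unweighted share; the weighted share
    # would need a ratio-estimator variance, omitted here for brevity.
    margin = z * math.sqrt(p_edits * (1 - p_edits) / n)
    return p_edits, margin, p_bytes

# Hypothetical sample: in practice one would draw ~1,000 edits at random from
# a defined universe (e.g. all en.wikipedia edits in 2014) and resolve the
# gender of each sampled editor.
random.seed(0)
sample = [{"gender": random.choice(["female"] * 2 + ["male"] * 7 + ["unknown"]),
           "size": random.randint(1, 500)}
          for _ in range(1000)]
p, m, pw = estimate_female_share(sample)
print(f"Edits by women: {p:.1%} +/- {m:.1%}; bytes by women: {pw:.1%}")
```

As the comment notes, the estimator itself is routine; the hard and expensive part is resolving the gender of every sampled editor and bounding the bias from those who cannot be reached.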
Query: Have any surveys tried to get at the reasons for the gaps?