Wikipedia:Wikipedia Signpost/2008-06-23/Dispatches
Dispatches: How Wikipedia's 1.0 assessment scale has evolved
Two different grading systems: "importance" and "quality"
Most users will have seen the talk page banners that indicate what stage an article has reached in the writing process: {{A-Class}}, {{B-Class}}, {{Start-Class}}, or even {{Stub-Class}}. They may also have noticed that many articles are graded according to their importance: from {{Low-importance}} to {{Top-importance}}. These rankings may seem cryptic to new or occasional editors, and even seasoned editors may not have given much thought to the role of these templates in Wikipedia's quality control process. Moreover, there is often confusion about the relationship between this assessment scale and the processes that determine good articles (GA) and featured articles (FA).
Importance scheme
Wikipedia's importance scheme aims to determine the importance attached to an article's topic by its related WikiProject(s) – from those that are "extremely important, even crucial", to those that are "not particularly notable or significant". Thus, the same topic may be more important to one project than to another, and as such can receive more than one assessment on the importance scale. Powderfinger, for instance, has been rated of "top-importance" (priority) by the Powderfinger WikiProject, "high-importance" by WikiProject Australia, and "mid-importance" by WikiProject Alternative music.
Quality assessment
The encyclopedia's quality assessment scheme is more complex, because it has to address many facets of article quality, such as completeness, layout and language. Since a June 2008 poll added a new "class", WikiProjects will begin using five levels for quality assessment:
- Stub – a basic description in a paragraph or two;
- Start – an article that is developing, but is quite incomplete and lacks reliable sources;
- C – an article that is moderately complete, but lacks sources or contains cleanup tags;
- B – an article that is mostly complete, without POV or other major cleanup issues, but which requires further work to reach Good Article standards;
- A – an article that is organized well and is essentially complete, but needs style issues addressed before submission as a featured article candidate.
Critically, "importance" and "quality" are not necessarily correlated: one article might be of "low importance" and "A-Class" (see the Clea Rose example); another might be a "top-importance" stub (see the Judiciary of Australia example).
At press time, the new C-Class still needs to be fully enabled in the WP1.0 bot and elsewhere. This new classification has effectively raised the standards of quality required to attain B-Class. Other classes are included, such as FA-Class and GA-Class, which are not WikiProject-based, as are descriptive classes such as "Portal-Class"; for a complete list, see below.
Developing the scale
The original purpose of the assessment processes was twofold: to facilitate the production of an offline release, and to assist WikiProjects in organizing their articles, by categorizing the quality of articles as simply, accurately and comprehensively as possible. A test CD (Version 0.5) was released by the Version 1.0 Editorial Team in 2007, and a larger DVD release (Version 0.7) is planned for the third quarter of 2008. The gargantuan task of sifting through 2.4 million articles (as of June 2008) would be impossible with just a handful of team members. To solve this problem, a standardized baseline had to be developed so the task could be distributed among the editors who comprise Wikipedia's base.
Instead of developing a brand-new scale, the Version 1.0 Editorial Team adopted existing guidelines, and modified them for greater scalability. The assessment scheme in use across the community was originally developed at the Chemicals WikiProject as a method of tracking the completeness of the articles in their Worklist (a set of around 400 articles on which the project decided to focus its effort). By late 2005, the scheme was proposed as part of the article selection process at the 1.0 project. The Work via WikiProjects sub-project was started with the aim of having projects provide subject-expert assessments, which the 1.0 team could then put together to produce a broad selection of articles from the encyclopedia. The initial method was to request manually written lists of the top articles from each project; this did generate around 3,000 assessments and provided some suitable articles, but was very labor-intensive. In April 2006, there were about 1.1 million articles in Wikipedia, so continuing with the older method would have proved ineffective. At about this time, a new category-based, bot-assisted system was introduced; this gave projects valuable tools for their work (lists, a log and a statistics table) and provided the 1.0 group with a much more comprehensive list of articles. Tagging an article (via the talk page) is straightforward, and so the scheme rapidly grew to encompass 30,000 articles by August 2006, and to around 1.3 million articles in June 2008. The following table shows the aggregate of all the assessments by more than 1300 participating WikiProjects and task forces throughout Wikipedia:
Although the assessment scheme is only approximate, it allows users to broadly gauge article quality, and WikiProjects to keep track of their articles. When combined with the importance assessment scheme (which is not universally used), projects can see which of their key articles need the most work. The Wikipedia 1.0 project is now able to integrate the information from all of the WikiProjects and make selections of articles for offline release.
Quality | |
---|---|
FA | |
FL | |
A | |
GA | |
B | |
C | |
Start | |
Stub | |
Needed | |
Other classes | |
Future | Current |
List | Redirect |
Disambig | Template |
Category | File |
Portal | NA |
- Note: The chart is generated from WikiProject templates, and represents the scheme used until June 2008. There are currently 6623 featured articles, but some WikiProjects include featured lists in their featured article tally, so the number of featured articles in the chart is overstated. On the other hand, there are currently 40572 good articles, but as some articles have no WikiProject templates or the templates are not updated to include GA, the number of good articles in the chart is understated.
Criticisms and changes
Although the scheme is generally working, there is a steady trickle of criticisms and suggestions. The scheme is designed mainly for WikiProjects to assess article content and completeness, but GA and FA levels are included as "cross-references" to Wikipedia-wide quality assessment processes. This has been a regular source of confusion, since GA and FA status are not awarded by WikiProjects.
The Version 1.0 Editorial Team recently reevaluated the number of levels for project-based quality assessments. Until now there have been four (Stub, Start, B and A), but a recent poll indicated support for expanding this to five. To be useful across the community, the system must be simple and straightforward, so that all editors in all projects can use a common system for assessing articles. A greater number of assessment levels may yield a finer analysis of quality, but this is meaningless if the assessments cannot be performed to this level of detail. A majority of those polled believe that a fifth level (C-Class) will give a more refined scheme without seriously compromising reliability. The C-Class level will be introduced in the coming weeks.
The 1.0 team is testing a bot for automatic selection of articles. This involves evaluating the importance of an article using four parameters: a manual assessment by the project, the number of page hits, the number of foreign language "interwiki" links, and the number of links into the article. These factors are weighed along with the quality assessment to produce a selection of the most important "decent" articles for release. Initial test results look promising, but require an improved balance between WikiProjects. This new method should allow the 1.0 team to easily make regular general releases, and individual WikiProjects should be able to produce their own offline releases on paper, CD or DVD.
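The selection idea described above can be illustrated with a rough sketch. The article does not give the bot's actual formula, so everything specific here is an assumption: the point values, the logarithmic damping of the three counts, the quality floor used to define "decent", and the names `Article`, `importance_score` and `select_for_release` are all hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical point values; the article does not publish the bot's real weights.
QUALITY_POINTS = {"Stub": 0, "Start": 1, "C": 2, "B": 3, "GA": 4, "A": 5, "FA": 6}
IMPORTANCE_POINTS = {"Low": 0, "Mid": 1, "High": 2, "Top": 3}


@dataclass
class Article:
    title: str
    quality: str             # a key of QUALITY_POINTS
    project_importance: str  # a key of IMPORTANCE_POINTS (manual project rating)
    page_hits: int
    interwiki_links: int
    incoming_links: int


def importance_score(article: Article) -> float:
    """Combine the four importance parameters named in the text.

    Counts are log-damped (an assumption) so that one very large
    number cannot dominate the manual project rating.
    """
    return (
        IMPORTANCE_POINTS[article.project_importance]
        + math.log10(1 + article.page_hits)
        + math.log10(1 + article.interwiki_links)
        + math.log10(1 + article.incoming_links)
    )


def select_for_release(articles, min_quality="Start", limit=10):
    """Keep articles at or above a quality floor, then rank by importance."""
    floor = QUALITY_POINTS[min_quality]
    decent = [a for a in articles if QUALITY_POINTS[a.quality] >= floor]
    return sorted(decent, key=importance_score, reverse=True)[:limit]
```

Filtering on quality first and ranking on importance second is only one way to weigh the two assessments against each other; a single blended score would be an equally plausible reading of the description.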
Discuss this story
Oddity
Here's a (small) oddity. I happened to find myself at Talk:Henry_Ford and I note that all of the projects rate it B-class, but the version 1.0 team rates it A-class. I'm thinking that's a mistake... --jbmurray (talk • contribs) 21:20, 14 June 2008 (UTC)
Is someone going to finish the description of the Grading scheme? SandyGeorgia (Talk) 07:31, 15 June 2008 (UTC)
Is the grading scheme really a common system?
Some people think the A, B, Start and Stub classes are free for the WikiProjects to use or not. Others think that they should be standard and have the same meaning across all projects. Based on the history of the Version 1.0 project, I think the latter interpretation is correct. But the way things are going now, the grading scheme has been co-opted for the projects' own use and the Version 1.0 project became an incidental thing. --seav (talk) 04:47, 17 June 2008 (UTC)
Poll results
I just glanced for the first time; the poll results appear at a quick glance to be mixed and almost an even split, particularly after factoring in neutrals, so unless I'm missing something, I suggest we adjust this wording to reflect split opinion, and explain why it was split (summarize the pros and cons):
SandyGeorgia (Talk) 17:00, 18 June 2008 (UTC)
Now ready to publish?
I updated the effects of the C-Class issue as requested (although the goalposts are moving as I type this!). Regarding the examples of Top-Stub and FA-Low, such examples are both rare and hard to find; if you find a well-known Top-importance article like Star Wars/WP:Films, it's unlikely to be a Stub, and a Low-Importance article in any project is not well-known by definition. But I think most people will understand what the Judiciary of Australia is, and that it's important for WP:Australia, and a click on the link will explain more.
Do you think we need to elaborate on closing of the poll? There is a link to the relevant section, but we can copy over some of that section into the Dispatch if you think it's needed. My only concern is that superficial coverage of a long/complex debate may invite drive-by criticisms from those who weren't involved, and at this point we are committed to the change anyway. (I spent around 12 hours studying every comment and weighing the factors before I declared the final decision.) What do you think?
Is it ready for publication now? Walkerma (talk) 19:15, 21 June 2008 (UTC)