Wikipedia talk:Flagged revisions/Trial/2

Sinebot

Sinebot made a strange edit to this page. As a result, two of my comments (and several votes, which were later restored) evaporated. Ruslik (talk) 22:21, 2 January 2009 (UTC)[reply]

Should probably contact slakr. §hep • ¡Talk to me! 22:57, 2 January 2009 (UTC)[reply]

Sunset provision

The enabling of any Flagged-revisions functionality (not the individual trials, but the entire schema) should be done only with a clear Sunset provision, which this proposal does not include. This proposal should be rejected on that basis alone. (sdsds - talk) 05:00, 3 January 2009 (UTC)[reply]

The only imperative statement in the whole proposal is that "each trial must have a definite endpoint". This is an extremely clear sunset provision for each trial, as you note. However, without an active trial, there is no "Flagged-revisions functionality"; if there are no active trials, the extension is invisible. Happy‑melon 10:50, 3 January 2009 (UTC)[reply]

On a technicality, there is no statement in the 'Future options' section of when the consensus for keeping FR will be reviewed. The proposal can be seen as allowing the beginning of new trials with specific endpoints indefinitely until one finishes with the right result. MickMacNee (talk) 15:29, 3 January 2009 (UTC)[reply]

What is the "right" result? Surely if we can ever agree that a trial has produced the "right" result then the trial period is over? I doubt the community's patience would stretch to an indefinite set of trials, and I trust our bureaucrats to judge the community's mood correctly. If support for continued trials begins to wane, then bureaucrats will be less amenable to beginning them, leading to the "invisible" situation. The sunset provision in this proposal is effectively that the community's willingness to continue is being reassessed at the start of every trial. Happy‑melon 17:55, 3 January 2009 (UTC)[reply]

The way to see whether this has produced the right result is to have a control sample which is not flagged, and which is otherwise comparable. Septentrionalis PMAnderson 18:41, 3 January 2009 (UTC)[reply]

Definitely; such ideas should be a part of any viable trial. Happy‑melon 20:18, 3 January 2009 (UTC)[reply]

I would have said that measuring how much support has waned, given the multitude of different ways a trial can be proposed, is going to be difficult. Why not just say, "if after X months no specific trial config has made it into a roll out discussion, the feature will be removed"? Considering trials are in the order of months, this could be a very long process. MickMacNee (talk) 19:58, 3 January 2009 (UTC)[reply]

Doing so doesn't really mean anything, it's just an olive branch to those who are opposed to FlaggedRevisions in any form, since if support wanes the implementation goes into hibernation anyway. That's not to say it's necessarily a bad idea, of course. It would be rather difficult, however, to agree on how long X months should be, especially since as you say this is likely to be a very long process. I would be surprised if the trial phase was over before the end of this year. I think I have enough faith in our bureaucrats to trust them to make the right calls on this issue. Happy‑melon 20:24, 3 January 2009 (UTC)[reply]

Wikipedia -- The Encyclopedia that invites you to have faith in bureaucrats! Perhaps that model of governance is a big piece of the philosophical framework that allows some people to support flagged revisions.... (sdsds - talk) 21:13, 9 January 2009 (UTC)[reply]

Yes, that sounds about right. Why, is there something we should know about the ability of our bureaucrats to judge community consensus? Happy‑melon 21:44, 9 January 2009 (UTC)[reply]

Please indicate which wording in Wikipedia:Bureaucrats supports the assertion that a task like this is appropriate for bureaucrats. Taken to the extreme, bureaucrats could decide every word that appears on each Wikipedia article page. We don't do that, though, because authoritarianism isn't the model of governance which has attracted so many editors to Wikipedia. (sdsds - talk) 05:21, 10 January 2009 (UTC)[reply]

Who determines the parameters of trials?

If we go forward with trials, who will determine the parameters of these trials? The proposal says that "A trial begins when there is a consensus on the pages, metrics and procedural details involved. Each trial must have a definite endpoint; either a fixed time duration or some other objective quantity." Are there going to be separate community-wide discussions about each and every trial or will the trials be put into the hands of a smaller, more manageable group held accountable to the larger community?

Until those questions are answered, it seems that the current proposal is entirely too vague. --ElKevbo (talk) 06:16, 3 January 2009 (UTC)[reply]

Each trial will require a separate consensus on which articles to include, how and when the various technical bits are assigned (when to sight, who to make reviewers, etc), when the endpoint is, and (most importantly) whether to go at all. As the bureaucrats are the ones with the technical ability to create 'surveyors', who are the only ones who can actually configure FlaggedRevs on individual articles, judging that consensus is in the hands of the 'crats; if we don't trust them to judge consensus correctly, we have much greater problems than a FlaggedRevs trial :D. While I expect the trial discussions won't attract as wide a community involvement as this discussion, they'll still be open to everyone to contribute. Happy‑melon 10:42, 3 January 2009 (UTC)[reply]

Could you at least amend the proposal to make clear that specific trials that can be proposed, such as the four currently standing, do not have to be two months long and do not have to require admins to allow reviewer rights? There are a lot of things that people clearly believe will happen in this process if implementation for trials is approved, but which aren't mentioned anywhere on the page. The status of new pages, as mentioned above in the opposes, is one example. MickMacNee (talk) 15:23, 3 January 2009 (UTC)[reply]

Amending this proposal while it's in a straw poll stage is asking for a riot, but the /Proposed trials page is entirely fair game (indeed I reverted just this morning an attempt to 'enshrine' those two variables there). Why don't you put something down there yourself? It's not 'my' poll :D... Happy‑melon 17:51, 3 January 2009 (UTC)[reply]

That is why WP:VIE strongly recommends against taking a poll until the proposal has been actively discussed, and then only to document an apparent consensus on the result. I have included a list of conditions which might persuade me and some others to support next time; but making proposals which cannot be changed shows a failure to understand that we edit by consensus. Septentrionalis PMAnderson 19:46, 3 January 2009 (UTC)[reply]

You've been asleep for about a month if you think this proposal hasn't been actively discussed, both with those active on WT:FLR and the wider community. Did you miss the RfC? Happy‑melon 20:17, 3 January 2009 (UTC)[reply]

No, I've just missed that obscure page; I will watch it hereafter. Yes, like most editors, I missed the RfC too; please supply a link. (This is why proposed changes to the whole of WP should go on watchlist notice.) Septentrionalis PMAnderson 21:34, 3 January 2009 (UTC)[reply]

If I understand you correctly, it sounds as if each step and parameter of each trial would require another round of community-wide consensus gathering. That, to me, seems unnecessarily bureaucratic and cumbersome. Would it not be more practical for these details to be worked out by a smaller and more manageable group of people? Whatever is done, it still seems that the proposal needs to be fleshed out and made more clear.

In any case, I appreciate your patience! --ElKevbo (talk) 17:57, 3 January 2009 (UTC)[reply]

I wouldn't say we need a separate community-wide consensus for each and every trivial detail. In the same way this proposal has evolved, we'd start with a framework, develop the details, invite community input and refine based on those comments, and then conclude with a consensus-demonstrating poll. That's the timeline of a good proposal. Doing so for each proposed trial separately is important for a number of reasons. Firstly, it acts as a check-and-balance to keep our feet on the ground and make sure that what we're doing has and continues to have community consensus. Secondly, it means we can have more than one parallel proposal in the 'pipeline', without which we'd be trying to shove square pegs into round holes to squash all the myriad possible variations on FlaggedRevisions into one implementation. Finally, we can separate that decision-making process from the two most controversial questions: "who arbitrates said decision-making process?" and "should we allow that process to proceed at all?". This proposal is really only to answer those questions: to say that the 'crats, rather than the devs, should judge our consensus, and that yes, we do want to carry this forward at least one more step. I agree that the individual proposals need considerable "fleshing out"; but that's another job for another day, once we've decided whether to let them continue to grow or slaughter them now :D. Happy‑melon 18:12, 3 January 2009 (UTC)[reply]

"Evolved"? How? Septentrionalis PMAnderson 19:46, 3 January 2009 (UTC)[reply]

If you read the archives, particularly some of the threads at the top of WT:FLR, you can see how this proposal grew out of two individual users' personal configuration ideas; the idea that we should conduct a trial grew organically from that discussion and pretty much snowballed from there. There was never a "Hey guys why don't we do a trial, we could run it exactly like this..." AFAIK. Happy‑melon 20:15, 3 January 2009 (UTC)[reply]

On the contrary, that's exactly what this page is; it leaves out many of the important specifications on what a meaningful trial should be, but includes exactly what keys and triggers should be used. Septentrionalis PMAnderson 21:10, 3 January 2009 (UTC)[reply]

I can see how it could appear to be such for someone who managed to miss all the discussion beforehand, and the first they heard of it was the announcement on various pumps. However, to conclude from that that the item in question must have been created instantaneously by some Designer is rather similar to certain other theories of questionable scientific legitimacy. The edits speak for themselves: this proposal grew from simple starting points into a more complex entity, pretty much the definition of evolution :D. Now we're keen to take everyone's input on how best to progress it further. Happy‑melon 21:40, 3 January 2009 (UTC)[reply]

I've just added some outline trial proposals. In light of the consensus-during-a-poll issue, if they aren't appropriate on that page and need to come here, I would not object. Or should I mark them as late additions? MickMacNee (talk) 19:50, 3 January 2009 (UTC)[reply]

I'm glad to see some part of this that can be edited. Some of the discussion above implies that the proposals are not what we are currently being polled on, so modifying them should be fine. Septentrionalis PMAnderson 20:10, 3 January 2009 (UTC)[reply]

These are very interesting points; I think in particular a trial on Unwatched articles will be an absolute necessity. The /Proposed trials page was split out precisely to allow that page to evolve during this period of high interest, so as Pmanderson says, feel free to make whatever edits there you think are helpful. Happy‑melon 20:13, 3 January 2009 (UTC)[reply]

statistics please

Before this trial begins, how about posting some stats on article edits: the number/percentage of edits that are true vandalism? How significant is this problem, and is this going to address it or create little cabal worlds? Gnan garra 09:39, 3 January 2009 (UTC)[reply]

(Using the rule that 80% of stats are made up ...) looking at a small subset of about two hundred articles on my watchlist, about 15% of the articles have had vandal edits which I have seen (actually about 27 out of 180+) in a two month period. On those pages, about 3 to 5% of the edits are by vandals or very clueless newbies. It would be interesting to see if other editors have found similar ratios, or whether my group of "more contentious than average" articles represents worst-case. Extrapolating - in a test of 10,000 articles, I would expect 1,500 to be attacked within a given period of two months. Collect (talk) 23:51, 3 January 2009 (UTC)[reply]
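As a quick check of the extrapolation in the comment above, the arithmetic can be sketched as follows. This is only an illustration: "180+" is treated as exactly 180, which is an assumption, and the variable names are hypothetical.

```python
# Figures quoted in the comment above; "180+" is taken as exactly 180 (an assumption).
vandalised_articles = 27
watched_articles = 180
trial_size = 10_000

# Fraction of watched articles that saw at least one vandal edit in two months.
rate = vandalised_articles / watched_articles

# Scale that fraction up to a hypothetical trial of 10,000 articles.
expected_hits = round(rate * trial_size)

print(f"{rate:.0%} of watched articles vandalised")
print(f"expected in a {trial_size:,}-article trial: {expected_hits:,}")
```

The result only holds if the watchlist is representative, which the comment itself doubts ("more contentious than average").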

I have done some research into this this afternoon. Working from the top of the recentchanges list, limiting to the mainspace and excluding log events, I have gathered the following statistics using a short python script, which I'm happy to post if you want it. From the most recent one million revisions:

Roughly a quarter of all edits are from IPs (238,100, or 24%)
The percentage of all edits that are obvious reversions (rollbacks or undos with things like "vandal", "spam" or "rvv" in the summary) is steady at around 4% (41,425/1,000,000)
Of those obvious reversions, roughly two thirds are reversions of edits by IPs (25,770, or 62%)
Of the individual users reverted, roughly three quarters are IPs (17,941/24,211, or 74%)

This does not really answer the most important question, which is "what percentage of edits by X are vandalism?", and note also that the method used to identify reversions will systematically underestimate the total number of reversions (and hence the total amount of vandalism). I do not believe, however, that there is systemic error in the last two figures, so I would be prepared to claim that "Three quarters of 'casual vandals' are IPs" and that "Two thirds of 'obvious' vandalism is from IP editors". Any suggestions for improving the depth and accuracy of these results? Happy‑melon 16:29, 4 January 2009 (UTC)[reply]
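For readers wondering how such a tally might work, here is a minimal sketch. It is not the script used above, which was not posted here. The record layout (`user`, `summary`, `reverted_user`), the keyword pattern and the sample rows are all assumptions made for illustration, and the rollback/undo check is simplified to a keyword match on the edit summary.

```python
import ipaddress
import re
from collections import Counter

# Hypothetical keyword pattern, based on the summaries mentioned above.
REVERT_PATTERN = re.compile(r"\b(rvv|vandal\w*|spam\w*)\b", re.IGNORECASE)


def is_ip(username):
    """True if the username parses as an IPv4 or IPv6 address (an anonymous editor)."""
    try:
        ipaddress.ip_address(username)
        return True
    except ValueError:
        return False


def is_obvious_revert(summary):
    """Simplified test: the edit summary contains one of the revert keywords."""
    return bool(REVERT_PATTERN.search(summary))


def tally(revisions):
    """Count edits, IP edits, obvious reverts and distinct reverted users."""
    counts = Counter()
    reverted_users = set()
    for rev in revisions:
        counts["edits"] += 1
        if is_ip(rev["user"]):
            counts["ip_edits"] += 1
        if is_obvious_revert(rev["summary"]):
            counts["reverts"] += 1
            target = rev["reverted_user"]  # hypothetical field: whose edit was undone
            reverted_users.add(target)
            if is_ip(target):
                counts["ip_reverts"] += 1
    counts["vandals"] = len(reverted_users)
    counts["ip_vandals"] = sum(is_ip(u) for u in reverted_users)
    return counts


# Made-up sample rows, for illustration only.
sample = [
    {"user": "192.0.2.7", "summary": "", "reverted_user": None},
    {"user": "Alice", "summary": "rvv", "reverted_user": "192.0.2.7"},
    {"user": "Bob", "summary": "copyedit", "reverted_user": None},
    {"user": "Alice", "summary": "Undid revision: spam link", "reverted_user": "Carol"},
]
print(tally(sample))
```

A keyword match like this misses any reversion with a neutral summary, which is the systematic underestimate noted above.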

In short (and considering one million edits to be a statistically valid sample) almost 10% of IP edits are non-utile (giving them the benefit of the doubt). And about 2% of non-IP user edits are non-utile edits. This is, moreover, similar for IPs to what the German report claimed. Unfortunately we do not have a breakdown of registered users by number of edits, and we must also be aware that some cases may be legitimate edit disputes as well (calling an edit "vandalism" does not automatically mean it was vandalism). Basically this means this experiment may not be the way to go (much as I wish it were). We would gain quite nearly the same positive result by making sighting only required for IP edits in the first place (basically a variant of "semi-protection") as we would by implementing flagging of all articles. Collect (talk) 16:47, 4 January 2009 (UTC)[reply]
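The per-group rates quoted above can be re-derived from the first sample's figures. The sketch below assumes that every "obvious reversion" corresponds to exactly one non-utile edit, which is a simplification; the variable names are hypothetical.

```python
# Figures from the first one-million-revision sample quoted above.
total_edits = 1_000_000
ip_edits = 238_100
reverts = 41_425
ip_reverts = 25_770

# Whatever is not an IP edit or an IP revert is attributed to registered users.
registered_edits = total_edits - ip_edits
registered_reverts = reverts - ip_reverts

# Share of each group's edits that were obviously reverted.
ip_rate = ip_reverts / ip_edits
registered_rate = registered_reverts / registered_edits

print(f"IP edits reverted:         {ip_rate:.1%}")
print(f"Registered edits reverted: {registered_rate:.1%}")
```

On these numbers the IP rate comes out slightly above 10% and the registered rate close to 2%. Both are lower bounds, since bot reversions and reverts with neutral summaries are not counted.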

We could test that. If we set up one trial with a bot to run round sighting all non-IP edits to stable versions (we'd have to give the bot 'reviewer' as well as 'bot' rights, because it can't normally do that), we'd be simulating exactly the system you propose. That could be an interesting implementation of FlaggedRevs (a full implementation would give autoreview to all registered (autoconfirmed?) users), certainly worth further consideration. Happy‑melon 17:09, 4 January 2009 (UTC)[reply]

Your result that 4% of edits are vandalism contrasts with a figure of about 10% that was being bandied about in Flagged Revs discussions a few months ago. As you point out somewhere else, WP editing has weekly and seasonal cycles. I suspect that the level of vandalism is much lower than average during school holidays for instance! PaddyLeahy (talk) 10:16, 5 January 2009 (UTC)[reply]

Note also that the count excludes bot edits, ie reversions by our anti-vandal bots; as I said, the percentage of vandalism is a systemic underestimate. Your point about school holidays is interesting though: I'm repeating the analysis taking one million edits from the bottom of the recentchanges table, which is December 6 onwards; at least some of which will be during school term time. Happy‑melon 10:34, 5 January 2009 (UTC)[reply]

Stats from the bottom of the recentchanges table are very much the same as those from the top:

How many revisions to analyse? 1,000,000
Total edits:   1,000,000
IP edits:        274,756 (27%)
Total reverts:    69,615
IP reverts:       45,191  (65%)
Total vandals:    39,309
IP vandals:       29,412  (75%)

I'll try and get the bot edits included. Happy‑melon 11:12, 5 January 2009 (UTC)[reply]

While you're doing that, do you think that you could get an estimate of how long articles spend in a vandalized state? Comparing timestamps between edits and reverts should give the start of an estimate. These stats are very useful, thanks! Lot 49a^talk 17:23, 5 January 2009 (UTC)[reply]
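A minimal sketch of the timestamp comparison suggested above, assuming each vandal edit has already been paired with the revert that fixed it. The timestamps are made up for illustration and are not real data.

```python
from datetime import datetime
from statistics import mean, median

# Hypothetical (vandal edit time, revert time) pairs; not real data.
pairs = [
    ("2009-01-05T10:00:00", "2009-01-05T10:04:00"),
    ("2009-01-05T11:30:00", "2009-01-05T11:38:00"),
    ("2009-01-05T12:00:00", "2009-01-05T12:12:00"),
    ("2009-01-01T09:00:00", "2009-01-04T09:00:00"),  # missed for three days
]


def minutes_vandalised(edit_ts, revert_ts):
    """Minutes between a vandal edit and the revert that fixed it."""
    delta = datetime.fromisoformat(revert_ts) - datetime.fromisoformat(edit_ts)
    return delta.total_seconds() / 60


durations = [minutes_vandalised(e, r) for e, r in pairs]

print(f"median: {median(durations):.0f} minutes")
print(f"mean:   {mean(durations):.0f} minutes")
```

Reporting both figures matters because a single long-lived case pulls the mean far above the median, so the two numbers answer different questions.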

Also, it looks like 7.0% of edits in the older sample are vandalism. So PaddyLeahy might have a point. Lot 49a^talk 00:26, 6 January 2009 (UTC)[reply]
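The two samples can be put side by side with a few lines of Python. The counts are the ones posted above; the function name and dictionary keys are hypothetical.

```python
def summarise(label, edits, reverts, ip_reverts, vandals, ip_vandals):
    """Return the headline ratios for one sample of revisions."""
    return {
        "sample": label,
        "revert_rate": reverts / edits,
        "ip_share_of_reverts": ip_reverts / reverts,
        "ip_share_of_vandals": ip_vandals / vandals,
    }


# Counts as posted for the top and bottom of the recentchanges table.
top = summarise("top of recentchanges", 1_000_000, 41_425, 25_770, 24_211, 17_941)
bottom = summarise("bottom of recentchanges", 1_000_000, 69_615, 45_191, 39_309, 29_412)

for s in (top, bottom):
    print(f"{s['sample']}: {s['revert_rate']:.1%} reverted, "
          f"{s['ip_share_of_reverts']:.0%} of reverts hit IP edits, "
          f"{s['ip_share_of_vandals']:.0%} of reverted users are IPs")
```

The overall revert rate differs noticeably between the samples, while the IP shares stay within a few percentage points of each other.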

Likely so -- which is a tad more in line with my figures above. Key though is the apparent ratio of IP vandals which is quite nearly the same in each case. Collect (talk) 15:38, 7 January 2009 (UTC)[reply]

There are some links to studies over here. The results all seem pretty preliminary, but the main take-home I get from them is that ~3% of page views of BLPs are vandalized views, that ~7% of article minutes of highly visited pages are in a vandalized state, and that currently the median time to fixing is about 6-14 minutes (the mean time is much, much higher due to some vandalism that was missed for a very long time in a few cases).

At this point, given how hard a time we seem to be having coming up with good statistics or information on this problem, I feel like we ought to be spending more time gathering that evidence to make a rock-solid case for how serious a problem this really is.

I'm also open to hearing a compelling argument for whether having ~3% of page views of BLPs be vandalized page views is a serious enough problem that we need to overhaul a significant portion of our editing and article approval processes. Edit: My tone is obviously sceptical but I am genuinely open to changing my mind on this. Lot 49a^talk 09:10, 8 January 2009 (UTC)[reply]

Here is another study. Lot 49a^talk 09:18, 8 January 2009 (UTC)[reply]

These are interesting figures, and it's lower than I thought; I expected to see around 35-40% of edits being vandalism related (edit/rvt), given the wider effect it'll have. To see results in the 3-7% range means that 93-97% of edits are not an issue; it appears to me that maybe we are making mountains in the sky over this. If there was a subset of articles which were attracting a higher rate I'd assess that, but I definitely don't see any need based on these. An additional concern would be that complacency will creep into article watching, making it harder to address the problems beyond the edit/revert stage. As already said, maybe a better effort in obtaining stats/measures would be beneficial before considering any trial period. Gnan garra 09:48, 8 January 2009 (UTC)[reply]

The more I read into the stats (and I agree that I think we need more studies) the more it seems like the biggest problem is the stuff that sneaks past the watch tower that is RC patrol etc. and then sticks around for a very long time because no one cares or is watching these pages. The median time for reverts seems really good; it's the nutty outliers where vandalism is unfixed for days that need to be brought down. These make the mean unacceptably high (though in aggregate, we're still talking only ~3% of page views showing bad edits live). It's not clear to me how FR solves this problem. I suppose it means that these unfixed vandalism changes would remain unsighted for all that time, but that brings up the backlog problem, the solution to which has been "bots would auto sight posts after a certain amount of time", which means that the undetected-in-the-first-place vandals would still not be stopped? Lot 49a^talk 10:32, 8 January 2009 (UTC)[reply]

Is there a list of unwatched articles somewhere? If so, just advertising that fact might reduce the list. I wouldn't be averse to adding a few to my watchlist, and I'd expect that there are many others in the same position. - Hordaland (talk) 15:42, 9 January 2009 (UTC)[reply]

Apparently there is but only Admins can see it (probably to prevent vandals from picking easy targets). I'd sign up for a service where I got some articles assigned to me to watch :) Lot 49a^talk 22:56, 10 January 2009 (UTC)[reply]

User:Dragons flight/Log analysis may be of interest: an analysis of the situation up to October 2007, at which point 10% of all edits were reverted (20% of IP edits). In general the problem was growing with time up to then, but now that we have reached a bit of a steady state (slowly declining edit rate), maybe vandalism has stopped growing or is even declining as well. PaddyLeahy (talk) 21:17, 10 January 2009 (UTC)[reply]

Lots of nice material -- at the time, then, 7% of edits were reverted and made by IPs. About 3% were reverted and made by registered users. Total IP edits have fallen, and the percentage of them which are reverted has fallen, though not as greatly as the drop in reversions of edits by registered users. Thus, vandalism is apparently reduced, especially from its peak, but there is still a substantial problem left. The issue boils down to -- is flagging the optimal way to proceed? Collect (talk) 23:12, 10 January 2009 (UTC)[reply]

But these stats obviously don't include all the vandalism that would be occurring if lots of pages were not semi-protected? Or am I missing something? They only show the current reversion rate for edits to non-protected articles, which misses all the attempted vandalism of the most popular but consequently protected targets. Mfield (talk) 23:22, 10 January 2009 (UTC)[reply]

As per Gnangarra, I would also expect some solid statistics to be laid down before such noise is made. That is, nobody here knows exactly the extent of the problem (vandalism), but we're still voting on whether a trial (aiming to solve that problem) should be implemented. How would the results of this trial be interpreted then, and compared against what? It seems to me that some users would like to have some fun time with this trial without even knowing what's going on. I doubt the German and Russian wikis have had any real improvement after the implementation of that flag thing. Logos5557 (talk) 00:09, 11 January 2009 (UTC)[reply]

The German Wikipedia couldn't find any statistical improvements after 8 months of flagging, so discussion is ongoing to increase the requirements for Sighting de:Wikipedia_Diskussion:Gesichtete_Versionen#Ergebnis_Teil_1_-_Kriterien_f.C3.BCr_die_Sichtung_.28Vorschlag.29, which will increase the current backlog further. Mion (talk) 00:25, 11 January 2009 (UTC)[reply]

Mion, could you point us to any statistics on WP:DE regarding improvements?--Jo (talk) 09:21, 15 January 2009 (UTC)[reply]

Yes I can point to statistics at Wikipedia:Flagged_revisions#Statistics, and de:Wikipedia_Diskussion:Gesichtete_Versionen#ParaDox.27s_Tabellen_und_Diagramme, (with thanks to all who are working hard to provide them). More stats are expected. Mion (talk) 10:04, 15 January 2009 (UTC)[reply]

Statistics

Available statistics are at:

http://stats.wikimedia.org/DE/ (updated September 2008)
User:Hut 8.5/German editing stats (2009)
User:Hut 8.5/DEWP reviewer stats (5 May 2008 - 12 Dec 2008)
http://s23.org/wikistats/wikipedias_html.php (2009)
http://vs.aka-online.de/cgi-bin/rchiststat.pl?dur=86400&lim=30&wp=DE&hl=&fl= (live)
http://toolserver.org/~aka/cgi-bin/reviewcnt.cgi?lang=english&action=images&project=dewiki (live)
http://toolserver.org/~aka/cgi-bin/reviewcnt.cgi?lang=english&action=outofdatereviews&project=dewiki (live)
http://toolserver.org/~aka/cgi-bin/reviewcnt.cgi?lang=english&action=overview&project=dewiki (live)
de:Spezial:Statistik (live)

No page will ever change unless sighted / "seconded"

There is probably a lot I don't follow regarding the mechanics, but one thing seems clear: no page will ever change unless sighted.

In other words, were all articles on FR, and were no-one to ever flag anything, Wiki would never change, for better as well as for worse. Under FR, change only happens when manual flagging occurs. That is a lot of work, but it is much less than doubling the normal activity we now have. Even were every future edit to be viewed and either passed or declined, that would be less than double work because reading and deciding take less time than sourcing and composing.

However, there is still a lot of work. That means a few people doing a lot or, on average, people sighting as much as they contribute. I guardedly support the proposed trial, though I would recommend on purely mathematical grounds that even IPs should be allowed to sight articles. Quality revisions do not depend on sighters, who may know little about a topic, but on the contributors at a page. The advantage of sighting is mainly at low traffic pages, where solo operating experimenters have been saying for some time, in their own unique way, that we should adopt a "display only if seconded" policy.

I have a feeling that mathematics will force us to be generous, or we will indeed only drastically slow genuine improvement and growth of the encyclopedia in our attempt to exclude vandalism. It's worth remembering that more sources are published every day than current levels of volunteers could ever keep pace with. "Seconding" edits is the perfect counter to solo experimenters; however, authorising selected volunteers to render judgments outside their areas of expertise will only lead to burn-out of those volunteers, contributors they clash with, and the encyclopedia itself.

Viewed positively, people's "sighting counts" will be a very important contribution to the encyclopedia, and get us all working more closely together.

"If it were done ... 'twere well it were done quickly."

I support FR so long as IPs are authorised as sighters. Alastair Haines (talk) 14:18, 3 January 2009 (UTC)[reply]

Uh, doesn't that defeat the purpose of FR? If we're limiting which registered users can sight articles, how does opening it up to IPs help the situation? IPs have no stable or permanent edit record, so even if an IP applies for sighting permission, how do we determine that the next person to edit under that IP is the same person who was given the permission? How do we stop vandals and trolls from using permitted IPs to sight their own edits or the non-productive edits of others? I don't see how FR can ever work unless we know exactly who is doing the sighting, and that they can be held accountable for what they do with the sighting privileges. - BillCJ (talk) 15:07, 3 January 2009 (UTC)[reply]

Btw, I don't need to know anything about an article's subject to know that "This article is KEWL!!!" has nothing to do with nuclear physics or whatever other subject. The main point of FR is to keep the crap out of public view. - BillCJ (talk) 15:11, 3 January 2009 (UTC)[reply]

It is a point that everybody needs to bear in mind: under the FR system, there is no-one to "blame" for a perfectly good edit not being sighted; it is just an inherent feature of the system, much like there is no-one to blame if nobody investigates an SSP, or responds to an ANI or other noticeboard post. There is, as far as I can see, no specific commitment in the proposal to even investigate the effects of this basic change in the editing process. MickMacNee (talk) 15:17, 3 January 2009 (UTC)[reply]

Thanks guys, two good points.

Yes Bill, no expertise is needed to reject "This article is KEWL!!!" which is why IPs can do it. On the other hand, I've been so long out of mathematics, I wouldn't trust myself to know if some "corrections" to the Laplace transform article were cheeky experiments or not. If we are to have "qualified sighters" that's fine, so long as they really are qualified; and that can't be determined within the current Wiki structure. User:BobTheJazzMan sights a correction to the Binary numeral system#Binary arithmetic to ensure it says "1+1=0" and rejects it as vandalism. How long will it take to persuade Bob, who is a trusted Wikipedian with 50,000 edits to music articles, that he's actually making Wiki maths articles less reliable?

I also take Mick's point, we are all to blame if contributors give up donating effort to Wiki because no one gets around to sighting their new Pokemon related article under the strain of having to prioritise where they go sighting, or because they're not a qualified Pokemon expert, or because they think they are.

I'm in favour of FR if it is a way we all take non-expert interest in one another's work, or support the work of others working in similar areas to ourselves. If it's going to introduce a group of "experts" into the system, that's a new thing altogether, and ultimately contributors will want to see verifiable independent, third-party backing for such claims of expertise, and constraints on taking action outside that reliably sourced expertise. That's OK with me, but if I were a maths editor confronted by BobTheJazzMan telling me what's right, because User:WikiRulesOK said he could be trusted, I'd have a good giggle and contribute my future donations to online knowledge via another web-site.

To conclude, I think FR is a brilliant step forward, if it forces us to take a closer interest in one another's work and "second" it for our co-workers. However, if we set up some system where some editors are supposed to be more expert than others, that's another thing altogether. Sighters will regularly be less expert than the contributors whose edits they are reviewing. Perhaps someone can come up with a dozen examples better than my "1+1=0" one. Alastair Haines (talk) 17:39, 3 January 2009 (UTC)[reply]

The only way FlaggedRevisions can work logistically is if the group of 'reviewer's make up a measurable fraction of the active user population. De.wiki has over five thousand sighters; we have three times as many articles as them, so we're looking at a reviewer base of at least ten thousand editors. No one is entirely sure how many active editors there are in total on en.wiki, but that must be at least 10% of the total, and including every one of the top quartile by activity. I don't believe that the formation of a 'cabal' in such a situation is really possible.

As for what will happen in such a situation, which I agree is highly plausible, I think the important thing to realise is that there is more to wiki editing than just sighting revisions. We have to actually make revisions as well :D. If User:Foo reverts a 1+1=0 edit and sights it, that situation is entirely analogous to User:Foo stumbling across such a phrase in an article today and 'fixing' it. When User:Bar, a more "expert" mathematician, sees the change, he'll have a quiet laugh, restore the change with an explanatory edit summary, and either sight the change himself or wait for a sighter to see the change (and the explanation) and sight it for him. The only change from the way it currently works is the "sighting" part; the history would be exactly the same, and it would appear very similarly in people's watchlists. Happy‑melon 17:48, 3 January 2009 (UTC)[reply]

Thank you Happy-melon, which is a really great name, but I won't ask more about it. You do calm my anxiety on key points. Most importantly, a very large number of sighter/reviewers is intended. I note the proposal suggests all who currently have rollback access, which I believe I have without ever having asked for it. And, incidentally, I would never have asked for it had a request been necessary. Likewise, even now, I would accept the responsibility of sighting if it was given to me, but I would not ask for it if it wasn't...

except for one thing:

you speak of User:Bar "sight[ing] the change himself". This is the odd thing: it seems that effectively the idea is that anyone can apply to sight her own edits; if she's not considered to be a potential risk of vandalism she'll be accepted. Presumably, if she breaches trust, she could lose this access; however, the main thing is, FR seems simply to be about a way of implementing a kind of "probationary" feature for accounts. That doesn't quite match the figure of 10% of active editors being sighters, and were I a regular but not high volume contributor not granted the option of sighting my own edits, where others were, I might feel irritated that my good faith contributions have not already been noted and new hoops are being imposed on me.

Anyway, I don't mean to be critical of the FR idea, nor of your comments Happy-melon. I think we agree on most matters of principle and although it might seem there's a gap between your 10% and my 100% (sighting can be lost, but is otherwise automatic) I think that gap can be closed by what we learn from trials. You are doing a great job of helping talk nervous people like me and others through this trial process. That's precisely how a community should work, good for you! :) Alastair Haines (talk) 18:30, 3 January 2009 (UTC)[reply]

Thanks for your comments. I think a lot of people misjudge the amount of security we can legitimately expect from FlaggedRevisions, either expecting it to be a panacea to all our ills or to be of no use whatsoever. Of course it's not going to stop all disruptive edits, no technical measure can. What it can do is stem the tide and remove the bulk of the crap that we get deluged with every hour of every day, leaving us humans time to deal with the problems that can't be handled by such processes - and of course, the actual job of writing the encyclopedia :D. I don't see it as a "probation" so much as an expression of good faith: "you've clearly shown some good edits, and no obvious problems, so there's no reason not to give this to you". Obviously if that tool is abused, that indicates that our initial assumption of good faith was incorrect, and so it should be removed. The ultimate position should in fact be the same: all legitimate accounts have the flag, and it has been granted and then removed from all the 'bad' accounts. Of course logistically we're never going to get to that stage, but the principle is indeed the same. Happy‑melon 20:09, 3 January 2009 (UTC)[reply]

I do mean to be critical of the FR idea; I began by thinking this an ill-thought-out proposal for testing it, but this discussion has convinced me that the original idea was almost as bad.

The example here makes clear what is wrong with it, besides sheer impracticability: suppose User:Foo with his good-faith fix, is an admin, and sights his own edit; there is no requirement that admins know or understand mathematics. Then we have something worse than the present situation, for User:Bar's fix may well have to wait three weeks or longer to actually take effect on the visible side of WP. Septentrionalis PMAnderson 20:24, 3 January 2009 (UTC)[reply]

Gosh, it would be nice to have some evidence for how long they'd have to wait that wasn't imported from a wiki with a radically different culture and mentality to ours... :D You raise a good point, and if our median sight time is 21 days it will indeed represent a significant problem. We really do need to confirm for ourselves that we can keep that backlog down. There's only one way to find out: to test and see if we can handle a small set of articles. If we can, we can try a larger set, and a larger set, until we are confident that we have either found our limit, or that we can handle an entire namespace. Some number between zero and infinity is the number of articles that we can successfully sight without building up an enormous backlog. No one here has the foggiest idea what that number is, we have no evidence whatsoever that is really applicable here. Why don't we collect some? Happy‑melon 21:45, 3 January 2009 (UTC)[reply]

This proposal admits that this test will not scale. The German Wikipedia as a whole is smaller than we are; tests can only prove that backlogs will exist, they can't disprove it. Septentrionalis PMAnderson 22:30, 3 January 2009 (UTC)[reply]

The proposal does not scale technically: the difficulty of maintaining a trial (ie setting which pages should and should not display FLR behavior) increases with the size of the sample. Of course the difficulty in maintaining the pages themselves also scales with the size of the trial, but that's what we're trying to find out. If we decide to have a trial on 6,914,554 articles, we can do that, it will just be a pig to initiate. The data gathered from such a test would be exactly the same as if we had enabled FLR over the entire mainspace. The point is that this configuration is not a sustainable solution for having large numbers of pages displaying FLR behavior. But then again, that's the whole point. Happy‑melon 16:13, 4 January 2009 (UTC)[reply]

No, the data from such a trial would be utterly useless, because nobody would ever be able to examine it. (The secondary consideration that there would be no control to compare it to is relatively minor.) Septentrionalis PMAnderson 16:56, 4 January 2009 (UTC)[reply]

Oh I quite agree, such a trial would be utterly useless and would have my most strident opposition. My point is that the data gained would be just the same (and equally useless) as if it were gathered from a full deployment. The proposal allows trials of any shape and size we want, with the understanding that larger trials are more difficult to organise. The only restriction is, as you correctly note, our ability to analyse the resulting data, with larger implementations being more difficult to draw accurate conclusions from. As evinced admirably by the mishmash of soundbites and statistics from de.wiki. Happy‑melon 17:06, 4 January 2009 (UTC)[reply]

I think autoreviewing as discussed in the section Autoreview delay below would be an effective way of preventing backlogs, at the cost of letting through a small proportion of vandalism; in this system all revisions that have stood as the current revision for a certain time period (say 24 hours) are automatically flagged as reviewed. The vast majority of vandalism is still detected, and no good edits have to wait longer than a predictable, reasonable period to get through. Dcoetzee 02:07, 6 January 2009 (UTC)[reply]
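The autoreview rule described in the comment above can be sketched in a few lines. This is only an illustration under stated assumptions, not how the FlaggedRevisions extension implements anything: the `Revision` class, the `autoreview` function and the `AUTOREVIEW_DELAY` constant are hypothetical names, and the 24-hour value is simply the "say 24 hours" figure from the comment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed delay, taken from the "say 24 hours" figure in the comment.
AUTOREVIEW_DELAY = timedelta(hours=24)


@dataclass
class Revision:
    """Hypothetical stand-in for the page's current revision."""
    rev_id: int
    saved_at: datetime
    reviewed: bool = False


def autoreview(current: Revision, now: datetime,
               delay: timedelta = AUTOREVIEW_DELAY) -> bool:
    """Flag the current revision as reviewed once it has stood for `delay`.

    `current` must still be the page's current revision when this is called;
    a revision that was reverted or replaced earlier never reaches this point.
    """
    if not current.reviewed and now - current.saved_at >= delay:
        current.reviewed = True
    return current.reviewed
```

Under this sketch a good edit waits at most the configured delay, while vandalism that is reverted within the delay is never flagged, because it stops being the current revision before the check passes.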

Questions for somebody with great patience . . .

I gotta tell you, as a technodolt, I'm just barely able to grasp what is being proposed here. I went to the Wikilab thing and made an edit, looked at the history, and now have some questions. Okay, so if the page viewed by the public does not change until it has been "sighted", what happens when multiple editors attempt to edit that page ere anyone takes time to "sight" it? Will the second editor see the page as edited by the previous editor, or the public page? I assume the former, and if that's the case, when I go to "sight" it, will it show me the edits individually or altogether? If there were six edits made since the last sighting, all by different anons, and edits #2 and #5 were vandalism, will I be able to dismiss those and leave the others? Doesn't all this lead inevitably to at least some likelihood of increased edit conflicts?

As an editor who has recently (and reluctantly) come to the conclusion that perhaps we should require registration to edit (it feels to me like 90% of anon edits are vandalism, and 90% of vandalism is from anons), I probably should jump on this proposal, right? But like someone in the discussion up there mentioned, one of the things that may motivate an editor in the early going of their Wikipedia career, whether they are anon or registered, is getting to see their edits right away. Are we going to be turning off a substantial portion of our potential editorship with this? Finally, can this process be applied exclusively to anon edits? That is, could it be set so that all edits from registered accounts, even those just created, would appear right away?

Just some questions from someone like everyone else here who just wants the best for the project. Un school 17:07, 3 January 2009 (UTC)[reply]

Firstly, I could just as easily imagine someone getting a kick from seeing his/her edits immediately, as from seeing that another human being has agreed that those edits improve the Free Encyclopaedia.

Secondly, have a look at a page that has not yet been checked. The 'edit this page'-link at the top of the page will presumably read 'edit draft', and will allow you to edit the unchecked version, even showing you what (unchecked) edits were made since the last review. Have a look, experiment a bit. I think this should answer several of your questions. -- Ec5618 17:32, 3 January 2009 (UTC)

These are good questions. The first is very simple: the edit box always contains the latest version of the page wikitext, so there is never any conflict there; editors are editing the current version, whether or not they saw it on the 'view' screen. If there are differences, they are shown in a diff above the edit box, so editors will always know that they've either seen the latest version, or what the changes between the version they saw and the version they're editing are. This would allow them to see, for instance, if a change they wanted to make has already been made. If you're able to sight the page, then after you've made your edit you'll be invited to review the latest changes. A diff will show you all the changes made on top of the most recent sighted revision. If some of them are vandalism and others not, you may need to make another edit to revert only those vandal edits, but you'd need to do that with or without FLR (and with the system, you have a diff at the top of the screen to show you exactly what you're looking for). Once you're in a position where you're happy that the diff between the last sighted version and the current revision contains only positive changes, you can click a button to sight it (and by implication, all the edits before it). You don't have to sight each edit individually; they're sighted versions, not sighted edits.

We're all aware that the 'buzz' an IP gets from seeing their edits straight up in blinking lights is one of the things that is going to be lost with FlaggedRevisions. However, remember that that buzz is not always a good thing: I'm sure a large proportion of vandal edits are motivated by exactly the same instant gratification. Although obviously I can't put any hard evidence to it (yet :D), I hope that the two additional low hurdles (creating an account to see current versions, and getting some sort of reputation to get the 'reviewer' flag) will actually encourage people to make the transition from being the IP who makes the once-in-a-blue-moon spelling correction, to the registered user who makes the occasional gnome edit, to the fully-fledged editor who is an active part of the wikipedia community. I think our encouragement of existing editors is quite good; our biggest chokepoint is in getting IPs to register in the first place. I hope that this will actually help to get them on the map.

On a technical level, yes, we could give all registered users the ability to sight revisions. However, I don't believe that this would be beneficial. IIRC the numbers you quote at 90% are actually closer to 60%; certainly there are a staggering number of vandal edits that do not come from IPs. Sockpuppeteers, hardcore vandals and even casual vandals who use accounts purely to hide their IP addresses, all demonstrate that it is not possible to say with any certainty that all, or even an acceptable majority, of registered users are trustworthy. I would be very hesitant even to put my chips down to say that an acceptable majority of autoconfirmed users are trustworthy. In my opinion, this level of 'trust', which is really very low, is the lowest that requires the human touch; a review (however brief) by a real person who we've already determined to have coherent judgement. Certainly 'reviewer' should be granted very liberally indeed, to almost everyone who asks for it. But I think that having to make that step of submitting a request is important, for two reasons. Firstly, it encourages editor development, as noted above; making a foray 'backstage' is an open door to innumerable 'force multiplier' tools and features that will make them a more durable (less likely to drift away) and effective editor. Secondly, it weeds out the casual vandals and blatant undesirables (spammers and SPAs in particular). All in all, I strongly believe that a lightweight human review for 'reviewer' is an essential component of a successful FlaggedRevisions implementation.

I hope this answers your questions, or at least gives my perspectives on them. Feel free to ask if anything needs further clarification. Happy‑melon 17:39, 3 January 2009 (UTC)[reply]
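The point in the reply above that "they're sighted versions, not sighted edits" can be illustrated with a small sketch. It assumes a page history modelled as a plain list of booleans (one per revision, oldest first, `True` meaning sighted); the function names are hypothetical and are not part of the extension.

```python
from typing import List, Optional, Tuple


def stable_index(sighted_flags: List[bool]) -> Optional[int]:
    """Index of the most recent sighted revision, or None if there is none."""
    for i in range(len(sighted_flags) - 1, -1, -1):
        if sighted_flags[i]:
            return i
    return None


def pending_range(sighted_flags: List[bool]) -> Tuple[int, int]:
    """Half-open range [start, end) of revisions covered by one review diff."""
    stable = stable_index(sighted_flags)
    start = 0 if stable is None else stable + 1
    return start, len(sighted_flags)


def sight_current(sighted_flags: List[bool]) -> None:
    """Sight the current version only; earlier edits are covered by implication."""
    if sighted_flags:
        sighted_flags[-1] = True
```

In the six-anon-edits scenario from the question, `pending_range([True] + [False] * 6)` covers all six edits in a single diff. Any vandalism among them is removed with an ordinary further edit, after which one call to `sight_current` brings the page up to date.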

Thanks for your answers, melon. There was a single point which I failed to make clear...I wasn't suggesting that all registered accounts be able to sight edits, I was suggesting that all edits by registered editors not need "sighting". But I can see from your other comments (about people registering to perform vandalism) that you would probably not favor this. One last question that I hadn't thought of before. If I am a "reviewer", are my edits automatically "sighted"? Un school 18:15, 3 January 2009 (UTC)[reply]

If you make an edit to a sighted version (ie there are no changes between the time the last version was sighted and the time you edit) then the version after your edit is sighted automatically. If you make an edit on top of other unsighted revisions, the final version is not automatically sighted. This makes sense since it's versions that are sighted, not edits; if the only change between a new version and a sighted version is an edit by a trusted user, it makes sense to assume that the new version is clean. However, if there are other changes involved, it wouldn't be appropriate to automatically sight the new revision. Happy‑melon 18:19, 3 January 2009 (UTC)[reply]

Just one question... "Who's on first?" Seriously... I could not make heads or tails of your last answer. So let's try again... 1) If I am a "reviewer", are my edits automatically "sighted"? 2) If not, do I have to get someone else to sight them or can I sight them myself? Blueboar (talk) 22:01, 5 January 2009 (UTC)[reply]

Let's say User:123.123.123.123 edits foo. Their changes are obviously not autoreviewed; so after their edits the page has "unreviewed changes". Now let User:Rollbacker come and make an edit. They have made changes to an unsighted version, so their edits are not marked as "sighted": if they want the version after their edits to be sighted (and hence display for anons) they have to "sight" it manually (there is a screen for them to do this immediately after their edit is processed). Once they've done that, the page is 'up to date', and has no unreviewed changes. Now if User:Reviewer comes along and makes an edit, all the changes that are made have been made by an explicitly 'trusted' user, so User:Reviewer's edits in this case are automatically sighted; no one has to manually mark them as sighted. Anyone who can sight edits can sight all edits, including their own; indeed there are subtle encouragements in the software for people to do this as good 'wiki cleanliness'. I hope this clarifies. Happy‑melon 23:10, 5 January 2009 (UTC)[reply]
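The walkthrough above boils down to one rule: a new version is sighted automatically only when its author can sight and the version being edited was itself sighted. A minimal sketch of that rule follows; the `Page` and `Revision` classes are hypothetical simplifications (for example, a page is assumed to start from an already sighted revision), not the extension's actual data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Revision:
    author: str
    sighted: bool = False


@dataclass
class Page:
    revisions: List[Revision] = field(default_factory=list)

    def has_unreviewed_changes(self) -> bool:
        """True when the latest revision is not sighted."""
        return bool(self.revisions) and not self.revisions[-1].sighted

    def edit(self, author: str, author_can_sight: bool) -> Revision:
        """Append a revision; auto-sight it only on top of a sighted base."""
        on_sighted_base = bool(self.revisions) and self.revisions[-1].sighted
        rev = Revision(author, sighted=author_can_sight and on_sighted_base)
        self.revisions.append(rev)
        return rev

    def sight_latest(self) -> None:
        """Manual sighting of the current version by someone with the right."""
        self.revisions[-1].sighted = True


# Replaying the example: IP edit, Rollbacker edit, manual sighting, Reviewer edit.
page = Page([Revision("Base", sighted=True)])
ip_rev = page.edit("123.123.123.123", author_can_sight=False)
rollbacker_rev = page.edit("Rollbacker", author_can_sight=True)
page.sight_latest()
reviewer_rev = page.edit("Reviewer", author_can_sight=True)
```

Here `rollbacker_rev` is not sighted at the moment it is saved, because it sits on the IP's unsighted version; `reviewer_rev` is sighted automatically, because the page was up to date when User:Reviewer edited.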

Question regarding consensus

I'm strongly opposed to this idea, as are many, many others (not to mention, I'm sure, valid and productive IP editors). What are you guys in support going to take as consensus to push this forwards? At the time of writing we're on 93 in support and 38 in oppose, so that's roughly 71% support. In RfA that's considered by most to be the lowest possible percentage to promote an administrator. In RfB, that's not enough. I doubt it'd be anywhere near reasonable enough for ArbCom either. So, what are you guys going to say is the benchmark for you to take this forwards? I'd imagine it should be a lot higher than 70% support. —Cyclonenim (talk · contribs · email) 19:16, 3 January 2009 (UTC)[reply]

In this case, it's not up to us to decide. The developers are responsible for judging this particular consensus. That they are neither selected nor best placed to do so is a reason why we need control of this process in the hands of the bureaucrats. Happy‑melon 20:01, 3 January 2009 (UTC)[reply]

The very idea of placing this proposal up for a poll before discussing the implementation, and then declaring it unamendable, is a violation of the Wiki way, in providing no avenue to reach consensus; it should be dropped now. Fortunately, developers are unlikely to take notice of anything we do here. Septentrionalis PMAnderson 20:06, 3 January 2009 (UTC)[reply]

The point of this straw poll is to see whether there is consensus for taking the idea further, turning on the capability to have a trial and hashing out detailed proposal(s) for trials. As Happy-melon says, if subsequently there is no consensus on any detailed trial proposal, no trial will happen. This straw poll is not going to lead directly to any final decision being taken. Even if we have a trial, and even if there is consensus that the trial was a success, consensus could still change and we could still decide to turn it off again.—greenrd (talk) 10:28, 4 January 2009 (UTC)[reply]

I very strongly object to any tests under this protocol. (I have stated my reasons above.) I therefore object to the protocol in itself, and want it turned off now and permanently; I could support tests under other conditions. Others object to any tests of FR at all, because no results would get them to support such a change to Wikipedia. Both of us are entitled to object in toto now, rather than having to catch and object to each test proposal. Septentrionalis PMAnderson 16:51, 4 January 2009 (UTC)[reply]

This is my point precisely, and the whole reason to conduct the process in this fashion: it is important that we see whether there is consensus for, and for people like yourself to have the opportunity to comment on, the principle of conducting any trials at all before it makes sense to finalise any particular trial in detail.

What "other conditions" would you apply to the trial architecture? Happy‑melon 19:01, 4 January 2009 (UTC)[reply]

Wikipedia:Flagged_revisions/Trial/Proposed_trials#Conditions would be a good start. Not all of those are my ideas, but most of them are good ideas. Septentrionalis PMAnderson 22:19, 4 January 2009 (UTC)[reply]

I can't agree with the presented numbers. In September 2008, registered editors (with more than 20 edits) made 13971 edits and anons made 6058 edits (30.24%). We can safely assume that no anon editor would vote in favor of this proposal, so 185 support + 102 oppose makes 287 votes, and 287 × 0.30 = 86 additional anon opposing votes. In effect the proposal has just as many opponents as proponents.

Taking into account that one of every 5 editors left the project after the implementation on the German Wikipedia, this raises the question: is 49% enough? Mion (talk) 03:28, 5 January 2009 (UTC)[reply]
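For readers following the numbers, the arithmetic in the comment above can be reproduced as below. The sketch only restates the comment's own figures and its assumption that every anon would oppose and that anon votes equal 30% of the votes actually cast; it does not endorse that assumption, and the variable names are illustrative.

```python
# Figures quoted in the comment (September 2008 edit counts and the poll tally).
registered_edits = 13971
anon_edits = 6058
support, oppose = 185, 102

# Share of edits made by anons: about 30%.
anon_share = anon_edits / (registered_edits + anon_edits)

# The comment's assumption: hypothetical anon opposes equal 30% of votes cast.
votes_cast = support + oppose                    # 287
assumed_anon_opposes = round(votes_cast * 0.30)  # 86

raw_support = support / votes_cast               # support among votes actually cast
adjusted_support = support / (votes_cast + assumed_anon_opposes)

print(f"anon share of edits: {anon_share:.1%}")
print(f"raw support:         {raw_support:.1%}")
print(f"adjusted support:    {adjusted_support:.1%}")
```

The adjusted figure is where the "49%" in the comment comes from; the raw figure is the share among votes actually cast.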

The above arguments for requiring exceptionally high super-majority to represent consensus seem to be, a) I don't like it, b) All anons would vote like I would so we need to make it harder, c) Full implementation on the German Wikipedia resulted in lots of people leaving.

These seem fallible to me. a) "I don't like it" is obviously not a suitable reason. b) How anons would vote is speculation. Anons are simply editors without accounts. A large proportion of them could support it. c) This is more an argument against the idea, not why we need an exceptional super majority. So it devolves to "I don't like it" again.

I should note also that this is not an irreversible change, or even fundamental change at the moment. It's a trial of functionality, that may either succeed or not once it's tested out. Perhaps when it's time to poll on if it should be rolled out across the wiki, if that ever happens, it might require an exceptional supermajority, but not now. I think a simple majority would probably be good enough for a trial implementation. --Barberio (talk) 03:43, 5 January 2009 (UTC)[reply]

For the first, A, it is not about "I don't like it"; the proposal is a change to rule 2, the ability of anyone to edit articles without registering: a change that makes visible edits invisible. That is one of the five core rules of the foundation. See m:Foundation_issues

For B, I suggest we do role playing: I play the role of Sighter and you log out so you become an anon editor; you make an edit and I make the decision about your valuable contribution. Mion (talk) 04:03, 5 January 2009 (UTC)[reply]

As I said, these are arguments against the use of Flagged revisions, not for a stronger super-majority requirement for a limited trial of flagged revisions.

You've had a full chance to convince people that your opinion is correct on its merits. That people still seem to disagree is not a cause to require a greater super-majority considering it's only a limited trial. --Barberio (talk) 04:08, 5 January 2009 (UTC)[reply]

At the moment, this has less than 65% approval. Does anyone contend that this is a sign of WP:Consensus, even in our peculiar sense? Septentrionalis PMAnderson 06:52, 5 January 2009 (UTC)[reply]

For a trial, yes. --Barberio (talk) 07:01, 5 January 2009 (UTC)[reply]

Incidentally, comparing this to RfA votes is misleading, as this vote has a much larger population involved. The difference between 100 and 200 votes is more people than the difference between 10 and 30 votes. --Barberio (talk) 07:07, 5 January 2009 (UTC)[reply]

Yes, indeed, it is misleading. 65% of a sample of 300 is very likely to be within a percent or so of a sample of all Wikipedians; 4-2 or even 20-10 are much more dubious. But WP:CONSENSUS defines consensus as a compromise everyone can agree on; there is none such here. Even if all those who would support some form of testing were won over, that would only be 70% or so; 30% dispute this absolutely, and many of the supports are weak, depending on the fraudulent promise that FR will produce reliability and the erroneous impression that the delays on the German WP are minimal.

WP:Consensus also says Articles go through many iterations of consensus to achieve a neutral and readable product. If other editors do not immediately accept your ideas, think of a reasonable change that might integrate your ideas with others and make an edit, or discuss those ideas. That applies even more strongly to a change like this to Wikipedia's structure; the thing to do now is to discuss what changes to this proposal might make it broadly acceptable. It is possible that #just a thought... below, might be one such, but I cannot speak for those who find this unconditionally unacceptable. Septentrionalis PMAnderson 16:07, 5 January 2009 (UTC)[reply]

I think you're making a common misreading of WP:Consensus, that a single person or a minority can filibuster any change by preventing an assumption of general consensus. The policy also includes the phrases, "Consensus can only work among reasonable editors who make a good faith effort to work together in a civil manner.", "Consensus is a partnership between interested parties working positively for a common goal.", and "A representative group may make a decision on behalf of the community as a whole.". Filibuster-like attempts to set the bar artificially high, or require complete unanimity, are not good-faith efforts to work together. Consensus is not a set of handcuffs that tie the rest of the group to someone who wants to walk off a cliff. --Barberio (talk) 16:43, 5 January 2009 (UTC)[reply]

The correct interpretation of "consensus" is that the people with the bigger sticks can say that consensus exists even when others say it clearly does not. "Consensus" is routinely abused on Wikipedia to force through the wishes of cliques who have sufficient control of Wikipedia's power structures to do so. To push through this change, in light of the arguments presented in the straw poll, would be a clear abuse of process. However, as our soi-disant constitutional monarch has said he wants it, and has said elsewhere that he is prepared to impose policy on BLP issues, debate seems pointless. DuncanHill (talk) 16:49, 5 January 2009 (UTC)[reply]

Your point, Barberio, if I understand it correctly, is that consensus is like a majority vote and if the majority is in favor, everyone opposing is a bad-faith editor whose point of view should be discarded? I am sorry, but I always understood that consensus is a question of who has the better arguments, not of what more people like. I'd prefer if some people were to judge consensus here who are not involved here and have no preference...if such people exist. Regards So Why 17:06, 5 January 2009 (UTC)[reply]

Everyone always believes that their own view is the better argument. So if we were to judge consensus solely on if people were presenting differing arguments they felt were valid, then one person could prevent anything happening by saying "I disagree because the space lizards say to."

Consensus is about trying to get consensus agreement by allowing for discussions amongst people working together in good faith. This does mean that a minority of dissent may occasionally have to be ignored if they can not convince the majority that they have valid arguments. The wikipedia processes do normally allow for 'well the basic majority want X, but the minority have compelling arguments not addressed by the majority' to be used to prevent finding of consensus. But not 'well an overwhelming majority want X, and a minority have presented arguments addressed and rejected'.--Barberio (talk) 17:21, 5 January 2009 (UTC)[reply]

If "overwhelming" means "slim" (and that with a debate which excludes most of those likely to be most adversely affected by the trial) then there is an overwhelming majority to implement this trial. If, however, "overwhelming" means what most people use it to mean, then there isn't. DuncanHill (talk) 17:54, 5 January 2009 (UTC)[reply]

I just want to point to a relevant past precedent : Wikipedia:Non-administrator_rollback/Poll. It was fully enabled with 304 for and 151 against. Some opposes made apocalyptic predictions much like now. After the poll was closed as implement, some filed a RFAR. Still we have non-administrative rollback now, and Wikipedia still exists. Ruslik (talk) 10:37, 5 January 2009 (UTC)[reply]

Unless the poll results change drastically, implementing this on these results would be a disruptive nuisance; anyone who presumes to do so should, at a minimum, be stripped of the tools they have abused. Septentrionalis PMAnderson 16:07, 5 January 2009 (UTC)[reply]

It appears that this trial will be implemented at the current count. You may, if you wish, take the issue to arbitration. --Barberio (talk) 16:43, 5 January 2009 (UTC)[reply]

Oh, there are so many other stops for disruptive and unsuitable admins before ArbCom; Barberio should feel free to explain his view here. For now, it is the minority even in this section. Septentrionalis PMAnderson 17:51, 5 January 2009 (UTC)[reply]

Barberio, there's no reasonable way this can be interpreted as a consensus. We need more discussion about implementation, not games of chicken where we challenge people to go to the ArbCom. JoshuaZ (talk) 20:09, 5 January 2009 (UTC)[reply]

I feel it is perfectly reasonable to oppose this 'trial' on the basis that one would oppose the full implementation of the flagged revisions. Discounting opposes on that basis seems like poor form. Protonk (talk) 20:01, 5 January 2009 (UTC)[reply]

Non-admin rollback was not a change to the fundamental principles of the Foundation, but rather a technical matter. In addition, for NAR, a developer called a consensus, but this was disagreed with by dozens of editors, even those who supported it. Comparing the two is a false analogy. NuclearWarfare (contact me · My work) 20:04, 5 January 2009 (UTC)[reply]

Discussions of site-wide technical changes like these are always difficult to handle because they cross the least well-defined territory in the balance of power within wikimedia: between the Foundation and the wiki communities. We are a wiki community, which has complicated, sometimes arcane, and outwardly inconsistent means of judging consensus amongst huge groups with vastly differing opinions. The Foundation is a much smaller organisation, with a hierarchy that is both more clearly-defined, and more linear. This technical change, if implemented, will be made by Wikimedia developers who are employed and contracted to the Foundation; they are both entirely obligated to do as the Foundation decides, and entirely immune from action by the community. How, exactly, would we "strip of the tools they have abused" someone with shell access to the database? These are people who could write this entire thread out of the revision tables and make it appear to have never happened.

The Foundation is totally dependent on its communities to create the free content that is its mandate; and yet the communities are totally dependent on the Foundation to organise and finance the environments and structures that 'house' them. The Foundation has to balance the desires of its communities with how well those desires support its constitutional goals, and the communities have to balance what's best for them with what's best for the readers. In this context, the thought that it is possible to put an absolute number on the percentage of support required is even more ludicrous than in most of our other poll situations.

This poll is, I'm sure, attracting attention far beyond the en.wiki community. I would be very surprised if its 'result' was judged solely by a single developer. I would not be surprised if the final outcome is not what anyone expects; in my opinion, it really is completely out in no-man's land. But my point is that this really is not a vote; it's not even a situation where we can apply our customary hodge-podge of vote counting and strength-judging. We could ask a bureaucrat to close it. We could ask all twelve active bureaucrats to present a collaborative analysis. We could ask our ArbCom to adjudicate. But when push comes to shove, it's really not our decision. We're making a request, effectively, for divine intervention, how are we supposed to dictate whether that intervention is granted? Happy‑melon 23:33, 5 January 2009 (UTC)[reply]

A hundred and more of us don't want divine intervention, thanks. Septentrionalis PMAnderson 23:54, 5 January 2009 (UTC)[reply]

I'm well aware of that, thank you. Well over two hundred of us do. Whose voice is 'louder' in the ears of the developers? I don't know. Some people (on both sides) have put forward some truly awful arguments which evidence little more than their inability to do basic research... will the developers weed out those voices and give them less weight as a bureaucrat would do? I don't know. Will certain voices be given more weight than others as a result of their familiarity to the Foundation (ArbCom, Checkuser/Oversight, etc)? I don't know. My point is not that we should be in any way discouraged from making both our opinions and our arguments heard; merely that trying to second-guess the result is doomed to failure. Happy‑melon 11:15, 6 January 2009 (UTC)[reply]

Above in this thread, Barberio wrote: "I should note also that this is not an irreversible change..." That's not the impression I have from other pages on this topic. I'm sure I read that once this gets implemented, it may or may not stay dormant, but that it will never be un-implemented. To me, that suggests many hundreds of hours spent mobilizing for & against dozens of proposals for more-or-less well-thought-out trials. A huge time-waster, in other words. - Hordaland (talk) 22:42, 7 January 2009 (UTC)[reply]

To suggest that it can never be "un-implemented" is simply not correct; like everything else on wikipedia, consensus can change and if there is a consensus to remove the extension, the developers will do so. However, such a consensus would need to be very strong; the "other pages" suggest rather that this would be quite difficult to achieve. However, with this proposal, the job of determining whether to allow another trial is not being made by developers each time, but by our local bureaucrats, who are much better at judging the desires of this community - that's why they were appointed. They can perform a more selective analysis of the actual strength of the arguments for and against, rather than a more-or-less straight vote count. So it will not be necessary to "mobilise" in the same way as would be required for a dev request; merely to clearly and coherently put forward the arguments for and against, in a widely-advertised discussion. I think the thought that there will be "dozens" of trials, or even trial ideas that become fully-fledged proposals, is misleading. Happy‑melon 12:21, 8 January 2009 (UTC)[reply]

On the contrary, requiring consensus against to turn it off is to declare that it will never actually be turned off. There isn't consensus against it now; there never will be, because some people will always see this as a magic pony which would solve all our problems, if we only wish hard enough. If there were a provision that lack of consensus to continue was enough to turn it off - as the German Wikipedia is clearly divided on it- then I would be more willing to experiment. But that's another point for the next draft. Septentrionalis PMAnderson 23:15, 8 January 2009 (UTC)[reply]

The distinction between "turning it off" as in completely uninstalling the extension and burning the extra database tables, and "not using it", is purely semantic; neither will prevent the 'magic pony brigade' from wanting to call in the cavalry, and neither will satisfy those who would only be placated if every copy of the source code was hunted down and destroyed. For the reasonable people in the middle, however, having this configuration installed but there being no consensus to do anything with it, is no different to not having it in the first place. Since each trial requires a positive consensus (and a sunset provision), to suggest anything other than that use of this extension requires continuing positive consensus, is disingenuous. Happy‑melon 14:36, 9 January 2009 (UTC)[reply]

OK, so now I'm not one of "the reasonable people in the middle," thanks. I do not agree that the diff between uninstalling (or not installing in the first place) and not using is "purely semantic." If we don't install, we avoid long discussions like this one and get back to writing the encyclopedia. If we do install, there'll again be as many opinions as there are Wikipedians about which trial, how to do it, controls, tests, evaluation etc. And people will be called unreasonable, as I have been here, and likely give up the discussions. Some will give up Wikipedia altogether.

Let the Germans fight it out for another couple of years. There is no hurry. The offer isn't going to go away. - Hordaland (talk) 06:20, 10 January 2009 (UTC)[reply]

Reversion

This edit is simply unacceptable. Some of us, clearly, believe that defining the necessary conditions for any future test is the only point of continuing to discuss this page and the possible tools, instead of unconditionally opposing any future implementation of Flagged Revisions at all.

If Happy melon wishes to suppress discussion, it would simplify matters if he would state his chosen venue of WP:Dispute resolution now. Septentrionalis PMAnderson 21:07, 3 January 2009 (UTC)[reply]

This edit should also be mentioned, a few seconds before the one Pmanderson notes. My reasoning is given in the edit summaries; of course I will revert if there is consensus to do so. Happy‑melon 21:49, 3 January 2009 (UTC)[reply]

I agree with PMAnderson that it is important to raise these issues, and I agree with Happy-melon that Wikipedia:Flagged_revisions/Trial/Proposed trials is a better choice of venue. As a compromise, I suggest providing a suitable link or links from this front page to the subpage. Geometry guy 23:04, 3 January 2009 (UTC)[reply]

Huh?

I am probably as aware as most editors, but I have no idea what "Flagged revisions" are, and all of this talk and all of these references make very little sense to a very busy person. Okay . . . what IS a flagged revision? Yours in puzzlement (and please, no putdowns). GeorgeLouis (talk) 05:16, 4 January 2009 (UTC)[reply]

A 'flagged revision' is an edit made by someone who has not yet established themselves as a sincere contributor to Wikipedia. Flagged revisions are saved but not published until someone who has established themselves as a sincere editor reviews the edit and verifies that it is not vandalism. The primary purpose of flagged revisions is to eliminate vandalism, which usually comes from either unregistered users or registered users who have made very few edits. The German Wikipedia implemented an initiative like this several months ago. – SJL 06:46, 4 January 2009 (UTC)[reply]

The first two sentences are completely wrong: flagged revisions are not the unverified versions, but the verified ones. See m:FlaggedRevs. Geometry guy 13:48, 4 January 2009 (UTC)[reply]

Don't be a dick. I misunderstood and reversed the attribution, but the rest of my explanation is accurate. I was trying to be helpful, after quite a while away from Wikipedia, and you just reminded me why I stopped contributing regularly in the first place. – SJL 18:14, 4 January 2009 (UTC)[reply]

A sharper mind and a thicker skin might help. It is kind of important when discussing a trial of flagged revisions that the people voting on it understand what they are voting for, no? Many, apparently, do not. Geometry guy 10:52, 6 January 2009 (UTC)[reply]

Oh, pooh! Quit arguing. ("Can't we all get along?") Thank you, SJL. Sincerely, GeorgeLouis (talk) 08:14, 9 January 2009 (UTC)[reply]

Can trial also experiment with what IP users see?

As many have noted, it's hard to judge the impact of a change without experiment. So would it be possible to experiment with different appearances to the IP user? Here is one example I could imagine - there are many others:

In one case, display the latest revision, with an optional button to view the last sighted version (and perhaps labelled latest revision with seal of approval or something similar, since an IP user most likely will not know what a sighted version is.) And perhaps a note that tells them that if they wish, they can create a login and make this the default, since they would not know that either.
In the other case, display the latest sighted version as in the current proposal.

This could be done across different pools of pages, or sequentially, or randomly, or maybe deliberately set the English wiki up with a different policy from the German one, so the results can be compared.

This could allow us to understand some user behavior that is just speculation now. What percentage of IP editors are discouraged by having their edits not appear right away? What percentage of the time do users wish to see the different (latest, sighted) versions? How many people become registered, and the first/only thing they do is set the sighted/unsighted preference? Of these users, how many become editors?

If we don't try some experiments like this, we could fall into the traditional computer science trap. A lot of high tech stuff has horrible user interfaces, since developers are selected from those who do not mind mastering arcane interfaces. It's just like the professor who knows the material well, but is a rotten teacher since they can no longer even imagine what it was like to not know the material, much less sympathize with a poor student. Likewise, here we are arguing about the effect on IP users, but since we are all registered users we are not at all typical newbies, and hence our opinions are suspect. So rather than argue about the effect on IP users it would be much better to measure it, if possible. LouScheffer (talk) 06:28, 4 January 2009 (UTC)[reply]
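One way the "different pools of pages ... or randomly" idea above could be set up is a stable pseudo-random split of pages into two display arms. The following is only a minimal sketch: the arm names, the salt, and the function names are hypothetical, and nothing here is part of the FlaggedRevs extension or of the trial proposal.

```python
import hashlib

# Hypothetical names for the two cases described above:
# show the latest revision by default, or show the latest sighted version.
ARMS = ("show_latest", "show_sighted")


def assign_arm(page_title: str, salt: str = "fr-trial-1") -> str:
    """Deterministically assign a page to one of the two display arms.

    Hashing the title with a salt gives a split that looks random but is
    reproducible, so a page stays in the same arm for the whole trial.
    """
    digest = hashlib.sha256(f"{salt}:{page_title}".encode("utf-8")).digest()
    return ARMS[digest[0] % len(ARMS)]


def split_pages(titles):
    """Group page titles by assigned arm."""
    groups = {arm: [] for arm in ARMS}
    for title in titles:
        groups[assign_arm(title)].append(title)
    return groups
```

Changing the salt would produce a different split for a later trial, and the two groups could then be compared on the measures asked about above, such as how many IP editors return after their first edit.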

Expanding on your thought a bit. I would recommend that the first test involve only special test pages which could be used to establish the format for the basic messages that would be used in the following live tests. I would hate to have new users have to deal with something that experienced users have not even had the chance to review and test. I remain very concerned as to just what the IP users are going to see and just how they are going to have to interface with it. I know that we can change it, but the current default is so unacceptable that I would not want to see it used in any live test. Dbiel ^(Talk) 20:20, 9 January 2009 (UTC)[reply]

Limitations

I would be happy with flagged revisions if

Revisions are accepted automatically after a period of time from last edit (say use mean time to vandalism reversion *constant, where the constant is between 1 and 3). I believe I read a paper on wiki which quoted the mean time to reversion. Can't quite remember where though. I may hunt it down later.
Articles are, by default, not flagged. Flagging an article should be considered to be almost as bad as a semi-protect -- these are quite similar.
A large userbase (say all autoconfirmed users, or some other level of trust) are able to sight revisions

Some features that would be useful

A filtering of the article history that shows only draft & sighted edits. This would make diffing articles that have heavy vandalism a bit easier.

My 2 cents User A1 (talk) 07:14, 4 January 2009 (UTC)[reply]
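The first suggestion above (automatic acceptance after the mean time to vandalism reversion multiplied by a constant between 1 and 3) can be written down as a simple rule. This is a sketch of the suggested rule only, not of anything the extension is known to do; the function names and the example value for the mean reversion time are assumptions for illustration.

```python
from datetime import datetime, timedelta


def auto_accept_delay(mean_reversion: timedelta, constant: float = 2.0) -> timedelta:
    """Waiting period before an unreviewed revision would be accepted.

    The constant is restricted to the 1..3 range suggested above.
    """
    if not 1.0 <= constant <= 3.0:
        raise ValueError("constant should be between 1 and 3")
    return mean_reversion * constant


def should_auto_accept(last_edit: datetime, now: datetime,
                       mean_reversion: timedelta, constant: float = 2.0) -> bool:
    """True once the page has gone unedited for the full waiting period."""
    return now - last_edit >= auto_accept_delay(mean_reversion, constant)
```

For example, with an assumed mean reversion time of ten minutes and a constant of 2, a revision left untouched for twenty minutes or more would be accepted; any new edit resets the clock because the rule counts from the last edit.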

Seconded. Imo very concise simple practical considerations.

Your #1 stops the FR system from making all change dependent on manual sighting—things'll tick over w/out us, phew!

Your #2 allows us to be sensibly selective about applying FR where it's needed, and spare ourselves unnecessary work elsewhere.

Your #3 seems to protect "anyone can edit", or is simply a practical necessity should FR end up being more draconian than your #1 and #2.

Your #4 sounds practical too, but though I'd be content that a filter defaults to "sighted"+"draft", I'd like to have the option to "see all", including who drafted alleged vandalism, and who deemed it to be so.

I'd be v. interested in both PMAnderson's and Happy-Melon's views on these suggestions. Perhaps they could agree on very specific things like this. :) Alastair Haines (talk) 07:50, 4 January 2009 (UTC)[reply]

Number 1 is currently technically impossible, since FR does not have this feature. However 1 is actually not very important for the limited trials proposed.
Number 2 is already in the proposal. In the proposed implementation articles are not flagged unless surveyors manually enable flagging over them on a page-by-page basis.
I agree with number 3. Reviewer permission should be liberally distributed. Actually Flagged Revisions can be configured so that anybody with a certain amount of edits (other conditions can be attached too) is automatically assigned to the reviewer group. However we decided against including this into the trial proposal, because of the lack of agreement about criteria. In future automatic assignment can be implemented, of course. Ruslik (talk) 08:27, 4 January 2009 (UTC)[reply]
As to filtering, if my recollections are accurate, it is already a part of FR (you can check this on en. labs).

Ruslik (talk) 08:27, 4 January 2009 (UTC)[reply]

As a side note #1 could technically be enforced by bots, writing a bot to do this could be done with nasty scraping scripts or with more clever use of the wiki-API (may need modification). But one would hope that such a bot would be located close to the wikimedia network to keep bandwidth down. Of course having it as part of the FR system would probably be most efficient. 121.44.50.78 (talk) 11:00, 4 January 2009 (UTC)[reply]

This seems a reasonable kernel for the next attempt at this. Septentrionalis PMAnderson 17:07, 4 January 2009 (UTC)[reply]

Report from Polish Wikipedia

As many of you know, we've been using flagged editions on pl-wiki for a while. Initially I was very skeptical about it. However, I changed my mind after a couple of times a vandalized edition was simply not visible to common folks - in my view now it really does increase the quality of output to the outer world. However, I have also noted a serious risk: quite often an established editor might edit a part of an article and accidentally add validity to some nonsense. I would not over exaggerate this risk though: after all to some extent this is what we have now. I disagree with some of the editors above that flagging should be "as bad as a semi-protect". I think that unflagging should be just a little proof that an edition is confirmed by an established editor, and thus should be quite widespread, while flagging last editions from IPs and new editors is only a consequence of this approach. Pundit|utter 07:53, 4 January 2009 (UTC)[reply]

Please explain review procedure in pl-wiki: how deep the scrutiny goes? NVO (talk) 08:01, 4 January 2009 (UTC)[reply]
You mean, in everyday practice? My wild guess would be that it does not go very deep. Some editors will certainly make a big deal of "certifying" some revision, while others would perfunctorily correct a typo without caring about simultaneous confirmation of the article's version. My point is, the bottom line is what we have now (an average Joe comes to Wiki and perceives whatever is in there as "confirmed", I don't think everyday users see all the nuances, many do not even realize there is a history button on the page). Thus, flagging will serve a good purpose: occasional Wikipedia users will be more likely to see a sensible revision, rather than rubbish. I don't think flagging may resolve all our problems, but it adds some value. On the other hand I do realize it may be a bit confusing for new editors, whose edits will not be immediately visible to everyone. Pundit|utter 09:12, 4 January 2009 (UTC)[reply]

A few questions...

I'm struggling to find a page that explains this idea without techno-babble (no, I don't script antivirus software and design firewalls, so you may as well explain it in Welsh) so I have a few questions-

If an IP makes an edit to a page, can they then see it, or will they have to wait for someone to review it?
If an IP clicks "edit this page", do they see something in the edit box different to the article if another IP has made edits to it since a revision was approved?
1. If not, do they edit the approved version, even if there are other edits waiting to be approved?
  1. If so, must reviewers choose the best possible "new" history, discarding the rest?
  2. What if the same IP edits a page several times, as is all too common? Can a reviewer only choose to save a single one of their edits?
  3. If multiple saved up changes can all be approved, what if they conflict with each other?
  4. Are we not going to lose attribution as reviewers choose to slip changes in under their own name, as they can't approve both changes made to a page?
Are recent change patrollers (I'm thinking particularly of the 12 years old, never wrote an article brigade) really going to continue recent change patrolling when it turns from the romantic notion of zapping vandalism, hunting down trolls and getting them blocked to the positively thrilling art of clicking to "approve" edits?
Does this not effectively mean that we are treating IP editors as vandals by default, meaning that they're guilty until declared innocent?

I'm assuming I've missed something significant here, as I refuse to believe the Flagged Revisions I understand could possibly be supported by anyone who had any respect for Wikipedia. J Milburn (talk) 12:30, 4 January 2009 (UTC)[reply]

You can try many of these things out in the live demonstration. An important point to realise is that we're not talking about verifying edits, but versions. The system works more or less as it does now, you see. When IP users visit a page, they see the stable version of the article. That is new. When they go to edit the page, they edit the latest 'draft' version, which includes all edits made since the 'stable' version was flagged. Indeed, the link 'edit this page' reads 'edit draft', to accentuate the difference, and the edit window shows some text to notify users that they are editing a 'newer' version than the one they saw. On top of that, the changes between the 'stable' version and the current draft version are shown, so that the editor can decide whether those changes are as much of an improvement as the new edits the user wants to make.

But the basic process isn't much different from how it is now. Editing a page that has been modified saves a new version that includes both edits. Editing a draft saves a new draft that includes both edits.

Please have a look at the live demonstration. It will answer many of your questions. -- Ec5618 13:46, 4 January 2009 (UTC)

I'll have a go at answering your questions. If an IP edits a page, they will see a notice at the top of the edit screen saying (by default) "edits will be incorporated into the stable version when an authorised user reviews them". Once they've saved their edit they will see the article as it was before their changes, with a short explanatory banner including a link to the current version of the page. So the most recent version is only a click away. Whenever anyone edits a page, the contents of the edit window are the most recent version; if this is different to the version the user saw on the edit screen, a diff is provided to highlight any changes. So whenever a user with sighting ability goes to edit or sight a page, they see a diff of all changes since the last sighted version. It is slightly confusing to think of sighting individual edits; what is really sighted is the version of the page in between edits. So it doesn't matter if those changes were introduced in one edit or a hundred; what matters is the changes those edits cumulatively make. Really that's no different to the way things work currently, except that the process is made a little more streamlined and easy to keep track of.

I obviously can't speak for all or even any RC patrollers, but I can say with certainty that this extension isn't going to stop all vandalism. If used correctly, it can make vandalism invisible to the outside world, which is really just as good, but RC patrol is not going to change overnight. I don't know the answer to your question, but I want to find out. That's why we should test it to see if it works.

Obviously a lot of the 'philosophical' issues surrounding FlaggedRevisions are dependent on your personal point of view; but I think that many people overemphasise the extent to which FlaggedRevs represents a change in 'faith assumption'. We already treat IPs as second-class citizens, limiting their abilities, treating their edits with automatic suspicion. Whenever we see an IP contributing to a discussion we suspect a sockpuppet; whenever we see an IP edit at the top of our watchlists, we suspect vandalism. There's no avoiding the issue: I've just parsed the last ten thousand recent changes, found 1,706 rollbacks of which 1,189 are of IPs; of the 385 distinct users reverted, 292 of them were IPs. More data is (as always) needed, but the initial conclusion is that around 70% of vandal edits and 75% of vandals are IPs. With those two thoughts in mind, what we're 'doing' to IP editors with FlaggedRevs actually begins to look rather mild. Are we treating them with assumption of guilt? Yes, I suppose so. Do we do that already? Most definitely. In my perspective, what's new here is a way to approve IP edits, to remove that assumption of suspicion. That's new, that's not been available before; previously all IP edits were equally suspicious. It's a matter of perspective.

You've piqued my interest in edit statistics now, so I think I'm going to go write a script to analyse recentchanges more thoroughly, to try and get some more accurate numbers to play with. That's what this proposal is all about, after all. Happy‑melon 14:20, 4 January 2009 (UTC)[reply]
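The percentages quoted above follow directly from the four counts given in the comment (1,706 rollbacks, 1,189 of them of IPs; 385 distinct users reverted, 292 of them IPs). A minimal check of that arithmetic, using only those figures, with variable names chosen here for illustration:

```python
def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Counts quoted in the comment above (last ten thousand recent changes).
rollbacks_total = 1706
rollbacks_of_ips = 1189
users_reverted = 385
ips_reverted = 292

edit_share = share(rollbacks_of_ips, rollbacks_total)
user_share = share(ips_reverted, users_reverted)

print(f"Rollbacks that reverted an IP: {edit_share:.1f}%")  # 69.7%
print(f"Reverted users who were IPs:   {user_share:.1f}%")  # 75.8%
```

This only reproduces the division. It treats every rollback as a reverted vandal edit, which is the same simplification the comment makes, and it says nothing about what share of all IP edits are vandalism.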

"Once they've saved their edit they will see the article as it was before their changes". This is not true, at least not on the demo install. After an IP user has saved their edit, they will see the most recent version of the article with the results of the edit visible. This shows to them that nothing went wrong and confirms the "anyone can edit" slogan. Only when refreshing the page or coming back to it are they shown the last sighted version. --94.192.121.203 (talk) 14:43, 4 January 2009 (UTC)[reply]

That's interesting to know - I never edit from my IP address as an absolute rule because it identifies me very accurately indeed, so I was making my best assumptions from looking at the source code. I presume that there is a note to identify the version as the current version rather than the sighted version? If so, I see that as a rather elegant solution to, as you say, confirm the "anyone can edit" philosophy and to allow them to check that everything went ok. Happy‑melon 16:08, 4 January 2009 (UTC)[reply]

On editing, it shows MediaWiki:Revreview-editnotice when the current version is sighted, and MediaWiki:Revreview-edited when there have been edits after sighting. Saving shows MediaWiki:Revreview-newest-basic-i and a note about authorised users having to review edits to the page. --93.97.227.226 (talk) 01:25, 5 January 2009 (UTC)[reply]

And how will newbies react when their edit is visibly registered and then disappears? The tests will not register those who go away confused or disgusted. Septentrionalis PMAnderson 17:00, 4 January 2009 (UTC)[reply]

"Hey where's my edit?? Let's do it again!" Clicks on edit like before. "Oh, my edits are visible at the top of the page in green, that must be good. And right above them it says that 1 change needs review, and that Edits to this page will be incorporated into the stable version once an authorised user reviews them." ... "Oh and just to be sure, they're in the edit box too where I wrote them, so there's no way anything could have gone wrong! How long do I have to wait for an authorised user?" Does no answer to the last question make people confused or disgusted?? Haven't accounts of first time Wikipedia experiences shown more surprise that anyone can edit, rather than their attempts at humorous inserts sticking for a longer time? Anyone with any experience in collaborative environments, and heck, experience in working with people, will expect peer review. That's what all you watchlisters are doing already anyway, it's just hidden away to us anons who then have no idea of what's going on behind the scenes and nothing to lure us to participate in it. --94.192.121.203 (talk) 01:25, 5 January 2009 (UTC)[reply]

"all IP edits equally suspicious"? I, and Huggle's edit-assessing algorithm, beg to differ -- Gurch (talk) 17:11, 6 January 2009 (UTC)[reply]

I will agree with Gurch that "all IP edits are not equally suspicious" but speaking for myself they do all get more attention from me as being potentially more suspicious. - Other factors that I take into account include the edit summary (no edit summary = more suspicion) I consider registered users with redlinked user pages and talk pages to be no different than IP users. Dbiel ^(Talk) 20:38, 9 January 2009 (UTC)[reply]

Thank you Happy-melon, Ec5618 and 94.192..., I do sort of see where I was going wrong while thinking about it. PMAnderson still raises my main concern- this is complicated and bureaucratic, and is probably going to scare off potential users. Consider my opposition to this scheme weaker. J Milburn (talk) 23:41, 4 January 2009 (UTC)[reply]

Here's my attempt at answering the above questions. Note that some of the details are configurable options in the software. Also note that the use of flagged versions can be enabled or disabled on a page-by-page level.

If a non-logged-in user makes an edit to a page, they will immediately see the new version. If they return to the page later, then they will see the "flagged" version of the page, and they will also see a message explaining that there is a newer "draft" version available. They will be able to click on a link to see the new "draft" version, and diffs between the latest draft and the earlier "flagged" version. They will have to wait for someone to review the changes before the "draft" version gets promoted to a "flagged" version.
If a non-logged-in user clicks "edit this page", they will be editing the latest version of the page, whether or not that version is a "flagged" version. If the version they are editing is not a "flagged" version, they will also see a diff between the "flagged" version and the latest "draft" version that they are busy editing.
1. They do not edit the approved or "flagged" version, they always edit the latest version, whether or not it is "flagged".
  1. Reviewers see only one line of history, just as at present. Reviewers must check the diffs between the older "flagged" version and the latest "draft" version before they mark the latest draft as "approved".
  2. If the same user edits a page several times, and a reviewer wants to choose to save some but not all of their edits, then the reviewer will have to use the "undo" function to undo bad edits, or manually edit the article to their liking; after that, the reviewer can mark their own latest version as "flagged".
2. There is no way for multiple saved up changes to conflict with each other, because there is only a single line of history, and each change (whether "flagged" or not) builds on the previous change (whether "flagged" or not).
  1. If reviewers make any changes before flagging an article, those changes will show up under the reviewer's name in the history.
Recent change patrollers will be able to continue the exciting job of zapping vandalism, hunting down trolls and getting them blocked; and will also be able to do the less exciting work of clicking to "approve" edits.
This would not mean that we would be treating IP editors as vandals by default.

—AlanBarrett (talk) 19:12, 8 January 2009 (UTC)[reply]
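The model described in these answers (a single line of history, every edit building on the latest version, and flagging applied to a version rather than to an individual edit) can be illustrated with a small data structure. This is a toy sketch under those stated assumptions; the class and method names are invented here and do not correspond to the extension's actual code or database schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Revision:
    text: str
    author: str
    flagged: bool = False  # True once a reviewer has sighted this version


@dataclass
class Page:
    history: List[Revision] = field(default_factory=list)

    def latest(self) -> Optional[Revision]:
        """The newest version, flagged or not (what every editor edits)."""
        return self.history[-1] if self.history else None

    def stable(self) -> Optional[Revision]:
        """The most recent flagged version, if any."""
        for rev in reversed(self.history):
            if rev.flagged:
                return rev
        return None

    def edit(self, text: str, author: str) -> Revision:
        """Append a new draft; there is only one line of history."""
        rev = Revision(text, author)
        self.history.append(rev)
        return rev

    def flag_latest(self) -> Revision:
        """Mark the latest version as flagged, covering all edits up to it."""
        rev = self.latest()
        if rev is None:
            raise ValueError("page has no revisions")
        rev.flagged = True
        return rev

    def reader_view(self) -> Optional[Revision]:
        """What a returning non-logged-in reader is shown by default."""
        return self.stable() or self.latest()
```

Because each new revision is appended to the same list, two pending edits cannot conflict with each other in this model, and a reviewer who flags the latest version is approving the cumulative result of every edit since the previous flagged version.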