Wikipedia:Search Engine NOCACHE by default proposal

This is a failed proposal.

Consensus for its implementation was not established within a reasonable period of time. If you want to revive discussion, please use the talk page or initiate a thread at the village pump.

Shortcut

WP:NOCACHE

This page in a nutshell: Wikipedia will be set "NOCACHE" for search engines to prevent bad results from populating to any number of outside websites.

As of February 2009, Wikipedia allows all search engines to cache its results. That is, if a search engine like Google happens to crawl a page, any inappropriate or bad content, including WP:BLP violations, may be propagated out onto the Internet for an indeterminate amount of time. However, we have the ability to set Wikipedia to be NOCACHE in our robots.txt file. The major benefit of this is that search engines would only report the current state of an article (or any page) at any given time.

At least once, a slightly prominent BLP article was vandalized with racial epithets that the world's search engines then cached.^[1] A vandal replaced the entire BLP article with three epithets.^[2] As a result, according to Wikipedia's listing on search engines, we were now referring to the BLP subject as "NIGGA".^[3] The edit was reverted less than two minutes later, but the damage was done.^[4]

That was one of the single most-watched BLP articles we've ever had--what chance do the hundreds of thousands of lesser-known BLP articles have? The idea behind this proposal would be to protect not just BLPs, but the integrity of our articles themselves from being cached with bad information, even temporarily.

References

^ Oswald, Ed (2009-02-17). "Google Search for Barack Obama Reveals Racial Epithets". Technologizer. Retrieved 2009-02-18.
^ 04:44, February 17, 2009 edit to Barack Obama.
^ The edit in question, which was cached by Google. It was made at 04:44, February 17, 2009. It was reverted at 04:46, February 17, 2009, less than 2 minutes later, but the damage was done and saved for the world to see, on one of the most well-known living people on Earth, for an unknown length of time.
^ 04:46, February 17, 2009 edit to Barack Obama.

Discussion is good, and here is what I want to discuss

I support this so that any vandalism on a BLP page won't be stored by Google & other search engines and then distributed throughout the internet by nefarious trolls. Sincerely...--Punkrocker27ka (talk) 08:06, 18 February 2009 (UTC)[reply]

I do not think NOCACHE should be used at all. Search engines are possibly the most common way that people locate Wikipedia content, and this would be made more difficult if search engines were prevented from caching Wikipedia content. Wikipedia cannot be blamed for vandalism that occurs to articles, including those about living persons, or for that vandalism being picked up by search engines. The risk of vandalism is inherent in the nature of Wikipedia, and I believe most people who access the website realize this. — Cheers, JackLee ^–talk– 08:36, 18 February 2009 (UTC)

I'm not sure about this. Ideally we need to implement a versioning system, which would prevent the typical anon vandalism from being in the stable version, and only the stable version would ever go to a search engine. That would make this proposal irrelevant. But, if the choice is this (NOCACHE) or nothing, I'd support this. --Rob (talk) 08:16, 18 February 2009 (UTC)

In a perfect world, this plus flagged revs on BLPs would nuke just about anything new from getting in. Just a note too, the user that made the edit? Registered and editing since 2006. rootology (C)(T) 08:18, 18 February 2009 (UTC)[reply]

This rather odd user had not edited since 18 September 2007 until sparking off this incident, and has made fewer than 50 edits in total. MickMacNee (talk) 10:02, 18 February 2009 (UTC)

In that regard, doesn't that essentially negate Rootology's suggestion of flagged revisions? As his revision would not have been flagged, it would have gone on anyway -- just another example of why flagged revisions falls short of even the lowest of expectations, in my biased opinion. 128.61.56.41 (talk) 10:39, 18 February 2009 (UTC)[reply]

Not at all. FRs are very flexible, and while it's possible (I believe) to turn on FRs for only IP users, any implementation of it would almost surely be for all users to be flagged. Plus someone with that few edits wouldn't be a sighter, so FRs in this case would have indeed prevented the incident. ♫ Melodia Chaconne ♫ (talk) 13:01, 18 February 2009 (UTC)[reply]

Just a question: We use Google cache to review deletions, especially for those that aren't admins. I know there are other resources out there (Deletionpedia, Wikibin, ...) but my experience of them is that they are rather sketchy, especially with regard to recent deletions. How are we going to replace it? MER-C 08:47, 18 February 2009 (UTC)[reply]
- The Google cache link was added to Template:Newdelrev in March 2007[1]. Prior to that, non-admins could only review based on their memory of the article, if they had seen it, the discussion at the AFD if any, and the other statements in the nomination, and based on any mirrors that might exist. Since then, for recent deletions, the cache is useful (but expires over time, so only for a few days). DRV functioned before we began using it, and could do so again without it. Do the right thing without regard to DRV - the handful of DRV regulars can become admins if suitable, and when reasonable to use (a minority of DRV cases) Template:TempUndelete still exists. We might want an admin to make using it when appropriate their DRV task, but it will require judgment; a bot would not be a good idea. GRBerry 15:54, 18 February 2009 (UTC)[reply]

What will the effects of this be on Wikipedia's appearance in the search results? If Google isn't allowed to cache Wikipedia articles, it won't be able to present the two-line excerpts in the results, making search results less useful and less attractive. --Carnildo (talk) 08:38, 18 February 2009 (UTC)[reply]

- Maybe I'm wrong, but as I understand it, as long as a page is indexed, there will be a snippet. The "cache" feature just lets the user see a full copy of the whole page. But, as long as the page is indexed, the user can see a snippet of text, usually containing the search terms. If you do a Google search, you'll often see some results with no cache, but still having a snippet. --Rob (talk) 11:48, 18 February 2009 (UTC)[reply]

I believe you are correct, and it would mean in this particular case that Google searchers would still have seen the racial epithet on one of arguably the most important BLPs in Wikipedia. This proposal may be misleading people by implying that it would have made any difference in this high profile incident. Delicious carbuncle (talk) 15:11, 18 February 2009 (UTC)[reply]

Per User:Carnildo's question above, is it really practical to ask google not to cache Wikipedia? Wikipedia is a significant part of all of google's traffic, and google is one of Wikipedia's primary search methods / referring sites. Further, google seems to be mirroring some google stuff and there are lots of mirror sites. And beyond that, aren't there other caches at work? Wikipedia has a cache, everyone's browser has a cache, perhaps there is ISP level caching, Akamai type stuff, etc. If the problem is random vandalism maybe we just have to live with it. If the problem is sophisticated vandals gaming the caches, maybe the answer is to get sites like google to improve their caching and cache flushing system for rapidly changing content... making the whole site's BLP articles uncacheable to deal with occasional vandals may be throwing the baby out with the bath. I'm not arguing either way, just wondering if it's a technical problem with a more precise technical solution. Wikidemon (talk) 09:04, 18 February 2009 (UTC)[reply]

Per Wikidemon, the result of this proposal may well be to diminish Wikipedia to save 2 minutes of misfortune. I wonder, though, if it might be possible to only display revisions which have been around for 3 days without a revert. That is, instead of having such results instantly change, have only the page with non-controversial edits. Just a thought. 128.61.56.41 (talk) 10:29, 18 February 2009 (UTC) I suppose I just re-iterated Rob's suggestion above. Whoops. 128.61.56.41 (talk) 10:39, 18 February 2009 (UTC)[reply]
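The "3 days without a revert" idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Revision` record, its `reverted` flag, and the `stable_revision` helper are hypothetical names, not anything in MediaWiki, and the example history loosely mirrors the timestamps cited in the references.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Revision:
    # Hypothetical record; these fields are assumptions for illustration.
    rev_id: int
    timestamp: datetime
    reverted: bool  # True if a later edit undid this revision


def stable_revision(history: List[Revision], now: datetime,
                    min_age: timedelta = timedelta(days=3)) -> Optional[Revision]:
    """Return the newest revision that is at least min_age old and was never reverted."""
    candidates = [r for r in history
                  if not r.reverted and now - r.timestamp >= min_age]
    return max(candidates, key=lambda r: r.timestamp, default=None)


history = [
    Revision(1, datetime(2009, 2, 10, 12, 0), reverted=False),  # older good version
    Revision(2, datetime(2009, 2, 17, 4, 44), reverted=True),   # the vandal edit
    Revision(3, datetime(2009, 2, 17, 4, 46), reverted=False),  # the revert
]
# One day after the incident, only revision 1 is old enough to be shown.
print(stable_revision(history, datetime(2009, 2, 18)).rev_id)  # prints 1
```

Under this rule the vandal edit would never be the version handed to a crawler, at the cost of showing search engines content that is at least three days stale.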

No thanks. If I choose to open the cached version instead of the live page, I'm doing it for a reason. Google cache is also highly useful when the Wikipedia leadership has tried to slip something under a rug by deletion or oversight. --Apoc2400 (talk) 10:44, 18 February 2009 (UTC)[reply]
robots.txt? Nail, meet sledgehammer. I agree with Apoc2400 immediately above, but do acknowledge that there's a problem. While I'm no expert on either http headers or Google, I wonder whether this kind of thing can't be fine-tuned by programming "pragma" or similar. Thus IFF a revision meets various criteria (which WP:BEANS might suggest should not be publicly discussed), the page is marked as cachable by Google and the rest. -- Hoary (talk) 11:22, 18 February 2009 (UTC)[reply]
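One way to picture the per-revision fine-tuning suggested above is an HTTP response header rather than a site-wide rule. The sketch below assumes the `X-Robots-Tag` header (which Google documents as an alternative to the robots meta tag); the `RevisionInfo` fields and the thresholds are placeholder criteria invented for illustration, not a recommendation of what the real criteria should be.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RevisionInfo:
    # Placeholder criteria; the real ones are deliberately left unspecified.
    age_hours: float
    reviewed: bool


def robots_headers(rev: RevisionInfo, min_age_hours: float = 72.0) -> Dict[str, str]:
    """Ask crawlers not to keep a cached copy unless the revision looks trusted."""
    trusted = rev.reviewed or rev.age_hours >= min_age_hours
    # No header means default crawler behaviour, i.e. caching is allowed.
    return {} if trusted else {"X-Robots-Tag": "noarchive"}


print(robots_headers(RevisionInfo(age_hours=0.1, reviewed=False)))
print(robots_headers(RevisionInfo(age_hours=0.1, reviewed=True)))
```

A fresh, unreviewed revision gets the `noarchive` header; a reviewed or sufficiently old one gets no header and stays cachable.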

"That was one of the single most-watched BLP articles we've ever had--what chance do the hundreds of thousands of lesser-known BLP articles have? " Is this not irrelevant to this proposal? The proposal highlights the fact that vandalism, no matter how short lived, can be cached for an indeterminate amount of time after it has been reverted. The risk stemming from the 'unwatchedness' of articles meaning bad edits stay in articles themselves would seem to be irrelevant to risks posed by caching bad out of date versions, and is not an issue fixable by this proposal. "The idea behind this proposal would be to protect not just BLPs, but the integrity of our articles themselves from being cached with bad information, even temporarily" Similarly, this sentence also seems to miss the point of what this proposal can prevent or protect. The proposal only addresses the harm caused by temporarily caching bad out of date information. If bad information is staying around for longer due to it not being reverted in articles, that is not something this proposal protects us from at all. MickMacNee (talk) 11:31, 18 February 2009 (UTC)[reply]

Seems like another good idea. This is not incompatible with NOINDEX, or with flagged revisions. I could see not having NOCACHE on flagged revisions though, once flagged revisions was up and running well. ++Lar: t/c 12:20, 18 February 2009 (UTC)[reply]

This is, frankly, absurd. We're a free-as-in-speech encyclopedia that is opposed to what amounts to someone mirroring us? So they got a bad version. The benefits from Google caching us outweigh the negatives by several orders of magnitude. Also, I don't mean to sound pedantic and I don't mean this too seriously, but are we sure this doesn't amount to an unwarranted restriction on copying per our obligations to the GFDL? This seems like we're trying to find a problem to solve with a big old cruise missile because we happen to have a big old cruise missile around and no one likes to look at shiny new toys without trying them out. -M^ask? 13:07, 18 February 2009 (UTC)

Can we have an expert opinion on what this would do to search engine performance? Would it mean that Wikipedia simply isn't indexed in Google, or would it mean that every Google search has to wait until the slow Wikipedia servers respond with a fresh version of the page? Either of these seems unacceptable as a price to pay for this proposal. —David Eppstein (talk) 15:15, 18 February 2009 (UTC)[reply]
Wikipedia is free content. One of the significant freedoms is the right to copy. Attempting to interfere with this is not acceptable. It's also slightly questionable under the GFDL: "You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute". Geni 15:32, 18 February 2009 (UTC)

While I recognize the problem, I'm not sure that this proposal will do anything useful. As long as the page gets indexed it will have the snippet, and the snippet is the problem. We're not going to NOINDEX all of Wikipedia, since that would wipe us off the face of Google, which is the source of a substantial chunk of our hits (and therefore of our new users). NOCACHE seems to merely mean that users can't look at the original Wikipedia page straight from Google. The only real way to combat this is widespread use of FlaggedRevs or some other "protection" scheme, which is potentially worse than the original problem (but that's a separate issue). {{Nihiltres|talk|log}} 15:54, 18 February 2009 (UTC)[reply]

Not a good idea. This won't help matters at all. Almost no one is going to use a cached version to see an old article when they can just click through. There are clear benefits to caching and the claimed issues with not caching are minimal. The only thing this will help out with is clear vandalism such as profanity. No one with half a brain will take such a cached file seriously anyways. From a Do-No-Harm perspective the harm from this is next to non-existent. Anyone using the cached version will have a reason to look at a cache and thus will almost certainly be smart enough that there won't be any issue. JoshuaZ (talk) 17:58, 18 February 2009 (UTC)[reply]

I am not a site administrator for Wikipedia, so do not know a whole lot of internals of traffic, load, etc. However, my very strong hunch is that adding NOCACHE would result in a huge increase in the load the (already over-burdened) Wikipedia servers would take. Making such a change for what seems like a pretty small issue seems like a bad idea to me. Of course Google might cache the Wrong Version of a page, but Google re-crawls Wikipedia pretty often (and they seemed responsive to notification of this particular unlucky timing). In any case, there are many Wrong Versions that stay on Wikipedia for a long time, so avoiding caching would have little effect. For the most prominent pages, obvious vandalism is quickly corrected, but for either less popular articles or less obvious vandalism, avoiding caching does not necessarily avoid showing readers inappropriate content (including, in some cases, overt BLP violations). LotLE×talk 18:19, 18 February 2009 (UTC)[reply]
- Also, to expand on Lulu's comment, the more popular a page the more links it has to it so the more frequently google will recrawl that page. JoshuaZ (talk) 18:24, 18 February 2009 (UTC)[reply]

More BLPParanoia. Make an argument on the merits and not on the obvious premise that vandalism exists. Google's cache serves a function, offers a disclaimer and costs us nothing. Just like the "NOINDEX" proposals (not sure where they are right now) which seek to hide projectspace from the rest of the world, this is another way for us to dab some coverup on what we feel are warts. Articles get vandalized. It's a drawback of having a user-edited encyclopedia. We run some risk of some graybeard columnist looking at a wikipedia article and then tut-tutting about it in some column, but eventually the world will pass those people by. We should not act out of fear of those people.
Further, what right do we have to say "don't cache this page"? Google is well behaved, so they will comply with the request, but is that in keeping with our tenets and our license? When I commit a change to an article, I'm releasing that to the world. For whatever purpose. Right now. Maybe in blocking this we stop someone from using google's cache and vandalism to learn something more about networked systems. Those articles didn't use google cache (instead stemming from prior familiarity w/ wikipedia), but we don't need to foreclose future research opportunities. What if someone at google (noting our high pagerank and throughput there) wants to write an intelligent caching software to cache selectively based on the recent changes feed? They would never get that chance if we do this. What if someone wants to track how vandalism propagates across cached copies and finds its way into mirrors (which won't be impacted by this, BTW)? Now that opportunity is gone. And for what? So that we can't look in google cache and see that someone is a racist asshole with an internet connection? No. Protonk (talk) 18:42, 18 February 2009 (UTC)

I started out thinking "yes, let's implement NOCACHE", but then my rational mind took over. Put simply, anyone who relies on a search engine's cached version of a file really ought to be taken outside and shot for stupidity. On the basis that I enjoy seeing stupid people make fools of themselves I think caching is fine and we should not implement the nocache element. Fiddle Faddle (talk) 20:26, 18 February 2009 (UTC)

- Looking at a cache is a very reasonable thing to do. If I search for a keyword, and find a page that no longer has that word, I may look in the cache to see how that keyword (which is highlighted by Google) was used when the page was indexed. The cache feature of Google is one of the few areas of transparencies by Google, where it shows what it bases its results on. --Rob (talk) 20:43, 18 February 2009 (UTC)[reply]

- - I think he means (maybe, I can't really tell) that anyone who looks at a cache of a page available in an uncached form and concludes that the cached version represents something immutable and validated should be disregarded. Put another way, there are multiple warnings that a cached copy of a page (where there is a current copy) is just a snapshot in time. Protonk (talk) 21:43, 18 February 2009 (UTC)[reply]

- - - Seemed obvious to me. Maybe he looked at a prior cached page. Fiddle Faddle (talk) 22:14, 18 February 2009 (UTC)[reply]

This will not work. It interferes with the GFDL. It will overload the WP server. It will not prevent vandalism. Fear of vandalism is prompting ever more aggressive security measures here. Get too restrictive and over-secure and we kill the Goose that lays the golden eggs - it's already being strangled. Riversider (talk) 23:14, 18 February 2009 (UTC)[reply]

Something like the HTML code <meta name="robots" content="noarchive"> would be added to each page served. This isn't very different from other directions Wikipedia gives for certain types of pages (for instance "NOINDEX" for talk pages). So, I don't see how an overload could occur. --Rob (talk) 02:31, 19 February 2009 (UTC)
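As a rough sketch of what "added to each page served" could look like, the snippet below builds the meta tag from a per-namespace policy table. The table, the namespace names, and the `robots_meta_tag` helper are hypothetical and only illustrate the idea; they are not how MediaWiki is actually configured.

```python
from html import escape

# Hypothetical policy table: robots directives to send per namespace.
# The namespace names and the choices here are assumptions for illustration.
ROBOTS_POLICY = {
    "article": ["noarchive"],
    "talk": ["noindex", "noarchive"],
}


def robots_meta_tag(namespace: str) -> str:
    """Return the robots meta tag for a page, or an empty string if none applies."""
    directives = ROBOTS_POLICY.get(namespace, [])
    if not directives:
        return ""
    content = ", ".join(directives)
    return f'<meta name="robots" content="{escape(content, quote=True)}">'


print(robots_meta_tag("article"))  # prints <meta name="robots" content="noarchive">
```

The tag is a few dozen bytes inside a page that is being served anyway, which is the basis for the point above that it should not add meaningful load by itself.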

Cost outweighs benefits by a bunch. Protonk says it best. - Peregrine Fisher (talk) (contribs) 23:16, 18 February 2009 (UTC)

Two points. (1) Contrary to Thivierr above, the additional load would be dramatic. I understand that sending the NOCACHE flag is minimal bandwidth (it's best done in robots.txt, not in a <meta>, but that's unimportant). The first time Google, or another engine, crawled a page the load would be the same, but the thousand regular users who now read the cached result would all be extra load to Wikipedia. (2) The arguments about NOCACHE somehow violating GFDL are completely bogus and irrelevant. I don't think we should add it, but if we did it would be purely a technical advisory to web crawlers, not a copyright prohibition. If, under some wildly unlikely hypothetical, the Wikimedia foundation were to sue a search engine for caching... well, maybe the search engine would have a GFDL defense. But nothing like that is remotely plausible. 03:39, 19 February 2009 (UTC)

- I think there's some confusion about different uses of the term "cache". As I read the intent of the proposal, we're talking about what users see on Google search listings. Currently, next to any listing, the user sees a "cache" link. They can click on this, and see a copy of the page, as it existed when it was last indexed by Google. If we use the code I mentioned above, this option would no longer exist. We additionally have the option of controlling whether users in Google can see a "snippet" for each listing. This has absolutely positively no impact on Wikipedia servers (assuming an equal number of visits). Googlebot will spider our pages exactly as it always has, regardless. If a user chooses to visit a page at Wikipedia, the load on Wikipedia's servers will be the same. Remember, we're *not* talking about how Wikipedia servers do their own internal caching. We're talking about what people will see at Google, not at Wikipedia. --Rob (talk) 04:03, 19 February 2009 (UTC)

- Sorry, I didn't mean to claim that we legally cannot disallow caching (As we certainly can disallow all crawlers). I meant to say that the nature of the project means that nocaching runs counter to our notion of releasing information. Protonk (talk) 04:39, 19 February 2009 (UTC)[reply]

Aside from the objections already made, Google's cache feature allows users to get around censorship. I am in Vietnam, which has briefly blocked Wikipedia on several occasions. China blocks Wikipedia all the time. Kauffner (talk) 09:46, 19 February 2009 (UTC)[reply]

Without commenting on the merits of the proposal, the example given is, frankly, dumb. I personally don't care if Barack Obama's cache says 'NIGGA'. The far bigger problem is much less well known people who are libelled or whose personal information is released. We can and usually will delete or oversight the edits if they are bad. However the cache can and often will be used by people to see the deleted/oversighted edits. This is a far bigger problem than clear cut vandalism of a very well known individual. User:Doc glasgow/The BLP problem may be helpful here. Nil Einne (talk) 10:17, 19 February 2009 (UTC)

Even for BLPs, our usual way of dealing with vandalism is to simply revert and warn/block the vandal. The nasty revisions remain in the history unless there is private information there which should be removed, or if there is a special request. (I have deleted some particularly nasty revisions as well, and brought a few to the attention of oversight, but rarely). If we host revisions like that permanently in the history then I don't think we should be overly concerned about Google hosting cached revisions for a few days. For the few instances it is a problem, an ad hoc solution is in order, but as a general solution, I don't think the benefits outweigh the negatives if Wikipedia's visibility will be severely reduced on the largest search engine. Sjakkalle (Check!) 14:50, 19 February 2009 (UTC)

No, absurd and disproportionate knee-jerk reaction to a problem of questionable significance. — Werdna • talk 06:32, 20 February 2009 (UTC)[reply]

And sometimes the discussion is based on ignorance

It's wonderful to behold the mixed levels of knowledge and assumptions above. Some people say it will increase the load on Wikipedia servers, others gainsay that, and some have even read the link describing what NOCACHE really does.

How will the folk looking at the discussion on this proposal reassure all of us, the knowledgeable, the quasi-knowledgeable, the dangerous loon who knows nothing but still uses that lack of knowledge to have an opinion, and the truly undecided person questing for knowledge that our various opinions have been listened to, counted or discounted based upon technical accuracy, and weighted correctly according to firmly held opinion?

That question mark should have come earlier. The sentence is far too long, but I don't care! The point is that most of us haven't a clue what we are talking about, but we are sure that we are right. I number myself in that, so don't feel I'm accusing you. Most of us opine out of ignorance, or what a "bloke in the pub" said and that we believed after the third pint.

Please will a truly technically competent person use very short words and very short sentences to tell us what effects this proposal would have, if adopted. Please don't refer us elsewhere. Many of us have read and some of us understand it.

Looking at the reason for proposing this in the first place that reason looks to me like: "If we cache, then, well, but, but, but, but, but, but.... it, er, well, but..." So I am rather lost. It's a rather small thing, isn't it? I mean really? Fiddle Faddle (talk) 15:58, 19 February 2009 (UTC)[reply]

I'm not even convinced the right explanation is now linked above. (I believe it was added after the original proposal.) The link is to HTTP headers, but the proposal is about robots.txt, which appears meaningfully different. Which makes much of the discussion about that point look meaningfully off to me. But what do I know, because I am certainly not an expert on either. GRBerry 16:46, 19 February 2009 (UTC)

But this is Wikipedia. We have the Wisdom of Crowds. This means that it must be correct. Or not. Fiddle Faddle (talk) 16:49, 19 February 2009 (UTC)[reply]

My proposal was actually significantly different than what this has turned into, which bizarrely veered into GFDL advocacy and who-knows-what now, as the original formatting has been changed by (sorry guys) anti-polling types. It was literally just a query if people supported the idea of minimizing exposure of cached copies of the site (which has nothing to do with GFDL, because by that argument we violate GFDL by even deleting anything). I think discussion manipulation actually killed this, bizarrely, and that makes me sad. I'm starting to get mightily sick of anti-survey and anti-poll FUD. This is what I originally posted. I was looking for feedback in an organized, orderly fashion. rootology (C)(T) 16:54, 19 February 2009 (UTC)[reply]

It seems like most of the objectors are objecting on grounds completely separate from the GFDL concerns. I don't see any real basis for the GFDL concerns but there are a lot of other issues with this sort of proposal that have been brought up above. JoshuaZ (talk) 16:58, 19 February 2009 (UTC)[reply]

You have to remember that we actually have The Ignorance of Crowds rather than the Wisdom. The opposite of Artificial Intelligence is Genuine Stupidity.

I have just been pointed at the "right" link which describes what Google does with requests not to cache. From reading the page in detail I return to "If we cache, then, well, but, but, but, but, but, but.... it, er, well, but..." as stated above. So, that will be amusing, then, if this proposal goes ahead. I can't help but think "Good luck with that, then!"

I'm disappointed that manipulation killed it. The discussion is interesting, if bizarre. Fiddle Faddle (talk) 17:01, 19 February 2009 (UTC)[reply]

This is what I originally posted before Scott Mac changed the formatting. This was ********NOT******** a proposal to turn this on, but to gauge a clear view of what people thought, for a possible future proposal. rootology (C)(T) 17:08, 19 February 2009 (UTC)[reply]

I think your original good intention was based upon a storm in a teacup. So some fool called the US President by a bad name. In the UK we have a one-eyed Scottish idiot, according to Jeremy Clarkson, who had to apologise for the two factual statements and refused to apologise for the opinion. This bad name for Obama rubbish got picked up by some blogger or other and he attracted folk to his site to get advertising revenue from it.

Wikipedia can be vandalised. Whoopeeee! Now let's all get over it. This was manufactured news. Ask Obama if he actually cares! Fiddle Faddle (talk) 17:24, 19 February 2009 (UTC)[reply]

No, my original intention was based on the fact that if Obama's article, with a million eyes on it, can be dinged this way with potential defamation and/or libel archived across multiple websites from our "broadcast", what chance do the tens of thousands of unwatched BLPs have, as I explicitly said in the lead of this page, in my conclusion? rootology (C)(T) 17:28, 19 February 2009 (UTC)

But Wikipedia can be vandalised. It truly does not matter. No-one will die. The earth will not stop spinning. The magnetic poles will not reverse. It's not as if it actually matters. And the actions of search engines are absolutely not Wikipedia's responsibility. People here create or vandalise. Search engines visit and record. Sometimes it hits at a bad time. So, even though I recognise that you feel strongly about this, even hijacked to an extent, I suggest that it is truly not important. Interesting? yes. But not important. Fiddle Faddle (talk) 17:36, 19 February 2009 (UTC)[reply]

As mentioned above, this is actually the relevant mechanism for doing what's proposed. Notice there's no keyword "NOCACHE" (it's "noarchive"), and it's *not* part of the "robots.txt" file (despite what somebody says above). It also doesn't directly affect snippets, which is what the original "NIGGA" example used (in the picture), which can be done by doing this. And contrary to some opinions expressed, it has nothing really to do with Wikipedia's web server caching, and no effect on performance. While reformatting of this page messed things up, there was never a well formed proposal to begin with. Finally, User:Scott MacDonald didn't reformat the comments, he removed them entirely. Somebody else returned them, without the polling format (legitimately wanting to keep the comments, without edit warring over whether there's a poll or not). --Rob (talk) 17:43, 19 February 2009 (UTC)
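To make the distinction above concrete, here is a deliberately simplified model of how a crawler might read a robots meta `content` value. It assumes the standard directive names `noindex`, `noarchive`, and `nosnippet`; the `allowed_features` helper is hypothetical, and real search engines may treat combinations of directives differently from this sketch.

```python
from typing import Dict


def allowed_features(robots_content: str) -> Dict[str, bool]:
    """Simplified model of what a listing may show for a given robots meta value."""
    directives = {d.strip().lower() for d in robots_content.split(",") if d.strip()}
    indexed = "noindex" not in directives
    return {
        "index": indexed,
        # A page that is not indexed has no listing, so no cache link or snippet.
        "cached_copy": indexed and "noarchive" not in directives,
        "snippet": indexed and "nosnippet" not in directives,
    }


print(allowed_features("noarchive"))
print(allowed_features("noarchive, nosnippet"))
```

In this model `noarchive` alone removes only the cached copy; the snippet, which is where the epithet appeared in the example, would still be shown unless `nosnippet` is also sent.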

All true and all not terribly relevant. Most of the objections are specifically about issues that have little or nothing to do with the technical details. JoshuaZ (talk) 20:57, 19 February 2009 (UTC)[reply]

And thus we come full circle to the heading of this section. Fiddle Faddle (talk) 21:04, 19 February 2009 (UTC)[reply]

What we really need is a way to ask Google to "re-snippet" the page so that the snippet is current. Then when you see BLP vandalism, you revert it and ask for a re-snippet. As long as we promised to keep it under a hundred or so requests a minute, Google would probably be happy to work something out with our devs. Franamax (talk) 22:41, 19 February 2009 (UTC)[reply]

Every webmaster on Earth would love it if they could get Google to refresh their results faster, just by asking. Google already refreshes highly ranked pages fast, which Wikipedia pages tend to be, so we're not going to make it better by asking. The one semi-related thing a webmaster can ask for is to have the removal of deleted pages expedited, which would be good, since deleted pages often were deleted for BLP violations, but sit around in Google long after deletion (or they used to, I haven't checked lately). We can also ensure certain archives, like archive.org, don't archive such pages. --Rob (talk) 22:50, 19 February 2009 (UTC)

I'm thinking in particular of a self-published poet who unwisely decided to register here in his own name and create an article about himself, then started moaning about how his ANI's and AfD's were showing up in gsearches. Yes, it was his own fault, but we courtesy-blanked around and it still took a few weeks for it all to flush out of Google. It would have been nice in that case to queue up some re-snippeting.

And yeah, every webmaster would love it - but I'm pretty sure Google has "noticed" where we show up on searches. :) I'd bet they'd cooperate in setting up a direct channel. Wikia, no way, but we're such nice people here. :)

That's an interesting link you provide and actually could be implemented in software, triggered by "Delete this page". I haven't seen too many devs hanging out here though... Franamax (talk) 01:26, 20 February 2009 (UTC)[reply]

Note that google no longer indexes ANI and AFDs anyways so the primary issue here is done. Making pages lose their index when they are deleted isn't terribly useful and would damage transparency. The idea of a person being smart enough to go use a cached version but not smart enough to realize that it was likely deleted for a reason just isn't very credible. JoshuaZ (talk) 16:20, 20 February 2009 (UTC)[reply]

Not a lot of smarts are involved. Somebody sees a listing, finds the page is gone, but then sees the "cache" link, which they click. Deleted pages will ultimately be removed from Google's index, so doing so faster would, at worst, be harmless. I don't understand why "ANI and AFDs" were the primary issue. If an AFD is started over a problem BLP, then the actual article is more harmful than the related AFD or ANI discussion. Keep in mind we're not worried about the reader, who has some blame if they blindly trust something that's clearly expired. But the BLP subject isn't to blame for libel surviving, and it's often the only thing found in a Google search on their name. Also, if Wikipedia wanted this type of "transparency", we'd make deleted contents visible (which is very easy to do). --Rob (talk) 18:45, 20 February 2009 (UTC)

In regard to AfD and ANI, I was talking about the problem mentioned above of people who are unhappy with the discussions about them on AfDs and ANI. That's a far more common issue. We need cached pages for DRV, as well as for allowing users to access Wikipedia where it is blocked. This really is throwing the baby out with the bathwater. It doesn't help much at all. We need to deal more with vandalism, not try to remove secondary consequences of it. JoshuaZ (talk) 18:53, 20 February 2009 (UTC)

Sometimes, there's good reason to make deleted content available on both Wikipedia and Google. Sometimes, there's good reason to hide deleted content on both Wikipedia and Google. There's never reason to hide content on Wikipedia, but still show it on Google. In a DRV, an admin can temporarily restore safe content to an appropriate place for viewing. I agree cached pages are good for places where Wikipedia is blocked, and we should continue to allow non-deleted pages to be cached as normal. --Rob (talk) 19:05, 20 February 2009 (UTC)

Sure there is. The vast majority of content deleted doesn't belong on Wikipedia but isn't at all libelous. There's no good reason to force cache removal of such material. JoshuaZ (talk) 19:06, 20 February 2009 (UTC)

In summary

Obama's article was vandalized, and Google indexed the vandalism. A proposal was made to prevent this in the future, but was based on a misunderstanding of how Google works, and would not have actually prevented it. The correct mechanism for preventing it was discovered, but it would have a major negative impact on using Google to search Wikipedia. Any questions? --Carnildo (talk) 22:30, 19 February 2009 (UTC)

Yawn. "Wikipedia gets vandalised. Get over it." That is a briefer summary. Fiddle Faddle (talk) 22:55, 19 February 2009 (UTC)

This is the wrong horse

The motives are first class, but the saddle is on the carthorse, not the racehorse.

All good problem solving goes back to analysing the problem, and the problem is not, repeat not, search engine caches, nor archive system archives. No, the problem is... Vandalism.

And yet that is not the problem. The true problem is the delay in detecting which bits are vandalism on pages whose watched status is such that few folk notice vandalism.

The proposal should be:

"In order to minimise the risk of disrepute due to undetected or slowly corrected vandalism, Wikipedia must run ever more numerous and ever more efficient vandal detection and rectification services."

At a stroke this reduces the probability of an embarrassing edit being hailed as "Wikipedia is the product of a load of amateur authors having fun." Ah wait, That one is true. Ah yes "Wikipedia is inaccurate trash." Hmm. Some of that is true. And surely, speaking unemotionally, Obama is hailed as being black despite being half white, so maybe the rude word he was called is fact too?

Our fight is not with cache. Cache is a digression. Our fight is against vandalism. Fiddle Faddle (talk) 23:13, 19 February 2009 (UTC)

Spartans! Prepare for glory! Bigbluefish (talk) 23:18, 19 February 2009 (UTC)

Ave Imperator. Morituri te salutant! Fiddle Faddle (talk) 23:22, 19 February 2009 (UTC)

Unfortunately, you didn't mention the one solution that actually would address this problem: flagged revisions. Flagged revisions would ensure that the page displayed to anonymous users (that is, the page that Google caches) has always been looked over by a trusted user, at least on BLPs. Had this system been in place, all of this could have been avoided. --Cyde Weys 02:51, 24 February 2009 (UTC)
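
As a rough illustration of the flagged-revisions idea above, here is a minimal Python sketch of choosing which revision to show anonymous readers (and therefore crawlers). The `Revision` class, its `reviewed` flag, and `revision_for_anonymous` are hypothetical names made up for this example; this is not MediaWiki's actual implementation or API, only the selection rule as described in the comment.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Revision:
    rev_id: int
    text: str
    reviewed: bool  # hypothetical flag: True once a trusted user has checked it


def revision_for_anonymous(history: Sequence[Revision]) -> Optional[Revision]:
    """Return the newest reviewed revision, or None if none was reviewed.

    `history` is assumed to be ordered oldest to newest.
    """
    for rev in reversed(history):
        if rev.reviewed:
            return rev
    return None


# Example: the unreviewed vandal edit is skipped for anonymous readers.
history = [
    Revision(1, "Sourced biography text.", reviewed=True),
    Revision(2, "Vandalized text.", reviewed=False),
]
shown = revision_for_anonymous(history)
```

Under these assumptions a crawler fetching the page anonymously would receive revision 1, so the vandalized revision 2 would never reach a search engine's index or cache unless someone marked it as reviewed.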

