Wikipedia talk:Plagiarism/Archive 5

From Wikipedia, the free encyclopedia

"Text available under a free license" section

The Text available under a free license section seems to be redundant, unclear and possibly even incorrect/out-of-date. Are editors already aware of this, or discussing this somewhere else on this page? If not, I can list my concerns here. Abecedare (talk) 00:26, 17 June 2009 (UTC)

Migration, do you mean? I've tagged it. I'm working through the various copyright policies & templates at the moment, but this one also needs to be updated. Oi. --Moonriddengirl (talk) 00:28, 17 June 2009 (UTC)
Yes, that's the "out-of-date" part. The redundancy is with the "Public domain or free license text" section that already covers free text. And the part I found unclear is:
"In all cases, the moral rights of the original authors whose works are copied must be respected during the term of their rights, which means that it is imperative that their work is distinguishable from the prose of the Wikipedia article. Because articles normally evolve through incremental changes, it is important to retain an anchor to the originally copied text so that subsequent changes can be traced."
In short:
  1. Do moral rights have any term limitation, i.e., do they ever expire?
  2. What does "distinguishable from the prose of the Wikipedia" mean?
  3. What is "an anchor to the originally copied text"?
I realize that this proposed policy is still under active development, so I don't want to rush anyone. Just want to make sure that the primary editors are aware of such issues. Abecedare (talk) 00:45, 17 June 2009 (UTC)
  • Yes, under the Berne Convention, moral rights are restricted to the same term as copyright provisions normally would be. Previous discussion here
  • The "distinguishable from" wording springs from that same thread I believe, or 'round about that time.
  • "Retain an anchor" means to make clear what exact free text was copied, i.e. the concept of copying in the exact text and only then subsequently modifying it. The "anchor" would be the &oldid of the article where it was first copied in. Franamax (talk) 01:05, 17 June 2009 (UTC)
Thanks Franamax. I didn't know about the Berne convention's discussion of "moral rights".
About the "anchor": The previous section already says, "A practice preferred by some wikipedia editors, when copying in public domain or free content verbatim, is to paste in the content in one edit, with indication in the edit summary of the source of the material. ...", which is much easier to interpret.
I am still not sure what the "distinguishable" phrase means, but I'll wait for a few days before re-raising the issue since I suspect the subsection is due to be rewritten or merged with Public domain or free license text anyway. Cheers. Abecedare (talk) 01:17, 17 June 2009 (UTC)

←I have updated. I'll leave any redundancy issues for further discussion. :) --Moonriddengirl (talk) 12:31, 17 June 2009 (UTC)

I have just removed " which means that it is imperative that their work is distinguishable from the prose of the Wikipedia article". I have no idea what this means, but it appears to imply that we could not take CC-BY-SA text from another source and incorporate it into a WP article. That implication is incorrect. — Carl (CBM · talk) 16:29, 17 June 2009 (UTC)

Kaldari has now removed the entire "Because articles normally evolve..." sentence with the edit summary You do not "need" to retain an anchor to the original text (attribution will suffice), plus it is not always possible or helpful to link to the original text.[1] Perhaps the entire "Text available under a free license" section needs to be rolled up into the "Public domain or free license text" section above (this is perhaps the result of new editors coming along and just writing in their own sections to express their own ideas). I have some concerns with the removal though:

  • Explanations in an edit summary are presumably only meant for the page editors, not the page readers.
  • Does this removal mean that subsequent changes to free text do not need to be traced? In other words, we can just say "there is some free text in here somewhere"?
  • From the edit summary, what does "attribution will suffice" mean? Does this mean that so long as I place a {{PD-attr}} template, the addition is sufficiently made, irrespective of whether or not I specify what text was copied? Shouldn't we then just place that template on every article we have, to allow blanket copying in of verbatim free text?
  • And if there will be no anchor to the original insertion of free text, how will we ever know that it is gone? This implies that every article that has ever had an attribution template applied must keep that template forever, no matter how extensively it has been rewritten. If there is no way to trace back to the original text, there can never ever be any way to know if it has been completely rewritten so as to eliminate all original wording and structure. So we're basically stuck with the attribution templates forever.

I have no problem with any of the above, if that is what the community wishes. Franamax (talk) 23:37, 17 June 2009 (UTC)

I do not believe that changes need to be traced, and I do believe attribution templates should never be removed. Once we incorporate text from elsewhere, our article is perpetually derived from it. One might as well ask if we can remove an editor from the page history if all their contributions have been replaced. I am also thinking of GFDL or CC-BY-SA text here, where we need to acknowledge the copyright indefinitely. — Carl (CBM · talk) 23:46, 17 June 2009 (UTC)

The sentence "Because articles normally evolve through incremental changes, it is important to retain an anchor to the originally copied text so that subsequent changes can be traced" needs to be deleted or rewritten. Attribution is important, but specifying that it must be through an anchor link is absurd. There are plenty of physical books, zines, journals, and other media that are under free licenses that you cannot create "anchor links" to. Believe it or not, not all media is on the web. Since we already discuss attribution ad nauseam, I think the sentence should just be deleted. Kaldari (talk) 16:01, 18 June 2009 (UTC)

Which exact word led you to believe that an "anchor link" meant a link to some external website? And/or what commentary led you to believe that anyone thinks free sources exist only on the series of tubes? As I explained only a few posts above, "anchor" refers to the need to maintain an exact reference to the inclusion of external author text into the article, search backwards from here for the text "&oldid" to find it (hint: 01:05 17Jun09). Actually, reading the entire talk page archive would be a useful exercise.
Carl, I acknowledge what you have been saying from the early weeks of the genesis of this page, the attribution tag should never come off. However. By ridiculous extension of your thoughts, I could apply the {{1911}} or {{citizendium}} tag to any random article right now and it could never come off. Someone could say "does not incorporate free text", I could say "does too, it has the template", they could say "well, where is the free text" and I could say "Carl already said I don't have to say where it is". That's absurd, but this is en:wiki, so it's well within the bounds of possibility. :( At least going forward, I would prefer that there be explicit acknowledgement within at least the edit history of what exact text was copied from an external free source. That accords well with the en:wiki model, where we use the edit history to directly attribute authorship - either it's the contributing editor, or it's an attribution within at least the edit summary, but preferably via some combination of edit summary, attribution template, and talk page notation. Franamax (talk) 09:19, 19 June 2009 (UTC)
I support changing the templates so that they include, in some fashion, the date or oldid when the content was added. It doesn't seem unreasonable at all to give that sort of attribution. We might have to grandfather old articles, though. I think it is impractical to require edit summaries to have a certain form, because once they are made they cannot be fixed. We can document best practices, but the scope of WP is such that we need to develop systems that are flexible enough to allow for mistakes. — Carl (CBM · talk) 13:23, 19 June 2009 (UTC)
Franamax, if you just want an oldid option added to the templates, that's fine. I don't see what that has to do with the guidelines here. No casual reader of these guidelines is going to understand that "anchor link" means "oldid added into an attribution template". As it stands now, the wording is useless and needs to go. Once the oldid option has been added to the attribution templates, come back and add something about that specifically to the guidelines. Kaldari (talk) 15:39, 19 June 2009 (UTC)
Frankly, I had no idea what an "anchor link" was; HTML anchor is the only meaning of "anchor" that came close to fitting, except it doesn't fit. — Carl (CBM · talk) 16:39, 19 June 2009 (UTC)
So now that I've explained the original intent of the wording, is there a way to successfully rephrase it? Or just delete it without regard to the previous megabyte of discussion and wait until someone else comes along and says the same thing with different words? Either way, I'm fine. As long as the message is there, it certainly needs not be my wording. Franamax (talk) 04:35, 20 June 2009 (UTC)

1st section: Student cheating definition, conflicting definitions of plagiarism

Lathrop/Foss definition

Is anyone attached to this intro?

Definitions of plagiarism differ. A very basic, plain-spoken definition is offered by Ann Lathrop and Kathleen Foss in their 2000 guide Student Cheating and Plagiarism in the Internet Era: A Wake-up Call: "If you didn't think of it and write it all on your own, and you didn't cite (or write down) the sources where you found the ideas or words, it's probably plagiarism."[2] It doesn't matter where you find the information; even if your source is free content, you should acknowledge it.

I think it should go. "Thinking of it and writing it all on your own" are not an option for Wikipedians (at least not one that we want to encourage). I think we should not even go there, evoking the university setting in this manner.

We should just tell editors what an appropriate use of a source is, and what isn't. We could just keep the sentiment of the last sentence:

It doesn't matter where you find the information; even if your source is free content, you should acknowledge it.

Competing definitions

The following two paragraphs are, again, too "academic" for my liking:

Some definitions of plagiarism require that it be committed with the intent to deceive, while others do not.[3] Wikipedia is more concerned with impact than intent; whether it is the result of deliberate deception or improper citation, duplicating the work of others without credit can bring both author and publisher into disrepute.
Editors should be aware that the precise definition of what constitutes plagiarism is often disputed both in general and here on Wikipedia. You can always avoid any dispute by one of: rewriting text completely into your own words, using multiple referenced sources; directly quoting and referencing the material you copy; or properly attributing public domain text that you place directly into an article.

Basically, we seem to be telling editors, "No one really knows what plagiarism is, but we think it is a big problem". We should just stick to what we want editors to do (or not to do), so the guideline gives them a sense of confidence and certainty ("Right ... I see ... this is what I have to do ..."), rather than a sense of uncertainty and doubt ("Gee, this plagiarism thing is kind of involved ... what does it all mean? Does it even matter? People aren't even agreed it is plagiarism. I'm not trying to deceive anyone, I just want to tell people what the sources I've cited say!").

Views? JN466 23:40, 17 June 2009 (UTC)

I think it is extremely important to note that some definitions indicate plagiarism is intentional, while others do not. I have seen people become very heated at the administrators' noticeboard over the question of plagiarism where one person was using the term to allow for inadvertent infringement while the other was "hearing" it as a charge of deliberate deception. We need to acknowledge both that there is a divide here and that we have a position on it. --Moonriddengirl (talk) 00:10, 18 June 2009 (UTC)
This relates to the early sentence "An accusation of plagiarism is very serious," which I don't like. In that assertion, I think it is implied that an accusation of plagiarism involves a judgment of intent and evildoing, because otherwise it would not be so inflammatory. We should model appropriate charges of plagiarism, rather than promote inflammatory ones. I would rather go with definition of plagiarism within wikipedia in a low-key way, as in, perhaps, "Plagiarism in wikipedia is the presentation of ideas and wordings with less attribution than is reasonably expected." And then clarify somewhere that "In educational and other settings, accusations of plagiarism are very serious, in part as they involve assertions of deliberate intent to deceive and to claim original credit for work by others. In wikipedia, intent is not a factor in determining whether text written provides adequate attribution to its sources. And in wikipedia ideas in articles are not often original (and usually should not be), and it is usually only wording that is implicitly claimed to be original work by the collective of wikipedia editors, so implicit claims of originality are less salient." I don't think we have to comment on whether plagiarism charges in wikipedia are serious or not. Where are there accusations of plagiarism? Perhaps after creating and running a central board to deal with cases of plagiarism, we would accumulate some perspective about them to share in generalizations in this guideline, but I don't know now how to find accusations much less how to characterize what are usual ones. doncram (talk) 09:11, 18 June 2009 (UTC)
Trust me: they're very serious. :) People use the word plagiarism, and other people get mad. (I believe you are interpreting "An accusation of plagiarism is very serious" as "This is an OMG! AWFUL! crime", while I think what is meant in context is "Please don't bandy this lightly." That's why the next sentence says, "When dealing with plagiarism, take care to address the issue calmly and civilly." But if you are interpreting it that way, others will also. We may need to find new language for that.) I can drag up other conversations, but see this, during which conversation an administrator working from the "plagiarism = purposeful" definition indicated that, "So an accusation of plagiarism sticking to someone will effect the way people judge their character far more than any misunderstanding over copyright." (She also said, in separate comment, "Because any accusations being understood as the ...[intentional] kind of plagiarism could do serious harm to someone's reputation. ") Such language is intended to prevent escalation of drama and undue blackening of reputations. (And, by the way, I've just noticed a problem in the lead, which I've changed. When dealing with plagiarism, one doesn't focus on copyright violations. One focuses on plagiarism. I've revised that.) --Moonriddengirl (talk) 11:21, 18 June 2009 (UTC)

I strongly agree with Jayen466's sentiments. The definitions are worse than useless. Not only are they largely inapplicable to Wikipedia (since in some cases it is actually appropriate to copy content into Wikipedia), our analysis of them leads nowhere and basically says "we don't actually know how to define plagiarism in the context of Wikipedia." Just ax the whole section and keep to giving editors specific instructions, not meandering inquiries into the nature of plagiarism. Kaldari (talk) 16:11, 18 June 2009 (UTC)

That's a bit contentious, then, since further up on the page others have protested when the definition was absent. --Moonriddengirl (talk) 16:14, 18 June 2009 (UTC)
At the very least, definitions that are not applicable to Wikipedia should not be included. They only add confusion to an already confusing (and often contradictory) guideline. Kaldari (talk) 16:18, 18 June 2009 (UTC)
I'm not particularly hung up on the definition we have, even though I believe I originally dug it up many months ago. But I think we need some clear definition. Timeline: there were arguments that plagiarism wasn't defined and arguments about what the definition of plagiarism was; a definition was put in; somebody later took it out; somebody else came in arguing that plagiarism wasn't defined. I'm hoping to break this cycle. :) Have you got a good one? --Moonriddengirl (talk) 23:19, 18 June 2009 (UTC)

I think the present text is OK. There is no agreement in the real world about what "plagiarism" means; we cannot resolve that. The present text gives a summary of the situation in practical terms relevant to Wikipedia. — Carl (CBM · talk) 00:25, 19 June 2009 (UTC)

I agree with CBM. There is guidance here for reviewers, i.e. there is no cut-and-dried definition of plagiarism so caution and judgement are required; and there is guidance for writers, i.e. there are ways that you can never even get close to the borderline.
As to doncram's concerns, the "serious charge" bit is reality. You can call someone a copyviolator, call them a troll, call their idea an idiot and it will be a little bumpy, but call them a plagiarist and bang, zoom, right to the moon Alice! In certain circles, that copyrighted phrase might be considered to have become a well-known fair-use colloquialism, but since I didn't quote it, it is probably still plagiarism, since I haven't noted the source of the distinctive phrasing. Perhaps we should return to a portion of Elonka's original phrasing, "making a charge of plagiarism towards another editor is a serious statement"?[2] As to your preferred definition of plagiarism in the Wikipedia context, I'm generally in agreement, it is "less attribution than is reasonably expected". But the devil is in the details there, "less", "reasonably" and "expected" are all words that need defining themselves. Franamax (talk) 06:34, 19 June 2009 (UTC)

I like the new text, Jayen466. It is unambiguous, to the point, and isn't totally confusing and meandering like the old version. Kaldari (talk) 16:12, 19 June 2009 (UTC)

Thanks. JN466 16:27, 19 June 2009 (UTC)
Very nice job, JN, it's a huge improvement. --Philcha (talk) 17:10, 19 June 2009 (UTC)

Examples of close paraphrasing in an FA

To illustrate some of my concerns, the following examples are from the text and sources of 2007 Samjhauta Express bombings, a Featured Article.

Example 1

Source: Witnesses said they saw people screaming and struggling to get out. The injured were pulled out of the burning carriages onto the trackside by fellow passengers, and local residents rushed to help.
Article: Witnesses claim to have seen passengers screaming and attempting to escape […] The injured were pulled out of the burning carriages and onto the track by fellow passengers and local residents.

Source: Indian Prime Minister Manmohan Singh, expressing "anguish and grief" at the loss of life, vowed that the culprits would be caught.
Article: Prime Minister Manmohan Singh expressed "anguish and grief" at the loss of life, and vowed that the culprits would be caught.

Source: Musharraf called for a full investigation by the Indian authorities
Article: Musharraf also said that there must be a full Indian investigation of the attack.

Source: Inside one, an electronic timer encased in clear plastic was packed next to more than a dozen plastic bottles containing a cocktail of fuel oils and chemicals.
Article: Inside one of the suitcases containing the undetonated IEDs, a digital timer encased in transparent plastic was packed alongside a dozen plastic bottles containing fuel oils and chemicals.

Source: Officials said about 30 of the bodies were charred beyond recognition.
Article: … many of the bodies were charred beyond recognition

Source: The rest of the train, which had been carrying around 600 passengers, continued to the border town of Attari where passengers were transferred to a Pakistani train.
Article: The rest of the train, which was left undamaged by the attack, continued on to the border town of Attari, before being transferred to a Pakistani train that took passengers to their destination in Lahore

The letter of our guideline would lay the authors of this FA open to the charge of plagiarism. However, I don't think it would be fair, especially when reading the article as a whole. I do not think that a phrase like "bodies were charred beyond recognition" should be rephrased as "the dead were so badly burnt that they could not be identified" (which actually might be wrong; perhaps they were identified using DNA analysis, etc.), nor do I think that a six-word phrase like "the bodies were charred beyond recognition" should be put in quotation marks. These are all non-creative, factual expressions, remarkable more for the information they convey than for the formulations used to convey it.

Note that in this case, there is likely no great POV dispute in the article that would set editors against each other. But imagine this being an article on Eastern European history, or homeopathy. I can just see editors saying, "What you have just inserted about the "bodies being charred beyond recognition" is plagiarism. I've deleted it." How can we make the difference clear between the acceptable use of the most straightforward way of saying something and this sort of thing, which is egregious plagiarism?

Note also that in the last source/article pair, the article writing is inferior to the source (the "rest of the train" was not "transferred to a Pakistani train", the surviving passengers were). In the first pairing, the article is inaccurate – "onto the trackside" is not the same as "onto the track". JN466 08:27, 18 June 2009 (UTC)

An interesting example, nicely presented, Jayen466! If I am understanding this correctly, it looks like plagiarism to me. It is my sense that the wikipedia article should be a summary, a shorter version than any single source, or that it should be a synthesis of several sources. If there is just one source and the wikipedia article is the same size, and if the wikipedia article does not have extensive explicit quoting, then I do not see how it can be other than plagiarizing. It is better to quote extensively; minor attempts to reword each sentence, but hold to the same idea in each sentence in the same organization, do not add value. How can you add value, in rewording from just one source?
If I were part of a board reviewing this as a suggested case of plagiarism, I would be inclined to believe the article should be put into Featured Article Review to be cleaned up. I would, first, want to check on, and perhaps revise, any guidance on close paraphrasing to make sure it adequately covered the situation. However, the board could comment on an individual example like this even if the central guidance is inadequate; the central guidance can be revised gradually as more cases are systematically covered. doncram (talk) 09:27, 18 June 2009 (UTC)
2007 Samjhauta Express bombings cites 41 sources. I am sure if we started an FAR on the article, based on WP:PLAGIARISM, we would get an interesting debate. JN466 11:26, 18 June 2009 (UTC)
Some of this is rather troubling. The first two examples look to me like too-close paraphrasing, especially the clumsy change of trackside to track (why not at least omit the whole onto the track... bit?); and "anguish and grief" is quoted but not attributed directly, i.e. it seems obviously a copy of another writer's quote. The third (Musharraf) entry is a simple English statement, I see no problem there. #4 I would say is an acceptable paraphrase, but only just - there are only so many ways to express reported facts. "Charred beyond recognition" is a common English phrase and no structure is copied, so again no big problem. The last entry looks OK to me from the plagio standpoint, but would crucially depend on whether the actual train cars were shunted for its factual accuracy.
This is a great illustration of the judgement involved in assessing potential plagiarism. Overall this would not concern me greatly, unless all these examples were inserted by the same editor, who had done the same on several articles; or these examples comprised the great bulk of the entire article. If this were the case, some counselling of the editor might be justified, otherwise, a little effort to rewrite the offending passages would do the job just fine. Franamax (talk) 07:22, 19 June 2009 (UTC)

Example 2

Sources and text are from the FA 2000 Sri Lanka cyclone.

Source: At least nine people are dead
Article: At least nine people died

Source: Eight fishermen are missing, feared dead.
Article: Eight people were left missing and feared dead.

Source: a street protest took place in Trincomalee on December 27 over the lack of aid
Article: A street protest occurred in Trincomalee due to lack of aid.

Source: The families of those who died will receive 15,000 rupees ($US183) in compensation and those whose homes have been damaged or destroyed will receive just 10,000 rupees.
Article: the families of those who died received $183 [...] in compensation. The government also gave $122 [...] to those whose houses were damaged or destroyed

Source: Ten roofing sheets were distributed to 1,720 families in six Districts ... In addition, 3,000 families were selected to receive one set of cooking utensils each, two bedsheets and two sleeping mats.
Article: the Red Cross distributed 10 roofing sheets each to 1,720 families, and also sent a set of cooking utensils, bed sheets, and sleeping mats to 3,000 families.

Is any of this plagiarism? For example, would it make sense to change "set of cooking utensils", "bed sheets" or "sleeping mats" to synonyms, to avoid substantial similarity with the source? Is the re-use of these words in this FA indicative of laziness, an intent to deceive, or a desire for precision? Should "set of cooking utensils", "bed sheets" and "sleeping mats" be put in quotation marks in the article? JN466 11:26, 18 June 2009 (UTC)

Can I ask you to indicate, specifically, your point and the direction you intend? Suppose that editors talking about this here say "Yes, this is plagiarism" or "No, this is not plagiarism," then what? How will this test case be used to alter the guideline? --Moonriddengirl (talk) 11:41, 18 June 2009 (UTC)
In my view, the guideline's comments on close paraphrasing and duplicating source wording are not properly thought through. What I am doing at the moment is research of the type the guideline recommends, trying to find instances of close similarity between source and article wordings in featured articles. If the results of this research show that many, if not all of these instances of close similarity between source and article wordings are defensible and of a type that is commonplace in Wikipedia, even in Wikipedia's featured work, then I hope this will result in a consensus to adjust the guideline accordingly. The guideline should make clear that we are not interested in finding an occasional sentence that is a close paraphrase of a cited source, but are interested in cases like this, where whole paragraphs are copied wholesale and then superficially changed. In other words, we should ensure that the dozen of us here composing a new guideline will not result in a significant proportion of our existing FAs and GAs suddenly, from one day to the next, turning from acclaimed work into reprehensible examples of plagiarism. We don't have a remit for that. JN466 12:29, 18 June 2009 (UTC)
It doesn't look like plagiarism to me, although it's difficult to judge plagiarism from individual sentences taken out of context. Since most of those sentences are simply listing facts, there are only so many different ways that you can word it. No matter how most of those sentences are written they are going to sound similar. Kaldari (talk) 16:15, 18 June 2009 (UTC)
I agree. Can we add a corresponding statement to the Guideline? For example:
It can also be useful to perform a direct comparison between cited sources and text within the article, to see if text has been plagiarized, including too-close paraphrasing of the original. Here it should be borne in mind that an occasional sentence in an article that bears a recognizable similarity to a sentence in a cited source is not generally a cause for concern. Some facts and opinions can only be expressed in so many ways, and still be the same fact or opinion. A plagiarism concern arises when there is evidence of systematic copying of a source's diction, across multiple sentences or paragraphs.
Would something along those lines do? How could it be made better? JN466 17:47, 18 June 2009 (UTC)
I also agree that the examples given here are not plagiarised on their face, and that it is difficult to judge the situation without the overall context. Jayen, I agree wholeheartedly with your reformulation, that's the exact message imho. I would prefer to reformulate it as a "how to avoid" message rather than a "how to identify" - but I think that I may instead just rephrase that into my "test essay" on "How to avoid plagiarism". With appropriate attribution. :)
I would say, put your paragraph in where you think it fits best and see how it goes. I personally think it's a definite improvement. Franamax (talk) 07:38, 19 June 2009 (UTC)
Besides my misgivings that it makes no distinctions between non-free and free text, I also note that on its face it allows users to cherry pick language from multiple sources. Is it only plagiarism if it is systematic taking from a single source? --Moonriddengirl (talk) 12:18, 19 June 2009 (UTC)

General versus specific

If these two are to be the first of many specific examples, then, may I suggest that a subpage would be appropriate with a pointer from here? While it's probably a good idea to centralize conversation about the principles, this is but one point of this guideline, and lengthy tables and conversations about specific examples may overwhelm and distract from developing other points.

As to the general, it seems revisiting the legal aspects of this might be useful. There are two factors to consider here. Close paraphrasing of free sources is a plagiarism concern. Whether or not it's allowed is up to consensus. Close paraphrasing of non-free sources is a copyright concern. Whether or not it's allowed is down to policy based on US law. Some of the examples you give above are "fragmented literal similarity", which is what it is called when literal duplication occurs, but copying is not comprehensive. Close paraphrasing may occur even in the absence of such fragments, if the structure of a source is copied but the language completely changed. Yes, you can violate US copyright law and be legally sanctioned without using a single word from your source if you rise to the level of "comprehensive non-literal similarity". The incorporation of literal similarity in such cases simply serves to strengthen the evidence against you, since it is pretty hard to defend against a charge of copying when evidence is clear that you have read the source and copied it.

From a copyright standpoint, the dividing line between how much is too much (when we reach the point that a court says, "This is serious enough for us to care") is not firmly defined by legal code. We don't take chances...not only for our own use, but for that of our contributors. To refer back to WP:C, "If in doubt, write the content yourself, thereby creating a new copyrighted work which can be included in Wikipedia without trouble." (Close paraphrasing is derivative work, which is allowed only by the copyright holder.)

There are a good many cases illustrating these copyright issues in action, but I'm going to quote a bit from Salinger v. Random House (we really need an article on that), since it seems particularly relevant to some of your points above. In that case, the court characterized the problem succinctly, noting that facts are not copyrighted but that "'vividness of description' is precisely an attribute of the author's expression that he is entitled to protect....The copier is not at liberty to avoid 'pedestrian' reportage by appropriating his subject's literary devices." The court also noted that "Though a cliche or an 'ordinary' word-combination by itself will frequently fail to demonstrate even the minimum level of creativity necessary for copyright protection..., such protection is available for the 'association, presentation, and combination of the ideas and thought which go to make up the [author's] literary composition.'...as we have more recently stated, 'What is protected is the manner of expression, the author's analysis or interpretation of events, the way he structures his material and marshals facts, his choice of words and the emphasis he gives to particular developments.'...The 'ordinary' phrase may enjoy no protection as such, but its use in a sequence of expressive words does not cause the entire passage to lose protection. And though the "ordinary" phrase may be quoted without fear of infringement, a copier may not quote or paraphrase the sequence of creative expression that includes such a phrase." (citations omitted; http://www.law.cornell.edu/copyright/cases/811_F2d_90.htm)

Again, determining when such has risen to a legally actionable level is very complex. Courts consider many factors in determining if the fragmented similarity meets "fair use." Wikipedia has deliberately chosen to follow a more strict standard than fair use in order to make our content as reusable as possible.

Hence, this guideline is not going to make close paraphrasing of copyright protected materials suddenly problematic, because close paraphrasing of copyright protected materials is already problematic. If you wish to pursue refining the application of the concept of close paraphrasing to free materials, please be sure to separate that out from non-free materials. If you do wish to consider its application to copyright protected materials, please remember that close paraphrasing reflects far more than occasional duplication of language. It also refers to lifting the structure of the material and the perspective and emphasis to facts/details/events. --Moonriddengirl (talk) 13:12, 18 June 2009 (UTC)

Thanks. I would say though that if close paraphrasing of copyrighted sources is already covered by WP:C, then we should expressly exclude it from our discussions in this guideline, and restrict our considerations to the use of close paraphrasing when processing free sources. This topic is tricky enough that editors should be able to find out what to do and what not to do in one place, rather than have partially consonant and partially disconsonant instructions in several places.
Just for reference: What would you say are the most relevant parts of our copyright policy pages dealing with close paraphrasing? I know one or two of them, but you probably have a much better overview, and I am always amazed at how I sometimes find a policy or guideline page somewhere that I have never heard of and can't recall anyone mentioning. :( JN466 18:01, 18 June 2009 (UTC)
I'm sorry for taking up the space and will collapse this again, because there's just no way to answer "short." :/ Close paraphrasing is barely addressed in policy. It is lightly mentioned in WP:C (a long time ago, it actually pointed to our article on "plagiarism"), with:

Note that copyright law governs the creative expression of ideas, not the ideas or information themselves. Therefore, it is legal to read an encyclopedia article or other work, reformulate the concepts in your own words, and submit it to Wikipedia, so long as you do not follow the source too closely. (See our Copyright FAQ for more on how much reformulation may be necessary as well as the distinction between summary and abridgment.) However, it would still be unethical (but not illegal) to do so without citing the original as a reference.

Our Copyright FAQ adds to that

Facts cannot be copyrighted. It is legal to read an encyclopedia article or other work, reformulate the concepts in your own words, and submit it to Wikipedia, although the structure, presentation, and phrasing of the information should be your own original creation. The United States Court of Appeals noted in Feist Publications v. Rural Telephone Service that factual compilations of information may be protected with respect to "selection and arrangement, so long as they are made independently by the compiler and entail a minimal degree of creativity," as "[t]he compilation author typically chooses which facts to include, in what order to place them, and how to arrange the collected data so that they may be used effectively by readers."[3] You can use the facts, but unless they are presented without creativity (such as an alphabetical phone directory), you may need to reorganize as well as restate them to avoid substantial similarity infringement. It can be helpful in this respect to utilize multiple sources, which can provide a greater selection of facts from which to draw. (With respect to paraphrasing works of fiction, see derivative works section below).

The derivative works section says:

A derivative work is something that is "based on and a close copy of" another work. For example, the first Star Wars novelization is a derivative work of Star Wars Episode IV: A New Hope. Therefore, Del Rey Books required Lucasfilm's permission to publish and distribute the book.

You may not distribute a derivative work without the original author's permission unless you're using one of the rights they weren't granted (like fair use or fair dealing). Generally, a summary (or analysis) of something is not a derivative work, unless it reproduces the original in great detail, at which point it becomes an abridgment and not a summary.

(Of course, one can abridge non-fiction as well as fictional works.)
We don't really have a "guideline" for copyright. Given that such a guideline might well seem to be "legal advice", that's probably best.
I'm all for keeping copyright and plagiarism separate. As I said at the recently closed MfD, the definition of plagiarism that we wind up using, whatever it is, is open to community consensus. Copyright is not always clear—we sometimes have to call on the community lawyer or specialists in various areas to help determine copyright status of something—but the community doesn't have the option to decide, say, to ignore copyright law. :) It's less a matter of consensus and more a matter of being sure that we are complying with legal and/or foundation mandate.
We have to be careful, though, not to inadvertently suggest that we are addressing copyright (the legal) in discussing plagiarism (the ethical), because many people do use the terms interchangeably. I try to emphasize the correct term when I run into that, but I've seen people flip from one to the other within the same conversation.
But as we've discussed before, this guideline also barely discusses close paraphrasing. It's mostly discussed at the user-essay Wikipedia:Close paraphrasing, which already takes into account that "Some facts and opinions can only be expressed in so many ways, and still be the same fact or opinion." --Moonriddengirl (talk) 18:44, 18 June 2009 (UTC)
(e/c) Looking at the Salinger case that you linked, I must say though that the situation in that case differs sharply from ours in a number of material ways.
  • For one, the letters Hamilton cited or paraphrased closely were unpublished (whereas our sources are by definition published sources). Reproducing these letters' content, even in close paraphrase prefaced by "he said" or "he wrote", was deemed to "diminish interest in purchasing the originals" should Salinger ever decide to publish them. The whole argument went like this (my emphases):

    Concluding as we do that substantial portions of the letters have been copied, we do not share the District Judge's view that marketability of the [as yet unpublished] letters will be totally unimpaired. To be sure, the book would not displace the market for the letters. Indeed, we think it likely that most of the potential purchasers of a collection of the letters would not be dissuaded by publication of the biography. Yet some impairment of the market seems likely. The biography copies virtually all of the most interesting passages of the letters, including several highly expressive insights about writing and literary criticism. Perhaps few readers of the biography would refrain from purchasing a published collection of the letters if they appreciated how inadequately Hamilton's paraphrasing has rendered Salinger's chosen form of expression.[n5] The difficulty, however, is that some readers of the book will gain the impression that they are learning from Hamilton what Salinger has written. Hamilton frequently laces his paraphrasing with phrases such as "he wrote," "said Salinger," "he speaks of," "Salinger declares," "he says," and "he said."[n6] For at least some appreciable number of persons, these phrases will convey the impression that they have read Salinger's words, perhaps not quoted verbatim, but paraphrased so closely as to diminish interest in purchasing the originals.

  • We are citing media articles published several years ago (or journals and books). The media articles on the train attack are actually freely available online, at the BBC website and so on. Many journals and books can be viewed in Google Books, etc. Even a book, journal or newspaper article that is only available against payment is a very different kettle of fish to the as yet unpublished letters of a major writer. Once a book is published, others are free to comment upon it and cite it, under fair use.
  • The judge noted the "more limited scope" of fair use "with respect to unpublished works" which weighed against Hamilton, and in Salinger's favour. Also weighing against Hamilton was "the extensive amount of expressive material Hamilton ha[d] copied" (my emphasis).
  • The judge further noted that preventing Hamilton from paraphrasing the expressive content of Salinger's letters without Salinger's consent would not "interfere in any significant way with the process of enhancing public knowledge of history or contemporary events" – that is obviously a consideration, and a very relevant one in our context – adding that Hamilton was free to report the facts mentioned in the letters, even against Salinger's expressed wish. Salinger only had "a right to protect the expressive content of his unpublished writings for the term of his copyright, and that right prevail[ed] over a claim of fair use under 'ordinary circumstances'" (my emphasis).
All in all, publishing close paraphrases of large "expressive content" sections of a creative writer's unpublished letters against his will (he did not want them published at all) is a fundamentally different situation to reporting what a newspaper has said about a train disaster, or reporting the comments of a published scholar. In the case of the online newspaper article, we are actually directing traffic to their site, increasing their revenue, and when quoting a scholar, we are again more likely to increase the market for that book than we are to reduce it. At any rate, speaking for myself, ever since I started working here the number of books I've actually bought has increased substantially! Best, JN466 18:58, 18 June 2009 (UTC)
Wow. 175 kilobytes and counting. :) Okay. That all circumstances in this case are not the same doesn't mean that none are. It's true that unpublished works are more tightly protected than published, but copyright protection does not disappear because a media outlet chooses to publish its material online or because it's available through search engines. Similarly, you can't hear a song on the radio and appropriate a beat from it because it's been released on the airwaves, any more than you can videotape a scene from a film at the theater and put it up on YouTube. I explained above that in evaluating these cases, many factors of fair use are considered. But the essential principles are the same. Let's look to another source:

...in actual practice, the courts will not permit paraphrasing where they would not permit use of the paraphrased material itself. Protected writing has been held to be infringed by paraphrase that remains sufficiently close that, in spite of changes, it appropriates the craft of authorship of the original (Section 13-40; Perle & Williams on publishing law, 3rd edition. 2004 Supplement. Perle, Fischer & Williams. Aspen Publishers. ISBN 9780735504486.)

You might also find this interesting reading. --Moonriddengirl (talk) 19:19, 18 June 2009 (UTC)
While noting that we are talking about copyright issues, rather than plagiarism, and thus aren't really required to solve these problems on this page :P – seriously, wouldn't it make more sense for the Foundation to ask a team of lawyers to look into all of this? JN466 19:30, 18 June 2009 (UTC)
Note, too, that page 13-40 of Perle et al. says,

"These cases do not mean that one may not paraphrase. They do, however, mean that one cannot successfully assert fair use as a defense against "close" paraphrasing where use of the copyrighted words themselves would not be deemed fair use".

I read that to mean,

"If you could get away with quoting the source verbatim, you can also get away with a close paraphrase".

In other words, using a direct quotation, marked as such, offers no advantage over a close paraphrase. It's either fair use, or it isn't; whether you paraphrase or quote verbatim makes no difference. No? JN466 19:39, 18 June 2009 (UTC)
I'm not sure what problem we're trying to solve. :) My point is that there are two separate considerations here: close paraphrasing as plagiarism; close paraphrasing as copyright infringement. Any language that you wish to include here should clarify that you are addressing the matter of plagiarism only. If you want to argue it from a copyright perspective, it belongs elsewhere. (Actually, as I seem to recall suggesting some months ago, I think this whole conversation belongs at Wikipedia talk:Close paraphrasing. That essay could use some work. This guideline is long and complicated enough, and there has never been any consensus that I've seen to expand coverage of this issue here.) As far as your question is concerned, as long as we're clear that "get away with" ("successfully assert fair use as a defense") means "have a court of law find you not guilty", then I agree with your reading of the law. The court will not find you guilty of copyright infringement if it finds your use "fair", just as it will not find you guilty of copyright infringement if your taking is small. Wikipedia's policy, nevertheless, is "There is no automatic entitlement to use non-free content in an article. Articles may in accordance with the [NFC] guideline use brief verbatim textual excerpts from copyrighted media, properly attributed or cited to its original source or author." So, on Wikipedia, whether you paraphrase closely or quote verbatim does make a difference. --Moonriddengirl (talk) 20:19, 18 June 2009 (UTC)

Arbitrary break: Discussion about verbatim quotes and close paraphrasing


As far as I can see, WP:NFCC doesn't say that you can't use a close paraphrase instead of a verbatim quote. By logical extension of the fair use principles, if you can use a verbatim quote, you can also use a close paraphrase; if properly cited, it's if anything a lesser taking. Hence it makes little sense to me to recommend verbatim quotes and decry close paraphrases.

So the problems I am trying to solve are (1) that this guideline in its present state implies that a marked verbatim quote is a lesser taking than a close paraphrase; the passage concerned is:

You can always avoid any dispute by one of: rewriting text completely into your own words, using multiple referenced sources; directly quoting and referencing the material you copy;

and (2) that it is instructing editors to look for close matches between article and source wording and implies that any such close match is plagiarism. This is the wording I am concerned about here:

It can also be useful to do a direct comparison between cited sources and text within the article, to see if text has been plagiarized, including too-close paraphrasing of the original.

Editors might as well look for direct quotations. In both cases, it is about how much you quote, or paraphrase. Just like the three-paragraph close paraphrase on that fish was plagiarism, so it would be plagiarism if you inserted the three paragraphs as a marked verbatim (assuming the source is cited in either case).
But I am feeling a little breathless ... it would be nice for another editor to comment. JN466 22:11, 18 June 2009 (UTC) These concerns are now resolved/addressed. JN466 17:06, 19 June 2009 (UTC)

WP:C says, "There are some circumstances under which copyrighted works may be legally utilized without permission; see Wikipedia:Non-free content for specific details on when and how to utilize such material." I've already quoted what NFC says. Multiple policies and guidelines note that Wikipedia has deliberately chosen a more narrow road than "fair use." Verbatim quotations are permitted by policy. If you're suggesting that proper citation of close paraphrase makes a "lesser" taking, then I believe you may be misunderstanding the courts' position, but I'd be interested in seeing support for that.

The operative words in your quote are "too-close paraphrasing." Wikipedia:Close paraphrasing is incorporated by reference in the "see also" section at the bottom. It does address situations where little room for originality in language exists. --Moonriddengirl (talk) 22:23, 18 June 2009 (UTC)

In the Hamilton/Salinger case, Salinger prevented Hamilton from quoting his unpublished letters at length. Hamilton thought he could address the problem by using close paraphrases of the letters instead (lesser taking). The first judge agreed with Hamilton (indicating that if anything, a close paraphrase is a lesser taking). The appeal judge did not agree with Hamilton and the first judge, and said close paraphrasing was as bad as verbatim use in this case (not "worse" than verbatim use).
It is an overstatement to say that verbatim quotations are permitted by policy. WP:NFC explicitly says, "Extensive quotation of copyrighted text is prohibited."
In both instances, close paraphrase or verbatim, it is only about the amount you copy. It is not about whether you paraphrase or quote. As Perle et al. point out, the fallacy is to think that close paraphrase allows you to copy "more" of a source than verbatim copying. It would be an equal fallacy to think that using verbatim quotes of a source allows you to copy more than you are allowed to copy in close paraphrase.
We must not give editors the idea that they can copy as much as they like from a cited source as long as they put quotation marks around it. The restrictions as to amount are the same, whether you use close paraphrase or verbatim. I hope this has made the point at issue clear. JN466 08:34, 19 June 2009 (UTC)
Switched the order of Jayen's and my posts; Jayen's is more directly apropos to the extant thread and better illustrates my interjection about "why are you discussing court cases here?" Franamax (talk) 09:26, 19 June 2009 (UTC)
Oohh me brain hurts. Did everyone remember to take off their copyright shoes when they entered this thread and just walk in their plagiarism socks? The essence to me is proper attribution. So depending on the copyright status and use of quotes around the copied text:
  • Strict copyright/within quotations - strictly governed by NFCC, specific attribution required, paraphrasing disallowed
  • Strict copyright/unquoted - strictly governed by WP:C, including close paraphrasing. Some interesting recent examples exist as to whether attribution via footnote or external link is acceptable cover for direct copying of copyrighted text, including relatively straightforward recounting of biographical facts.
  • Liberal copyright (GFDL/CC-BY) and "free"/within quotations - unlimited use, specific attribution required, paraphrasing disallowed
  • Liberal copyright (GFDL/CC-BY) and "free"/unquoted - here's another tricky one. Generally unlimited use provided there is "appropriate" attribution. But can I freely intersperse my own writing into a "free taking" and just generally attribute it? Or do I have an obligation to indicate exactly which writing is not my own?
MRG has noted some nuances of copyvio vs. plagio above (well, far above now :) where portions of not-copyvio can still be is-plagio, which would likely fall into entry number 2 of my matrix above. Franamax (talk) 08:20, 19 June 2009 (UTC)
As for interspersing my own writing into a "free taking", I am in favour of having each possibly contested sentence cited to its source. So sentences based on the free source should be cited to the free source, sentences added should be cited to the other source, if the content is such that it needs citing. Anything else makes verification a nightmare. JN466 11:35, 19 June 2009 (UTC)
Sure, I agree with that when you are using multiple sources and/or adequately rewriting. If you are just copy/pasting sentences verbatim or with minimal rewriting, even if you put a cite template on each sentence, I think that's wrong. You're still not indicating who was the author of those words.
My main point through all of this is that you should block-copy in the "free taking" in one identifiable edit, then start modifying from there. If you do that, verifiability is a snap: here is the before, here is the after. The original author has their work fully credited, and the editing process can begin. This relieves the wiki-editor from the need for adequate paraphrasing when making the initial insertion. Just like your last edit to this page is sufficient attribution for me to modify the page further, my (properly attributed) insertion of free text to the page is a license for you to further modify anything on this page.
Let's put it another way: if I find a good free source on the mating habits of 67 different species of salamander or typical diseases affecting 450 tree species in different ways - I'm not gonna wait for your opinions on proper rewriting and sourcing of sentences. I'll put the free knowledge in each article. I'll check them all to be sure I'm adding something new, I'll be sure to make clear where I got the info, but I will add the knowledge. The details of cleaning up and sentence-wise sourcing can come later. The information comes first. Franamax (talk) 13:04, 19 June 2009 (UTC)
"I hope this has made the point at issue clear." No, I'm sorry, but I have long since lost any sense of what it is we're trying to discuss. My copyright shoes are on, Franamax, only to the extent that I want any changes to the material on close paraphrasing within this guideline to make clear that there are distinctions between free and non-free text and their handling. Jayen, Wikipedia requires that text that is not free be noted. This is set out at WP:C; Wikipedia's license allows liberal reuse, even commercial, and we are not here to trick reusers into violating the copyright laws of the US or other countries by interspersing unmarked protected material into our own text...regardless of whether the US courts regard it as equally bad to overuse quotations or whether we are just as likely to "get away" with it. Properly denoted quotations can easily be removed so that only free-licensed material will be duplicated. Interspersed non-free material can't be so easily identified and excised. If you're worried that the sentence "directly quoting and referencing the material you copy" is going to cause text dumps in blockquote, that can easily be fixed by adding a footnote: "But, note, the amount of text you quote from non-free sources must be limited to comply with non-free content guidelines." Voila. --Moonriddengirl (talk) 12:13, 19 June 2009 (UTC)
So you are saying it is better for reusers if we use a marked verbatim rather than a close paraphrase, because reusers can then recognize a verbatim and delete it before they reproduce our texts for, say, a commercial purpose, whereas a close paraphrase that might have been fair use in our article might not be fair use in their product, and they might then be found guilty of an infringement because they left it in, without attributing it? JN466 12:57, 19 June 2009 (UTC)
(Edit conflict) I originally had written, "Yes. That's nicely succinct. Wikipedia does that 'deliberately more narrow than fair use' thing with reusers in mind. Am I missing anything? We've gotten long and tangential, and I may still be lost. :)" Your addition of "without attributing it" complicates my response, though. They may be found guilty of an infringement because they leave it in whether it is attributed or not. In your example above, for instance, usage of material may be reasonable in scale for non-commercial, educational purposes but not for a commercial product. To quote the feds, "Acknowledging the source of the copyrighted material does not substitute for obtaining permission." --Moonriddengirl (talk) 13:16, 19 June 2009 (UTC)
Okay, imagine that "without attributing it" gone again. :) Thank you very much for your (much-tried) patience; I have gotten the point now why extended close paraphrases are a worse idea in Wikipedia than verbatim quotes of the same length. JN466 13:26, 19 June 2009 (UTC)
I was just coming over here to point out that this conversation is playing out in a sense at ANI at this very moment: [4]. :) (My patience is less an issue than my attention span. I can follow tangents for miles and completely lose the original point, which is why I occasionally try to find my way back. If you feel that I have lost the sense of your argument, please just nail it down like you did here. That's helpful for me.) BTW, any objections to my collapsing at least the first section above? It is rather sprawling. --Moonriddengirl (talk) 13:32, 19 June 2009 (UTC)
Yep, interesting thread in light of present discussions. Yes, feel free to collapse (the thread). JN466 14:33, 19 June 2009 (UTC)

The lead second paragraph

The lead should sum up what is said in the body of the page and be stand alone. As such I have a problem with this second paragraph:

Plagiarism is the incorporation of someone else's work without providing adequate credit. Even if you have cited a source, make sure that your wording does not duplicate that of the source unless you note duplication by quotation marks or some other acceptable method (such as block quotations).[1] This applies even if your source is not copyrighted.

I think it confuses plagiarism with copyright. It is not just the wording that is plagiarism, but also claiming that someone else's idea is one's own (a big sin in academia).

However, I am not so hung up on the first sentence. Where I have a problem is the second and third sentences, as they imply that copying 1911EB is unacceptable, although many, many articles include 1911EB text and other copyright expired sources. If all that text were to be placed in quotations then it could not be edited to update the style and information content within the quotes. The whole point of including chunks from 1911EB and similar is to put in place a seedbed of information on a topic that through the usual Wikipedia process can gradually be altered into a completely new and useful work, with some parts trimmed out and others added and the sentences altered so that they read as a contemporary work. Given 20 years or so the text will probably look nothing like the initial 1911EB text, but if it is in quotes this can never happen. If of course the original author of a copyright expired text makes a statement that is a point of view about something there is no reason why that specific point of view should not be included as a quote or otherwise attributed to the source -- in the usual way that is done for such text under copyright -- but there is no reason why the text in general should not be incorporated into the Wikipedia article. If text is copied verbatim from a PD source I think it is a good idea to include one of the templates in Category:Attribution templates, but I do not think it is essential.

So I think this second paragraph needs to be broken into two and expanded, so that there is an explanation of how text that is in copyright should be presented within an article and another paragraph on what to do if the text is copyright expired. This would roughly speaking cover the differences mentioned in detail in the sections of the page. --PBS (talk) 13:59, 18 June 2009 (UTC)

I don't get where people get the idea that "If all that text was to be placed in quotations then it could not be edited to update the style and information content within the quotes". This is a ridiculous fallacy, in my view. Of course the quotations can be edited and removed from quotation marks when the wording is changed! However, it has been repeated so often in discussions, often as an argument against requiring proper attribution, that I tend to think it must be addressed explicitly in the guideline. For example: "Editors sometimes have believed that putting quote marks around a passage copied verbatim from a text prevents further editing to update the style and information content. This is entirely false, because the material can be reworded and revised, at which time it is appropriate to remove the quotation marks of course." doncram (talk) 18:52, 18 June 2009 (UTC)
The wording of the second sentence needs to be changed. In some situations it is acceptable to leave the wording identical to the source without using quotation marks. Examples include EB1911 imports, the recent optics import from Wikisource, translations from other wikis, etc. Kaldari (talk) 21:01, 18 June 2009 (UTC)
I haven't really been involved in this part of the guideline, since I don't really have a strong opinion on how free licensed or public domain material is denoted, but Kaldari raises a good point about transwiki in particular. We don't denote text copied from other Wikipedia articles or other Wikipedia projects. (Except, of course, to the attribution extent necessary for our licensing, which is generally not obvious to readers.) --Moonriddengirl (talk) 23:32, 18 June 2009 (UTC)
I disagree that it is acceptable to copy in text verbatim from EB1911 without quotation marks. It was done, yes, years ago. And since then there have been many people working to clean up the difficulties that created. I believe there is a consensus among a small set of people doing the cleanup that it was unfortunate how the EB1911 text was copied in (without enough specific attribution, as some of these persons commented in previous RfC discussions linked in our Talk archive here). And, also, there is copying in happening from other GFDL sources that is going to cause prodigious amounts of future work to clean up, which could be avoided if proper attribution is used. People erroneously think that it suffices to create an article by pasting in a big verbatim text, and slapping on an attribution template. At the first moment, it is attributed okay. It all came from the one source mentioned anywhere in the article. But it leads immediately to bad situations, when the first editors start adding in any other text. It immediately becomes unclear what wording and ideas came from where. Even if you argued that the EB1911 source material doesn't need to be quoted when mixed with other material (which I would disagree with), this sets a bad model in the article that editors will follow with text from other sources where you do agree it should be put in quotes. This would be avoided if the EB1911 text were brought in with blockquotes, and editors worked by breaking it up into smaller blockquotes as they inserted differently sourced material and by removing quotes as they reworded the material. It is not difficult to work this way, it is easy to work this way. In this area, I think the wikipedia guidelines for paste-in, if there are any, are inadequate and tend to cause massive plagiarism (situations of inadequate attribution) down the line. It should not be policy for wikipedia to set up situations of massive future plagiarism. doncram (talk) 23:46, 18 June 2009 (UTC)
It has been and is still standard practice to incorporate other free content into our articles, provided that the copyright licenses are compatible, without putting it in direct quotes, giving attribution at the bottom of the article. There is no need to "clean up" such articles. I realize you believe this is improper, but it must be clear by now that our practice supports it.
Just recently I created {{citizendium}} in response to another editor who realized he can take advantage of our newfound license compatibility. Citizendium explicitly chose the license they did with the knowledge that it would enable us to use their text; some people there argued they should choose a different license to block that reuse, but their argument did not prevail. To call this reuse "plagiarism" would simply be inaccurate. — Carl (CBM · talk) 00:11, 19 June 2009 (UTC)
I think you and I have chatted about this exact subject before, and we can agree to disagree. I believe you have a legitimate opinion although I disagree with it. That template is not as bad as some others, in that up front it includes an explicit link, at least in how it is applied in the Apple cider vinegar article, one of the few yet in Category:Wikipedia articles incorporating text from Citizendium. I'm not going to look at the apple cider vinegar article closely, but I believe it is probably now plagiarized (less well attributed than it should be, relative to what is reasonably expected). In your view the expectation for attribution is different, so in your view it is not plagiarized. But I think it is unfortunate you are setting up and encouraging massive amounts of future plagiarism and/or future work to clean it up, by aiding and abetting there. I believe that we should be working to make it easier to develop the wikipedia to have feature-quality content. It is crazy, in the view of others who work on featured articles a lot (not me, I mostly do not work on FAs and have none under my belt), to jumble up material from different sources. I believe it is mostly now practice in FA review to require elimination of vague attribution templates. doncram (talk) 01:02, 19 June 2009 (UTC)
The template cannot be removed after rewriting, any more than an editor can be removed from the history page if their contributions to the article have been replaced or rewritten. The point of the attribution template is to indicate that the authors of the CZ article are now also authors of the WP article. — Carl (CBM · talk) 01:08, 19 June 2009 (UTC)
I hope you are incorrect about that. I would believe that you could quote from there and give explicit credit of footnote + quotation marks, or you could reword and keep the footnote as a source. So I believe you can drop the template and put in a footnote instead. I can't believe it is specified in citizendium - wikipedia agreement that we must include that template, in that exact format, with that size font, etc., like movie actors specify in how their credits will appear in their films. Also, by the way, i meant to say that in cases where paste-in is done with blockquotes as I prefer but it is hoped that editors will reword and otherwise update the text, it would make sense to have a template to give brief guidance that way, with link to more guidance, like an {{underconstruction}} template or other template. doncram (talk) 02:15, 19 June 2009 (UTC)
If these templates were like some sort of vague footnote, then replacing them like that would make sense. But that is not their purpose at all; they are not meant to say "one source for this article is that article". Instead, templates like {{citizendium}} say, "in addition to the authors listed on the history page, all the authors listed over there are also authors of this article." This is why there is no plagiarism – because the authors who you argue are being plagiarized are authors of our article, and they cannot plagiarize themselves. This collaborative aspect of free content differs radically and fundamentally from academic writing. The text that is copied in is not a source for the article, it is the article just like any text written by a WP editor. This is why removing the template is the same as removing editors from the history page here. — Carl (CBM · talk) 02:31, 19 June 2009 (UTC)
I am not at all confused about here vs. academia, which is different. I think of our writing here as that of a collective of wikipedia editors, and we don't need to quote ourselves when we move material from a different article which we ourselves wrote, and maybe not trans-wikipedia. But, I believe that i can rewrite an article that has Citizendium material to only quote from Citizendium and to cite it, and to drop the template, if I take care to compare the source vs. the current article content and ensure no phrasing from the citizendium source remains. To be able to remove it, like to be able to remove a {{DANFS}} template from an article on a U.S. military ship when it goes to FA, I need to have access to the original material, so as to be able to compare and ensure i have removed all (or quoted it).
The current documentation language at the citizendium template is: "When text is copied from Citizendium to Wikipedia, this template can be used to acknowledge it. This template should not be used when Citizendium is cited as a reference, but only when the text from Citizendium becomes part of the article in the same way as text originally written by Wikipedia editors." That is vague and I don't like it, although perhaps that is because I don't like something unfamiliar creeping in, especially when you assert we can never remove it. What if we entirely delete the article and start over on a brand new page? What if we revert to the wikipedia version before any citizendium material was added, in which case it is in the history that citizendium material was once in the article, but the template was removed upon a later date when it was ensured no citizendium material remained?
I am concerned that this template might not be done right, or the instructions that go with it are not complete enough and that it is going to cause problems. The template in the Apple cider vinegar, an article started in 2006, displays "This article incorporates text from the Citizendium article "Vinegar" (retrieved on 2008-04-14), which has been licensed under the GNU Free Documentation License." However, the article shows no edit on 4/14/2008, and there is no record in the article history within wikipedia what was the citizendium text that was added. So is the idea that we will count on citizendium to keep its own 4/14/2008 version available on-line forever? I also really don't like that the addition of the citizendium material is not highlighted in the article edit history. Perhaps what the template displays should also include the wikipedia date when citizendium material was added, as well as the date that the citizendium material was written. And the documentation should give some instructions along these lines, too.
But adding the template and claiming vaguely that some or all of the content comes from somewhere else is indeed mixing it all up and making it difficult later to ensure citizendium content has been removed. This seems to me like a virus infecting wikipedia. I am not on board about introducing citizendium material in this way. An alternative would be to treat it like how I think public domain text should also be handled: bringing it in only in blockquotes and removing quotes only as it is reworded. If this cannot be done for some reason i don't understand, then i might not want to accept the citizendium material at all. Has the importation of citizendium material and these believed-by-some-to-be-permanent restrictions been discussed thoroughly somewhere in wikipedia? doncram (talk) 07:12, 19 June 2009 (UTC)
doncram, whatever your views on citizendium material, my point was about copyright expired material like 1911EB. You say "Of course the quotations can be edited and removed from quotation marks when the wording is changed!" but that is not standard practice (see WP:MOSQUOTE) and the removal of quotes would often lead to accusations of plagiarism if it were done to copyright material.
If we were to follow your ideas for copyright expired works, suppose one capital inside a quote is changed, for example from "the earl of Newcastle" to "the Earl of Newcastle" (the former is standard 1911EB, the latter standard Wikipedia): do the quotation marks go? If not, what if the name is added, as in "William Cavendish, the Earl of Newcastle"? It seems a very subjective issue, and it conflicts directly with WP:MOSQUOTE.
I don't see a problem with copying paragraphs from 1911EB (citing the paragraphs) and adding a {{1911}} in the references, see for example roundhead (weapon) ([EB ROUNDHEAD]), because as I said "through the usual Wikipedia process can gradually be altered into a completely new and useful work". What exactly are the problems that you hint at in "And since then there have been many people working to clean up the difficulties that created. I believe there is a consensus among a small set of people doing the cleanup ..."? As I understand it, it is not a small number of editors doing a clean up, as any page with 1911EB in it is edited in just the same way as any other.
But most importantly you have not addressed the central issue, as the second paragraph stands at the moment it does not reflect the body of the page, instead it reflects a very narrow explanation for what is basically the practice for copying information from copyrighted material, not copyright expired or other forms of copyright such as incorporating GFDL from other GFDL projects such as Wikinfo. --PBS (talk) 10:38, 19 June 2009 (UTC)


(undent) To PBS, no, the 1911EB template and all the other attribution templates are explicitly included in the "other acceptable method"s mentioned. This is explained farther down within the article. The exact method by which unquoted copying of free content is to be accomplished is a matter of some dispute: should it be by attribution template, edit summary, talk page note, all of these? Should the specific edit where the free text was incorporated be noted in any of these? And how does this square with our recent license change, where our own edit window now says that "You agree to be credited, at minimum through a hyperlink or URL, when the page you are contributing to is reused in any form" - but "page" is not properly defined?
To doncram, while I respect your idealism in wishing all articles to be FA'able and only written by the collective of editors who form Wikipedia itself, that simply will never be the case. The encyclopedia is open to contributions from everyone, including writers who have never visited this site - just so long as we say who they are and do not violate their rights. Import of free content in a form where it can be relentlessly improved has been, and always will be, a part of this encyclopedia. Regardless of what you or I might wish, it is reality. Our task here is to deal with that reality. All we can do is seek to describe the terms under which the reality unfolds. Franamax (talk) 08:45, 19 June 2009 (UTC)
"no, the 1911EB template and all the other attribution templates are explicitly included in the "other acceptable method"s mentioned." is misleading because of the parenthesised comment that follows it. By what you are saying we can shorten the sentence to "Even if you have cited a source, make sure that your wording does not duplicate that of the source unless an acceptable attribution method is used." But the point of the introduction is to explain briefly what is in the rest of the page, and although that covers it briefly I think it is too brief.
As to your second comment I think you should refer to "Text from external sources may attach additional attribution requirements to the work, which we will strive to indicate clearly to you. For example, a page may have a banner or other notation indicating that some or all of its content was originally published somewhere else. Where such notations are visible in the page itself, they should generally be preserved by re-users." Copyright expired text does not "attach additional attribution requirements to the work". Personally I always include an attribution, if I copy significant amounts of text from a copyright expired source, but AFAICT (but I am not a contract solicitor) there is no legal requirement to do so. --PBS (talk) 10:38, 19 June 2009 (UTC)
Well, my No, the 1911EB template... comment above is certainly not misleading. I can assure you of that, since day one of the genesis of this guideline, it has been a core feature. The extant wording in the lede may indeed be misleading since it omits mention of proper use of attribution templates as an alternative for incorporation of editable free text.
Feel free to propose an alternative somewhat more discursive than your admirably brief and correct summary above. The lede must very correctly distinguish between copyvio and plagio; quoting, blockquoting and direct copying; plagio even if it's not a technical copyvio; copyright vs. GFDL/CC vs. Govt vs. copyexpired; inline-cite referring to the copy vs. explicit attribution of copying of text vs. template attribution that some text was copied somewhere, sometime.
Give it a shot, the lede wording definitely needs to be improved. I may try it myself in the next day or two. It would be nice to see someone else step onto this endless shooting-gallery though.
On the second point, we rarely discuss actual "legal" requirements at en:wiki, we rather discuss the requirements internal to en:wiki. As I've suggested before (possibly elsewhere), if you don't feel that concepts of plagiarism apply on this site, gather consensus to thoroughly refute the notion. WP:VPR is probably your best first stop. Franamax (talk) 11:43, 19 June 2009 (UTC)
Much of what is described as plagiarism here is covered by Wikipedia:Copyright violation policy. The OED's definition is "1. The action or practice of taking someone else's work, idea, etc., and passing it off as one's own; literary theft." & "2. A particular idea, piece of writing, design, etc., which has been plagiarized; an act or product of plagiary.", and the major motive for that is undermined by WP:OR. So what we are left with is the border area where paraphrasing moves from plagiarism into new text. This is a relatively new guideline (it is only a year since it was first formulated), and I am not sure that it has widespread community support (for example, I've only just come across it, so there must be many more who have not). However, I think my comments are straying away from what I was trying to address in this section, which is that the second paragraph is not an accurate summary of the page. --PBS (talk) 15:20, 19 June 2009 (UTC)

Re Doncram (07:12, 19 June 2009): We have been importing free content for years; see Category:Attribution templates. EB1911 was just one largish project, there are others as well. So any preliminary discussion of this would have happened very early in the project's history. But it's possible that other people, like me, simply view this as a natural part of the free content philosophy.

I was hoping that my post here might spark some discussion. I want to ensure that our attribution system meets CZ's requirements, and gives credit to all authors of an article, whether they are WP editors or not. And I am very willing to discuss how to do that. But I do not want to change our current practice, which would allow us, for example, to take a CZ article on a topic we do not have and copy that entire article to WP without changing a single character, apart from adding attribution that our text came from CZ. — Carl (CBM · talk) 13:14, 19 June 2009 (UTC)

Re Philip Baird Shearer (13:59, 18 June 2009): I agree that the second paragraph is worded in a way that contradicts our usual practices for incorporating free content, and should be changed. — Carl (CBM · talk) 13:29, 19 June 2009 (UTC)

I agree completely. There are many cases where text is written in outside sources and licensed freely with the understanding that it can be imported into Wikipedia. There are even cases (optics for example) where the author has asked us to import the content, but could not do so themselves. This artificial distinction between Wikipedia editors and all other authors in the world is bogus. What matters is attribution, not that all text is original to Wikipedia. Potentially, all authors are Wikipedia authors (given 95 years). The lead needs to be seriously revised. Kaldari (talk) 15:44, 19 June 2009 (UTC)
Um, it's a bit of a surprise to notice this in passing. I mentored the individual who was working on the optics article offsite. Arranging compliance was a big problem--especially because several people attempted to circumvent the licensing. That caused other problems that actually delayed the porting by nearly a month. It's very surprising to discover afterward--in passing--that being raised as an argument to weaken the plagiarism guideline. We could have saved considerable trouble and drama if this site had had a sufficiently robust guideline while that port was being orchestrated. DurovaCharge! 18:36, 19 June 2009 (UTC)
The way that import was handled not only violated this guideline, but also the terms of the GFDL (which don't allow attribution through URL). But regardless, that's not the point. The import could have been done in a way that complied with the GFDL, but there is no way that it could have been done in a way that didn't violate the current version of this guideline. It is but one of countless valid examples that this guideline doesn't take into consideration. Kaldari (talk) 18:50, 19 June 2009 (UTC)
Also, I don't understand your surprise. Chat from 2 days ago: "Me: For example, our recent optics migration violates the plagiarism guideline up and down. You: Go ahead and make the changes you think are needed, but please discuss them so people understand." Kaldari (talk) 18:53, 19 June 2009 (UTC)
Serves me right for being overcommitted; mea culpa. I thought you had something completely different in mind in that context. DurovaCharge! 23:29, 19 June 2009 (UTC)

minimal edit

Since we are talking about the lede section, the text just needs to be a faithful summary of the content lower on the page. I split the troublesome sentence into two, and made it refer directly to the full discussion lower down. — Carl (CBM · talk) 13:37, 19 June 2009 (UTC)

Completely wrong

In my opinion, the proposed prohibition against direct copying is completely inappropriate. Plagiarism is use of material without citing the source, period. If you cite the source, then copying is not plagiarism.

In an academic setting, things are very different. When a student submits a paper, there is a presumption that the wording itself, and not just the facts and ideas, is the work of the student whose name appears on the paper as the author. In general, the entire object of the paper is to demonstrate to the teacher that the student can do original work: this is the presumptive meaning of the student's name on the paper.

The situation at Wikipedia bears no relationship whatsoever to the student paper. There is no author's name on the Wikipedia article, and the reader has no a priori expectation of authorship. Our standards are explicitly at variance with those of a student paper: we consider original research, synthesis, and even creativity to be detrimental.

A casual Wikipedia reader is looking for information and has no expectation of authorship at all. We serve this reader best by providing the best possible encyclopedic content we can. A more serious reader will look deeper, and will want to know more about the sources, but a serious reader will almost immediately become aware that Wikipedia is a collaborative work and that any wording may have come from any of a huge collection of editors or other sources. An even more serious reader will quickly learn how to find out exactly where any wording in the article came from. The goal of our plagiarism policy should be to ensure that someone who wants to know where the wording came from will be able to do so.

So much for our obligation to the reader. But what about our obligation to the original author? This comes in two parts: legal obligation, and moral obligation. Our legal obligations are embodied in copyright law and are beyond the scope of the plagiarism guideline. Our moral obligation to the author is to provide acknowledgment and recognition. But we have a moral obligation to the reader, to provide an easily readable article, and the original author has a moral obligation to us, as members of civilization. We must balance these obligations. For me, I believe that we have no more (or less) moral obligation to an author than we have to any Wikipedia editor, but since that author is known, we can and should cite the original in the edit summary.

In many cases, (e.g., the DNB) useful encyclopedic articles in the public domain were written by authors as work for hire, and were at most minimally attributed in the originals. For such sources, in my opinion we have no moral obligation to the publisher whatsoever: the publisher is a soulless legal entity, not an individual. The individual author already conveyed the copyright to the publisher, and I feel no particular obligation to the author, either. In (most) other cases the author of a work which is no longer in copyright has no expectation of further ownership of the work in the moral sense. We, as that author's cultural heirs, have a right to use the work as we see fit. This was the norm for more than a thousand years, and only changed when the printing press suddenly added monetary value to the ability to control the copying of a work. That is, it's all about money, not "moral rights" at all.

Please, do not artificially restrict our ability to use our cultural heritage based on some mistaken analogy with grading student papers. -Arch dude (talk) 19:53, 19 June 2009 (UTC)

I've seen this analogy before – this comparison to student essays – and I'm afraid that it doesn't quite fit. (As an aside, I find it rather unfortunate that some students were only exposed to proper attribution as a means of ensuring they did their homework.)
A far closer style of writing to our work here would be the literature review or review article: a (nominally) unbiased, (often) peer-reviewed summary of the current knowledge on a topic, thoroughly supported by references to the highest-quality works in the field. A literature review doesn't (typically) attempt to advance or synthesize new ideas; the goal is to clearly document extant knowledge within a concise framework.
Despite those constraints, it would be foolhardy to assert that a good review article lacks creativity or originality. From a big-picture perspective, a great deal of editorial judgement goes into deciding the scope and organization of a good review paper. At a lower level, the choice of appropriate references (as well as rejection of inappropriate ones), the weighting of subtopics, and the nuts and bolts of putting together words are all creative acts of authorship. I trust that the similarities (both in terms of objectives and in terms of the challenges and decisions faced by the authors) between reviews and encyclopedia articles are clear. While we do not generate new statements of fact, I think some of our best writers would be greatly offended to hear that we do not do anything creative.
I fear that your argument is overlooking a part of the moral obligation we are under. Engaging in plagiarism is seen by many as a violation not of law, but of a social contract. The clear and concise expression of ideas is inherently valuable, above and beyond the intrinsic worth of the knowledge contained by those representations. Consider the subject matter experts – the historians, the scientists, the academics – who we so desperately want to participate in and endorse this project's works. The coin of their realm is the expression of ideas. Lifting their words without proper acknowledgement of the sweat of their brows makes us look sloppy at best, and like lazy teenagers at worst. The professor whose writing we quote will be predisposed to appreciate, endorse, and aid our efforts. The academic whose words we plagiarize without appropriate attribution will think us incompetent or unethical. Proper citation of sources – including clear identification of verbatim copying – is the academic world's equivalent of the GFDL. A failure to respect those firmly-held principles is apt to turn away our most valuable potential contributors.
If good writing were easy, we wouldn't need or want to copy anything. If good writing is difficult – and it is – then we ought to give credit where it is due. TenOfAllTrades(talk) 02:16, 20 June 2009 (UTC)
I seriously doubt that any work of any living academic is currently out of copyright, and I also seriously doubt that any living academic will object to our use of properly-attributed work in any form. In general, there is no particular difference between "sufficient paraphrasing to avoid copyright infringement" and direct copying, as far as plagiarism is concerned. In either case, you are still using the original author's concepts and ideas, and the ideas, rather than the words, constitute the critical contribution that must be attributed. My problem here is with the insistence on quote marks as the only legitimate method of attribution of a direct copy. I feel strongly that this will severely inhibit the evolution of the content. We should only quote when the form of the words themselves is at issue. We need to attribute whether or not we paraphrase, and the standard for attribution is the same whether or not we paraphrase. By using a different standard for direct copy than our standard for paraphrasing, we are implicitly encouraging editors to believe that the form of the words is somehow more worthy than the ideas behind them. While this is true for copyright, it is not true for moral rights. We must provide the same level of attribution in either case. -Arch dude (talk) 10:51, 20 June 2009 (UTC)
Wow. Can we just substitute that screed for the guideline itself? What a great mission statement!
There is absolutely no "prohibition against direct copying" envisioned here (doncram's view may differ, but has gained no traction, with all due respect to dc). Direct copying is perfectly acceptable, where it is legal and appropriate, and when it is properly attributed. Franamax (talk) 04:20, 20 June 2009 (UTC)
I don't understand what Franamax is suggesting my position is. I am for direct copying into wikipedia with proper attribution where material is public domain or otherwise not in copyright violation, and where the material is encyclopedic and appropriate. A footnote at the end of a sentence or paragraph usually provides proper attribution for ideas, but does not suffice to identify verbatim passages or to give credit for wording. It is usual practice in academia, in newspapers, and in any but unscrupulous forums, for credit for wording to be indicated by use of quotation marks. In political speeches and other formal verbal settings, written quote marks are replaced by spoken words "in the memorable words of Winston Churchill, ___" or "quote blah blah unquote" or by hand gestures understood to suggest quote marks. Franamax, is that what you suggest has gained no traction, that original wording by other authors should routinely be given credit in wikipedia? I don't believe you are for a wikipedia policy that others' wording shall not be given credit in wikipedia. I think you think that an edit summary in the past history of an article, which may be one of thousands of edit summaries, and which may be contradicted by other edit summaries, suffices to give adequate attribution for wording. I do disagree with that. doncram (talk) 15:47, 20 June 2009 (UTC)
I wrote my little tirade in direct response to the wording of the lede. If we add your sentence, "Direct copying is perfectly acceptable, where it is legal and appropriate, and when it is properly attributed" to the lede, I will be happy. -Arch dude (talk) 10:51, 20 June 2009 (UTC)
The lede has changed some in the last 24h. Which part of the current wording (permalink) do you not like? — Carl (CBM · talk) 13:07, 20 June 2009 (UTC)
It's quite simple. The lede and the text currently prohibit me from making a direct copy of an entire article from the public domain 1900 version of the Dictionary of National Biography, except by placing the entire article in quotes. This is silly. The DNB articles are already encyclopedic and in general form a good basis for a Wikipedia article, but they need a lot of work. The mechanics laid out in this guideline for editing quoted material are extremely awkward and are in conflict with the current guidelines related to quoted material, so many editors will simply not edit quoted material at all. In the specific case of the DNB authors, this material was originally published with very sketchy attribution and with the clear understanding that the articles could and would be edited later. If you look at the modern ODNB, you will find that the original DNB articles have been edited extensively, with no attempt to retain attribution on a word-by-word or paragraph-by-paragraph basis. Wikipedia has no more obligation to the original authors than does the Oxford University Press that publishes the ODNB. We have the same legal and moral rights to the original material that Oxford University Press has. In this specific case and in other similar cases, I feel very strongly that we should be able to create an article by starting from an existing public-domain encyclopedic article, copied word-for-word and properly attributed with a single attribution at the end. In the case of the seventeen DNB articles I have created, I always start by creating the exact article at Wikisource and then copying it in unmodified form to Wikipedia to form a documented basis. I add a template and a link to the Wikisource article. There are 50,000 DNB articles, so we have a way to go yet. The DNB is only one of about twenty encyclopedic references that we can and should be using in this fashion.
In most cases, these encyclopedias have successors that are still in print and that are copyrighted, and in no case do those successors attempt to retain attribution of wording back to the originals. This is further evidence that attribution of wording is not the normal practice for updates of encyclopedias. Wikipedia is an encyclopedia. I think we need to distinguish the appropriate protocol for copying material that is already in an encyclopedic form from the protocol for material that comes from original sources. I also think that the two types of material need to be given equal weight: we are not "making an exception" for encyclopedic material. We instead simply have a separate protocol for this material. Please note: there are legitimate objections to using old encyclopedic material as a basis for an article, but plagiarism is not one of them. -Arch dude (talk) 04:17, 21 June 2009 (UTC)
I do not believe that the present text claims you are prevented from incorporating DNB articles. However, I agree with the general principle that the page here should not claim that incorporating free text is forbidden, or is plagiarism, since we do permit it. — Carl (CBM · talk) 13:01, 22 June 2009 (UTC)
It's plagiarism unless we attribute it. We do permit it if it's attributed. The problem is with the exact nature of the mechanics of the attribution. At the time I wrote my tirade, the article stated (or very strongly implied) that exact copying was only properly attributed if quote marks were used, and this is what I object to. The lede has since been completely changed and is no longer strong enough. The lead should say: "Plagiarism is the incorporation of material from another source without attribution. Wikipedia does not permit plagiarism. Wikipedia's standards for attribution are described in this guideline." The body of the guideline should describe the appropriate mechanics for attribution for various types of inclusion of material. -Arch dude (talk) 14:52, 22 June 2009 (UTC)

<-- As can be seen in the history, I rewrote the introduction, with help from Moonriddengirl, and I hope it reflects what the majority of editors consider acceptable. However, as with all edits, I expect it will be edited unmercifully. From what you wrote above, Arch dude, I think you are basically happy with the alteration but you are not comfortable with the paragraph that starts "Some external works that are copyright expired..." because it does not say that an editor MUST include attribution. I think that the way to deal with this is, rather than adding it to the lead (as it does not have to be done for legal reasons), to describe it in a section called "copyright expired": if text is copied from a copyright expired source and no attribution has been added, then rather than delete the text, editors are encouraged to add attribution. That brings me to the next point: I think that the subsections of "Attributing text copied from other sources" should be rewritten to reflect the different types of sources that I have touched upon in the introduction. --PBS (talk) 10:34, 23 June 2009 (UTC)

Attributing text copied from other sources

Am I the only person who finds this section ludicrous? Allow me to quote the lead (attribution at the project page history section):

Wikipedia draws clear distinctions between work submitted by Wikipedia editors as their own work (which can be "edited mercilessly"), work marked as a quotation (which must be properly credited and left essentially untouched), work described as a paraphrase of another source (which can be edited as long as the original sense is not lost), and direct copying of large blocks of free content written by other people (which should also be credited). In quotations, editorial notes and minor changes are sometimes useful, but must be clearly marked as such. See WP:MOSQUOTE for details.

The section then goes on to discuss, erm, none of the above cases. What follows instead is a discussion of the copyright status of various sources, and not a very good one at that.

Wikipedia policy requires the citation of sources. It also allows the restricted use of direct quotations under the Non-free content criteria, one of which is that the source is correctly attributed (another is that the source has been previously published outside of Wikipedia). Now this use of properly attributed textual quotations is not controversial: allow me to quote article 10 of the Berne Convention:

  1. It shall be permissible to make quotations from a work which has already been lawfully made available to the public, provided that their making is compatible with fair practice, and their extent does not exceed that justified by the purpose, including quotations from newspaper articles and periodicals in the form of press summaries.
  2. It shall be a matter for legislation in the countries of the Union, and for special agreements existing or to be concluded between them, to permit the utilization, to the extent justified by the purpose, of literary or artistic works by way of illustration in publications, broadcasts or sound or visual recordings for teaching, provided such utilization is compatible with fair practice.
  3. Where use is made of works in accordance with the preceding paragraphs of this Article, mention shall be made of the source, and of the name of the author if it appears thereon.

U.S. fair use law is very similar with regards to text, although many of the key cases date from before the accession of the U.S.A. to the Berne Convention.

I am well known for saying that this whole exercise is a waste of everybody's time: it is more so if you don't look at the copyright restrictions (and hence the existing WP policies) which also cover this area. Physchim62 (talk) 13:32, 22 June 2009 (UTC)

I mostly agree with you that almost all of this guideline is redundant because it duplicates either our copyright guideline or our reliable source guideline, in the sense that if you adhere to those guidelines, you will almost certainly avoid plagiarism. But not quite. There is a small area where I might plagiarize while still adhering to WP:COPY and WP:RS, and we need to close the loophole. For example, I might copy a survey article from a journal, with enough paraphrasing to avoid copyright infringement, and then back up every fact in the article by using the references that were used in the survey article, without ever referencing the survey article itself. This is not illegal, but it is wrong, and we should not do this. All we need to do to make it right is to specifically reference the survey article. In the case of a PD source, we do not even need to paraphrase, and in fact we are more honest if we start from an exact copy. -Arch dude (talk) 15:06, 22 June 2009 (UTC)

Section break (other sources)

I suggest that this section is restructured so we have sections on the different types of text that can be copied into an article. It is based on the structure in the new introduction, and obviously there is a lot of text in the current "Attributing text copied from other sources" which should also be included, but it separates out the different ways that text is used.

suggested structure:

Attributing text copied from other sources

A mention here that if an editor wishes to incorporate text from another source and is not sure which category the text falls into, then they should ask on the talk page of the Wikipedia article, or ask at Wikipedia:Reliable sources/Noticeboard, before copying any text into a Wikipedia article, as it is better to sort this out before a mess is created rather than afterwards.

Sources under copyright
Sources under copyleft
Public domain sources

This is more complicated than copyright expired because I think there must be a clear statement of how the text happens to be in the public domain, and if there is any doubt about a text's status then it should be treated as under copyright.

Copyright expired sources
Compliance with the content policies

This might be better incorporated into the first section after the "when in doubt" paragraph.

--PBS (talk) 11:02, 23 June 2009 (UTC)

I have had a go at making the changes I suggested above ([5]). It is rough and ready, and I hope that you will all join in and edit it mercilessly, to fettle it. --PBS (talk) 22:08, 25 June 2009 (UTC)

Dubious discussion

In this edit, the statement that "When copying material within Wikipedia, from one article to another, attribution is also required" has been marked with the {{dubious}} template, asking for discussion. Here is the discussion:

Material contributed to en:wiki is done under the GFDL and CC-BY-SA licenses. Both unambiguously require that attribution to the original author(s) is mandatory. Is the dubious part about the case where someone copies only a quotation or external free text here and there? In that case, attribution is still required, but to the original author(s), not the wiki editor who copied it in. In any other case, GFDL requires a History page to indicate the Authors of the new Document. If the contributing editor is copying from another en:wiki page, they are not the Author of that text, rather they are incorporating the work of previous Authors, whose contribution must be acknowledged. Franamax (talk) 01:17, 26 June 2009 (UTC)

It is not that simple; in reality text is often copied from one point to another, often with no attribution. For example, this happens when the result of an AfD is to merge the text worth saving and delete the page. There are other technical examples (e.g. when pages are moved over redirects which have an edit history; administrators make judgments about whether to delete an edit history all the time at WP:RM). Further, most editors would not consider a comment in the history of an article to be attribution. For example, would you consider a note in the edit history of an article to be adequate attribution for text from any source but Wikipedia? The point is that what we are talking about here is a guideline for plagiarism, not the requirements for the GFDL and CC-BY-SA licenses. But even under those, I would be interested to see where there is a copyright requirement within English Wikipedia for text created within English Wikipedia.
As I understand it, it is needed for an audit trail to prove copyright in case an outside entity disputes the authorship, not for copyright within Wikipedia. For example, suppose that a commercial organisation takes a copy of some text from within Wikipedia, claims that they originally wrote it and demands damages from Wikipedia for breach of copyright. The edit history of the article would be a very strong protection against such a ploy, which is why, whenever possible when text is merged, a comment in the history will help with tracing the audit trail. But the motivation for those comments is not plagiarism, which is what we are discussing in this guideline.
The problem one runs into with the argument that any attribution can be placed in the edit history is that there is no requirement for a third party to pass on the edit history with the text, so AFAICT there is no form of permanent attribution possible within the edit history. --PBS (talk) 09:51, 26 June 2009 (UTC)
I think you have summed it up well PBS, this was my understanding as well. At least in the AfD merge case, most of the time the redirect is left in place which does leave a loosely connected history to trace back. In the rare cases where we WP:MAD we lose that though. Gigs (talk) 13:10, 26 June 2009 (UTC)
If "text is often copied from one point to another, often with no attribution", then that is a breach of the GFDL and CC-BY-SA licenses, and should be discouraged. It seems like a good argument for giving editors guidance, as the disputed phrase here does. Failure to attribute creates plagiarism, not copyvio, and as such this is the relevant place to put that guidance. As far as possible, we should always try to ensure that some mechanism of attribution is in place, even in AfD merge. Wikipedia demands an audit trail to satisfy attribution, and although it is also useful in dealing with copyright issues, that is a red herring here. Third parties are required by our licence to attribute: that they may fail to do so, is not a reason for us to do so as well. Finally, although neither Philip nor I can realistically claim to speak for most editors, I certainly do consider an appropriate comment in an edit history as attribution. YMMV. --RexxS (talk) 19:13, 26 June 2009 (UTC)
Please explain why it is a breach of the GFDL and CC-BY-SA licenses (clause etc) --PBS (talk) 20:04, 26 June 2009 (UTC)
The GFDL states:
0. PREAMBLE ... Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.
4. MODIFICATIONS ... You may copy and distribute a Modified Version of the Document ... In addition, you must do these things in the Modified Version: ... B. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five) ... I. Preserve the section Entitled "History" ... and add to it an item stating at least the ... new authors ... of the Modified Version
5. COMBINING DOCUMENTS You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions ... In the combination, you must combine any sections Entitled "History" in the various original documents, forming one section Entitled "History"
9. TERMINATION You may not copy, modify, sublicense, or distribute the Document except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, or distribute it is void, and will automatically terminate your rights under this License.
So, it would appear from (0) that the intent of the licence is to preserve credit for each author; from (4) that modifications have to preserve a list of previous authors; from (5) that merging two pages (both licensed under GFDL) has the same requirement to preserve a list of authors from both; and from (9) that failure to do those things terminates the licence. I won't clog this page with the same from CC-BY-SA, but I hope that helps you. --RexxS (talk) 20:40, 26 June 2009 (UTC)
I am not a contract lawyer, but I do not think that your reading of the GFDL is correct, as I do not think you are clearly distinguishing between a work created with "Invariant Sections" and those, like Wikipedia articles, which are not: "For compatibility reasons, you are also required to license it under the GNU Free Documentation License (unversioned, with no invariant sections, front-cover texts, or back-cover texts)." (From the link "Terms of Use" at the bottom of a Wikipedia page).
However, I do think that the Terms of Use, as contained in the link "Terms of Use" at the bottom of a Wikipedia page, should be considered for the impact that they have on the information presented on this page. --PBS (talk) 13:30, 27 June 2009 (UTC)
I may be misinterpreting the conversation here, as I am somewhat jet-lagged, but it seems to be suggesting that attribution is not required from a copyright standpoint? Wikipedia does not own the copyright to content that is created here; contributors do. They do not relinquish their copyright when they contribute here, but license their text for reuse. If the conditions of that license are violated, then reuse becomes a copyright infringement unless the contributor(s) waives that right. As to attribution requirements of that license, currently, the bottom of every edit screen sets forth the minimal attribution required: "You agree to be credited, at minimum through a hyperlink or URL, when your contributions are reused in any form." This is in keeping, obviously, with the Wikimedia:Terms of Use: "all users contributing to Wikimedia projects are required to grant broad permissions to the general public to re-distribute and re-use their contributions freely, as long as the use is attributed...", "As an author, you agree to be attributed in any of the following fashions...[omitted]".
Invariant sections (and our lack thereof) do not impact 4(I) of GFDL, which sets forth what is necessary to distribute modified versions of a Document. To quote GFDL, "'Modified Version' of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language." Invariant sections are "Secondary Sections" and by definition must be unrelated to the subject of the Document. (I am capitalizing Document here as I am using the term specifically in keeping with the definition set out in GFDL: "This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated herein. The 'Document', below, refers to any such manual or work.") A Wikipedia article is a Document. Even if it contains "no invariant sections", modified versions require attribution. --Moonriddengirl (talk) 14:00, 27 June 2009 (UTC)

(unindent) I don't think there is any real controversy; it has always been common practice to assume that not preserving the history in one way or another does violate the GFDL. It does get fuzzy because Wikipedia never followed the GFDL to the letter, choosing to ignore some sections. But it never ignored the attribution requirements. Note that all work here was able to be dual licensed, even old work, because Wikipedia is such a large user of GFDL that they convinced FSF to designate the CC-BY-SA license as the new version of the GFDL, in a tricky move that was somewhat controversial. Gigs (talk) 01:45, 30 June 2009 (UTC)

I clarified the sentence in question. Regardless of any arguments about what the licenses actually say, there is a long-established practice here that we do not require in-article attribution when we copy text from one Wikipedia article to another, just an edit summary (see WP:MERGE). The wisdom of that policy can be debated, but it is not an issue that can be resolved on this page. I'd suggest the village pump. — Carl (CBM · talk) 03:00, 30 June 2009 (UTC)

To Gigs - yes, the wiki-way does smear GFDL things around a bit. I've read the GFDL at least a dozen times now and it's still pretty gnarly. It helps if you take the view that the History section is preserved in the tab where you click "history" and the new Title section is also presented when you click that same tab. If you also adopt the view that the Title of the Modified Version is actually a number (the Title of the Document from which I am creating a Modified Version right now is "294437310"), then actually Wikipedia is pretty much perfectly compliant. And as you say, attribution one way or another has always been fundamental.
However, the requirements of GFDL are also widely ignored, in my view mostly because it's a pretty thick bunch of words to get through. PBS notes above that admins may routinely ignore the requirements in AFD merges, and I myself can name at least two examples (an interwiki translation I requested and a summary article creation via copy-paste that I commented on) where the attribution requirement was not observed. Note however that this is an area where consensus or common practice can not modify the rules, GFDL is an underlying tenet of the project and simply must be observed. At the same time, we're not necessarily going to start deleting content, but we need to fix and educate on an ongoing basis.
CBM, while I'm not opposed to your clarification just now, I do have a concern. You have now made a specific statement which will inevitably be taken as the one and only rule by those less inclined to extensive reading. I believe that there are templates available to indicate at the top of the article talk page that text has been copied/translated from other free sources, though I can't lay my finger on them right now. Shouldn't these deserve some mention? The idealistic me would prefer to lay down a standard going forward for attribution beyond just in the edit summary, though I may be alone with that desire.
Could you modify the text further, with "at a minimum, attribution within the edit summary ... and <xxx> templates should be placed on the article talk page to indicate blah blah"? Franamax (talk) 05:04, 30 June 2009 (UTC)
I simply looked at WP:MERGE, which does not say anything about templates on the talk page. Are there actually templates intended for use when text is moved from one wikipedia article to another, rather than from an external source to wikipedia? — Carl (CBM · talk) 15:57, 30 June 2009 (UTC)
Not Franamax, but there are indeed. :) You can see them listed at Category:Merge templates. (There are also split templates at Category:Split maintenance templates.) For one specific example, see Template:Merged-from. These aren't currently mandatory. Personally, I'm not sure they need to be, although I support their use at articles that have been turned into redirects or otherwise might be prone to deletion, since these are needed for attribution history. --Moonriddengirl (talk) 16:09, 30 June 2009 (UTC)
"History section is preserved": I don't think that the tab at the top of the page qualifies as an entitled section. Section means a section in the text, and would be part of the document if copied. I think that the problem we run into here is that AFAICT none of us are contract lawyers, and this issue is far too complicated for us to judge. Besides, I think a far more relevant document is the Terms of Use, which explains that attribution must be given in such circumstances. I have put the relevant sentence in bold and we may as well stick close to that wording:
Importing text:

If you want to import text that you have found elsewhere or that you have co-authored with others, you can only do so if it is available under terms that are compatible with the CC-BY-SA license. You do not need to ensure or guarantee that the imported text is available under the GNU Free Documentation License. Furthermore, please note that you cannot import information which is available only under the GFDL. In other words, you may only import text that is (a) single-licensed under terms compatible with the CC-BY-SA license or (b) dual-licensed with the GFDL and another license with terms compatible with the CC-BY-SA license

If you import text under a compatible license which requires attribution, you must, in a reasonable fashion, credit the author(s). Where such credit is commonly given through page histories (such as Wikimedia-internal copying), it is sufficient to give attribution in the edit summary, which is recorded in the page history, when importing the text. Regardless of the license, the text you import may be rejected if the required attribution is deemed too intrusive.

--PBS (talk) 08:50, 30 June 2009 (UTC)

PBS, I'm not sure it matters to the current conversation, but this has nothing to do with contract law. A pure copyright license is not a contract. It is different in key ways such as not requiring a mutual exchange of consideration (it is binding even if a product is freely provided, while a contract is generally not). The FSF carefully designed their licenses to not rely on any clause of contract law, keeping it as a pure copyright license. This also limits the remedies to those available for copyright infringement. If you violate the GFDL or GPL, you have not breached a contract, you have committed copyright infringement. It's a subtle but important distinction. Gigs (talk) 13:56, 2 July 2009 (UTC)

The lead has gotten too long

Granted, today is a hot day here, but even on a cooler day I think I would struggle to get all the way to the end of this lead. It is too dense, presents too much detail too soon. JN466 16:36, 30 June 2009 (UTC)

It's supposed to summarize the document, but I agree that it was a bit thick. Is it better now? --Moonriddengirl (talk) 17:05, 30 June 2009 (UTC)
Yes, much better. Thanks. JN466 17:07, 30 June 2009 (UTC)
That's an easy one, then, until somebody else doesn't like it. :) --Moonriddengirl (talk) 17:58, 30 June 2009 (UTC)

Is attributed copy and paste plagiarism?

This is not clear: when an editor copies and pastes large chunks of text but attributes them (with an inline ref or such), are we dealing with plagiarism or copyvio (or both)? Which policy (or policies) should be cited in a warning? --Piotr Konieczny aka Prokonsul Piotrus| talk 06:44, 14 July 2009 (UTC)

That is (potential) copyright infringement. Cite WP:COPYVIO. Please propose improved wording for the policy pages to make this more clear. --Hroðulf (or Hrothulf) (Talk) 13:30, 14 July 2009 (UTC)
It would depend on the copyright of the work (is it public domain?) whether there is a copyvio. In academic settings, it is often considered inappropriate to directly copy text even with attribution, but on Wikipedia we have no objection to incorporating freely-licensed text directly into articles along with attribution. Nonfree, copyrighted text needs to be quoted or substantially rephrased, as well as attributed. — Carl (CBM · talk) 14:28, 14 July 2009 (UTC)
Let's not forget that we're probably also dealing with bad article style in such a case, so cleanup tags on the article may well be appropriate. I agree that this sort of situation leads to a big risk of copyvio, and so the user needs to be warned about that. Physchim62 (talk) 15:02, 14 July 2009 (UTC)
Attributed copy and paste is junk writing no matter what the plagiarism and copyright status is, and we should never be doing it. It's lazy and it's bound to cause problems with lots of WP content and style guidelines. My point is, who cares if it's plagiarism or not—we just shouldn't be doing it. rʨanaɢ talk/contribs 18:53, 18 July 2009 (UTC)
This may or may not be true, but it is not relevant to this guideline. A lot of editors disagree with you. A straight copy-and-paste of a 5,000-word article from the 1900 DNB is a great deal better than no article at all, and it serves as an excellent basis for further improvement. We can argue about this in some other venue, not here. This guideline is intended to tell an editor how to avoid plagiarism. It is not intended to tell the editor how to write a good article. -Arch dude (talk) 20:03, 18 July 2009 (UTC)
Leaving all the points aside about good article writing and copyvios though - Is it or is it not acceptable for me to copy-and-paste any arbitrary amount of free text into an article page where the only indication that I've made a direct copy is a (perfectly formatted) <ref> tag which shows in the footnotes section as a simple reference? It's just a plain 'ol footnote to an outside source, not a note that says "copied from...", just a normal {{cite}} template, and I've nowhere else indicated that I'm making a verbatim copy of someone else's writing. Is that OK or not? Franamax (talk) 01:46, 19 July 2009 (UTC)
This is precisely the kind of question that this guideline should answer. In my opinion, a direct massive copy like this deserves something more prominent than a single "ref"-style tag. In the extreme cases that I am most interested in, the entire article (words, structure, tone, and all) starts as a direct copy, and I think we need to state this. A ref tag implies two things that are not valid in this context: first, the scope of a ref tag is (implicitly) the sentence or at most the section to which the ref tag is attached, and second, the ref tag (again implicitly) says that we took (just) facts from the reference. To avoid plagiarism and give credit where it is due, we need more, and a statement similar to the one generated by Template:1911 is appropriate. I feel that if the copied text is sufficiently short to fall within the (implicit) scope of a ref tag (a paragraph or a short section), then use a ref tag, but use the phrase "contains text from 'ref,' a work in the public domain" within your ref. -Arch dude (talk) 07:57, 19 July 2009 (UTC)
I agree that an attribution template, rather than just a footnote, is desirable in the situation Franamax is describing. — Carl (CBM · talk) 15:10, 19 July 2009 (UTC)