Wikipedia talk:Large language models/Archive 6
This is an archive of past discussions on Wikipedia:Large language models. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Secondary sources
Should we cite any of the factual claims about LLMs on this page to reliable sources? –LaundryPizza03 (dc̄) 23:42, 24 June 2023 (UTC)
LLMs and UNDUE
Is it worth mentioning that LLMs also have trouble with WP:DUE? When asked to summarise content, they may produce a summary which places too much weight on certain details, or tries to summarise parts of the article which would normally be left out of the lead. I've been involved in some cleanup of LLM-generated lead sections, which often seem to be overly interested in a subject's relationship history or personal affairs, and less so in their career (for which they were actually notable). Mako001 (C) (T) 🇺🇦 00:14, 23 July 2023 (UTC)
- Yes, because this is, afaik, an even bigger issue than inventing stuff from thin air, as it is even more complex to detect. -- Zache (talk) 03:05, 23 July 2023 (UTC)
Non-compliant LLM text: remove or tag?
This draft currently recommends tagging non-compliant contributions with {{AI generated}}. It should recommend removal instead.
The tagging recommendation is inconsistent with the user warning templates like {{uw-ai1}}. If LLM text were worth keeping, then why would we warn people who add it? There's no point in trying to fix non-compliant LLM text, which will either have no sources or likely-made-up sources. It's better to remove. Do LLMs write more accurate text than the deleted WP:DCGAR? I doubt it.
Let's try to keep this discussion organized: should the draft recommend removal, or tagging? Note that we're only talking about deleting raw LLM outputs added to existing articles. For deleting fully LLM-generated articles through WP:CSD, there's a current discussion elsewhere.
DFlhb (talk) 11:50, 3 June 2023 (UTC)
- Friendly ping for editors who participated in a previous discussion on this: Novem Linguae, Barkeep49, Thryduulf. DFlhb (talk) 12:49, 3 June 2023 (UTC)
- My opinion has not changed since the previous discussions - whether text is AI-generated or not is irrelevant (and impossible to determine reliably even if it was relevant). In all cases you should do with AI-generated content exactly what you would do with identical content generated by a human - i.e. ideally fix it. If you can't fix it then tag it with what needs fixing. If it can't be fixed then nominate it for deletion using the same process you would use if it was human-generated. Thryduulf (talk) 13:07, 3 June 2023 (UTC)
- Thanks, this logic is compelling. If we can't tell them apart then our response must be the same. In that case, do we need {{AI generated}}, or would the existing templates suffice? DFlhb (talk) 13:21, 3 June 2023 (UTC)
- I don't see any need for that template - it's speculation that even if correct doesn't contain anything useful that existing templates do not (and in at least most cases they also do it better). Also, even if we could reliably tell human and AI-generated content apart, our response should be identical in both cases anyway because the goal is always to produce well-written, verifiable encyclopaedic content. Thryduulf (talk) 13:36, 3 June 2023 (UTC)
- I disagree that experienced editors can't figure out what is AI-generated and what is not. According to this template's transclusion count, it is used 108 times, which is good evidence that there are at least some folks who feel confident enough to spot AI-generated content. I definitely think that the wording of this template should recommend deletion rather than fixing. AI-generated content tends to be fluent-sounding but factually incorrect, sometimes complete with fake references. It reminds me a lot of a WP:CCI in terms of the level of effort to create the content (low) versus the level of effort to clean it up (high). Because of this ratio, I consider AI-generated content to be quite pernicious. –Novem Linguae (talk) 14:06, 3 June 2023 (UTC)
- In other words, it's a guess and the actual problem is not that it is written by an AI but that it tends to be factually incorrect. Why should AI-written content that is factually incorrect be treated differently to human-written content that is factually incorrect? Why does it matter which it is? Thryduulf (talk) 14:16, 3 June 2023 (UTC)
- Because given its structure it would take 5 times as much work to try to use the material (in the context of a Wikipedia article) as it would to delete & replace it. North8000 (talk) 14:15, 3 June 2023 (UTC)
- A good analogy is badly written, unstructured and undocumented software. Ten times fewer hours to nuke and replace than to reverse engineer and rebuild a herd of cats. North8000 (talk) 14:21, 3 June 2023 (UTC)
- I'm not arguing that material in that state shouldn't be deleted, I'm arguing that whether it's in that state because it was written by AI or whether it is in that state because it was written by a human is irrelevant. Thryduulf (talk) 19:02, 3 June 2023 (UTC)
- That's true in theory, but a big difference is that human-written content is usually presumed to be salvageable as long as the topic is notable while AI should be treated more like something written by a known hoaxer. –dlthewave ☎ 20:15, 3 June 2023 (UTC)
- IMO the context of how it was generated is important in trying to figure out how to deal with it. For example, if you (Thryduulf) wrote "the sky is green" it would probably be worth the time to find out what you intended.....e.g. maybe in certain contexts / times. Or (knowing that there must have been some reason) check to see if there are instances when the sky actually is green and build upon what they wrote. If "the sky is green" was built by a random word generator or typed by a chimpanzee, it would be silly to waste my time on such an effort. North8000 (talk) 21:56, 3 June 2023 (UTC)
- @Dlthewave and @North8000 These require you to know whether text was generated by a human or by a LLM. There is no reliable way to do this, so what you are doing is considering the totality of the text and making a judgement about whether it is worth your (or someone else's) effort to spend time on it. The process and outcome are the same regardless of whether the content is human-written or machine-written, so it's irrelevant which it is. Thryduulf (talk) 22:02, 3 June 2023 (UTC)
- We're really talking about two different things. I was answering: "presuming that one knew, should it be treated differently?" You are asserting the premise that it is impossible to know, and then saying that, if that premise is true, the question I answered is moot. Sincerely, North8000 (talk) 01:29, 4 June 2023 (UTC)
- I was explaining why the question is irrelevant - it isn't possible to know, so there is no point presuming. However, even if we were somehow able to know, there is no reason to treat it differently because what matters is the quality (including writing style, verifiability, etc) not who wrote it. Thryduulf (talk) 09:28, 4 June 2023 (UTC)
- Why wouldn't it be possible to know? I've seen several users make suspect contributions, those users were asked if they used an LLM, and they admitted they did; in cases where they admit it, we know for sure, we're not presuming.
- I'm not convinced that users can never tell it's an LLM. There were several cases at ANI of users successfully detecting it, including a hilariously obvious instance from AfD. What I said below seems to work: if we identify it, delete; if we can't identify, by default we do what we normally do. DFlhb (talk) 10:53, 4 June 2023 (UTC)
- Plus whenever the discussion gets more detailed I think it will almost inevitably come out. For example, let's say that there is a phrase in there that makes no sense and you ask the person who put it in "what did you mean by that?" Are they going to lie and claim they wrote something stupid in order to cover for the bot? Or blame the bot? North8000 (talk) 13:43, 5 June 2023 (UTC)
- Note that every transclusion of that template is on drafts, not articles, and those drafts are just tagged so no one wastes time working on them (because presumably MfD/CSD would fail).
- I've just reviewed those drafts. They're all very blatantly promotional, and have all the hallmarks of LLM text: stilted writing, "In conclusion...". There's no identification problem there, and indeed we should delete that stuff; it's pointless to fix.
- When it's easy to identify, we should delete, since it's basically spam. When it's not easy to identify, people won't come to this draft/policy for advice anyway, and they'll just do what they normally do. So I guess it's fine for this draft to recommend deletion for "identif[ied] LLM-originated content that does not comply with our core content policies". DFlhb (talk) 15:31, 3 June 2023 (UTC)
- My thought is that although AI-generated content should generally be removed without question, it's not always black-and-white enough to delete on sight. Just as we have Template:Copyright violation and the more extreme Template:Copyvio, there are cases where it's unclear whether or to what extent AI was used. Maybe the editor wants to wait for a second opinion, circle back to deal with it later or keep the article tagged while it's going through AfD/CSD. –dlthewave ☎ 18:22, 3 June 2023 (UTC)
- Good point. In essence, saying that LLM content should be removed, but that there should be normal careful processes when there is a question, which is/will be often. North8000 (talk) 15:07, 6 June 2023 (UTC)
- @Dlthewave: In my view, we should take as stern a stance on LLM as we do on SOCK. An inflexible, blanket "do not use an LLM" should be policy, because (1) LLMs have the potential to completely destroy the entire Wikipedia project by overwhelming it with massive volumes of WP:BOLLOCKS, (2) those who have to resort to LLMs most probably lack basic WP:COMPETENCE to build an encyclopedia, and (3) even good-faith LLM users would create thankless busywork for legitimate contributors who have to waste time and energy cleaning up after them (which we don't need any more of as it stands, look at the WP:AfC backlog!). If possible, as soon as a reliable LLM-detector is developed, it should be used similarly to CheckUser, and violators (who are confirmed beyond reasonable doubt) should be indef banned. 〜 Festucalex • talk 20:31, 6 July 2023 (UTC)
- Have any of you guys checked out jenni.ai? Supposedly it has a million users... though I don't believe that number. It has a nice feature where it will search for matches in the academic literature. If we implement something similar perhaps we could let people plug in their own literature libraries... Talpedia 15:42, 24 July 2023 (UTC)
Zero-click AI-search concerns
Have you checked out perplexity.ai lately?
If you prompt it to "write 800+ words on" whatever "with multiple headings and bullet points", it creates an article on whatever, that looks a lot like a Wikipedia article.
Even general prompts (with just the subject name) return fairly detailed responses, usually with the information the user was looking for, often quoted from Wikipedia, greatly reducing the need to actually visit Wikipedia, or any other page for that matter.
My concern is that, as perplexity.ai and similar search engines gain in popularity and use, this may eventually cause a noticeable reduction in organic (i.e., human) traffic to Wikipedia, which may in turn diminish the flow of new editors, causing Wikipedia to become more and more out of date.
Meanwhile, bot (crawler) traffic to Wikipedia may continue to increase, driving the total traffic figure upwards, thereby hiding an organic traffic reduction.
I'm interested in how this issue can be tracked.
So, my question is: "How can organic traffic on Wikipedia be measured?" (One possible approach is sketched below.)
I look forward to your replies. — The Transhumanist 10:43, 2 August 2023 (UTC)
P.S.: And it now has picture support.
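One possible starting point, assuming the agent breakdown in the public Wikimedia Pageviews REST API (user / spider / automated) is a reasonable proxy for organic versus crawler traffic: pull aggregate counts per agent type and track the "user" share over time. A rough Python sketch follows; the endpoint path, timestamp format, and User-Agent string are assumptions to be checked against the API documentation.

```python
# Rough sketch: compare "user" (organic) page views against "spider" and
# "automated" (crawler/bot) page views for English Wikipedia, using the
# public Wikimedia Pageviews REST API. The agent breakdown is assumed to
# be a usable proxy for organic vs. non-organic traffic.
import requests

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate"

def monthly_views(agent: str, start: str, end: str) -> int:
    """Total monthly page views for en.wikipedia for one agent type."""
    url = f"{BASE}/en.wikipedia.org/all-access/{agent}/monthly/{start}/{end}"
    resp = requests.get(url, headers={"User-Agent": "organic-traffic-sketch/0.1"})
    resp.raise_for_status()
    return sum(item["views"] for item in resp.json()["items"])

if __name__ == "__main__":
    start, end = "2023010100", "2023080100"  # YYYYMMDDHH timestamps
    for agent in ("user", "spider", "automated"):
        print(f"{agent}: {monthly_views(agent, start, end):,} views")
```

Plotting the "user" series month over month would show whether organic traffic is actually declining while total traffic rises.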
Citogenesis concerns
One thing the proposal should probably mention is that most LLMs were trained on Wikipedia; this means that using them (even just running things past them for verification) risks WP:CITOGENESIS issues. -- Aquillion (talk) 15:36, 30 July 2023 (UTC)
- Absolutely. This is the thing that should be avoided. Kirill C1 (talk) 09:28, 2 August 2023 (UTC)
- And future LLMs and new editions of LLMs will be trained on Wikipedia, a Wikipedia that has in part been edited by LLMs, thus creating a feedback loop. That is, LLMs will be trained on their own edits, which could amplify errors and bias, and compound citogenesis. — The Transhumanist 10:43, 2 August 2023 (UTC)
- I had included this in the draft at some point but it was removed for whatever reason.—Alalch E. 22:31, 14 August 2023 (UTC)
perplexity.ai revisited
It's been a while since we've gone over the capabilities of perplexity.ai.
It has been improving rapidly.
Pulling its responses mostly from the web pages in its search results, it can:
- Prevent most chatbot "hallucinations"
- Answer in natural language prose
- Write source code
- Write MediaWiki wiki text "in a code block" (when explicitly requested)
- Do calculations
- Conduct comparisons
- Answer in your preferred language
- Make lists
- Produce tables (rendered or not)
- Expand prose that you provide
- Summarize:
- Works
- Books
- Plays
- Movies
- Specific web pages
- News articles
- News coverage by a particular newspaper for a specific time frame
- Current events (such as the Ukraine War)
- Write from a particular viewpoint or style, as a:
- Blog entry
- News article
- Encyclopedia article
- Documentary script
- Commercial or ad
- Poem
- Resume
- Particular person, like "Albert Einstein" (though, sometimes it refuses, depending on how you word the prompt)
- Etc.
- Note: Each style/viewpoint request represents a different enough prompt that you get more detail on a topic the more styles/viewpoints you request via a sequence of prompts.
- And it now has picture support
In addition to processing search results in the above manners, it can access the general capabilities of its underlying LLMs and other tools, to process data that you provide it in your prompt. Such as:
- Convert a format
- Copy edit prose
- Translate a passage
- Etc.
As it answers in natural language, using it is more like reading web pages than conventional search engine results pages, and the better you get at using it, the longer you can go without dipping into an actual web page.
They recently got an influx of 25 million dollars, and have expanded their team, who are detailed on the site. This has accelerated the rate at which the tool is increasing in capability.
Some improvements just become available, without any announcements. For example, the maximum size of prompts and of its results have been going up. Results were limited to around 250 words. Then they went up to around 400. And are now over 800.
The answer buffer is actually larger than the maximum response it can present, so you can ask it to "tell me the rest", and it will continue where it left off.
After each response, potential follow-up questions are provided via multiple choice, sometimes about things that may not have occurred to you.
It also remembers previous prompts and has access to its previous responses (in the current session only), and therefore can discern the context of your subsequent prompts and follow your instructions on what to do with previous results, like redisplay them further refined per your instructions.
All in all, this is a general purpose web browsing and summarization tool, that is more powerful than conventional browsing and searching, because it zeroes in on exactly what you request it to and pulls that data from web pages for you to digest directly.
Essentially, it is a textual genie that obeys your commands and fulfills your wishes, as long as it can do so in the form of text and available images.
What it can do is limited mostly by your limitations in dreaming things up for it to do. It may surprise you.
On the darker side, sometimes you may inadvertently prompt it to argue with you! When that happens, create a new session. ;)
I hope this explanation of the tool has helped.
Sincerely, — The Transhumanist 12:01, 2 August 2023 (UTC)
Radical proposition on ban of LLM
I propose an outright ban on large language models. Are they reliable? No. Can they perform a simple task like summarising a plot correctly? No. I emphasise, correctly. How can we distinguish the users who have good knowledge of them and can handle them from those who can't? The rule will only work if it unambiguously prohibits the use of such tools, which, let's be honest, are incompatible with Wikipedia rules. Kirill C1 (talk) 11:03, 25 July 2023 (UTC)
- @Кирилл С1: Although I want LLMs to be able to be used to improve the encyclopaedia, I too find myself often thinking that they would be better off prohibited. As has been pointed out by others, LLMs give the most plausible answer, not necessarily the most accurate. It's only when they fail to be plausible that the errors get detected. I still don't quite know if I'd support an outright LLM ban, since it would remain hard to enforce, but then again, the only other option is an evil bit-type solution. Mako001 (C) (T) 🇺🇦 07:07, 30 July 2023 (UTC)
- We already had various similar discussions in the past here. One problem with a general ban is that it excludes way too much. For example, some popular spellcheckers like Grammarly are based on LLMs. And autocompletion functions while typing can be based on LLMs. Autocompletion functions for single words are very common for mobile users when entering text. LLMs have many applications and these are just a few examples. I assume it was not your intention to make a general ban in this sense. LLM technology is also more and more implemented into regular word processing software, like Microsoft Office.
- Many of the formulations in our current draft reflect exactly this issue: it currently only bans certain types of uses, like "Do not publish content on Wikipedia obtained by asking LLMs to write original content or generate references". This is probably the more fruitful approach.
- As a side-note: this draft is a draft of a policy. Policies reflect a very wide consensus among editors and are not considered "radical", as the title of your post states. Phlsph7 (talk) 13:44, 30 July 2023 (UTC)
- This has been brought up several times and did not garner support. Also, how would "large" language model be defined? Large language models are not ok, but small ones are ok (seems backward)? Is OCR assisted with a language model acceptable? Are predictive text, word completion, autocomplete, grammar checking etc. permissible?
- Seems like an almost neo-luddite reaction to something that is just unavoidable. —DIYeditor (talk) 16:36, 30 July 2023 (UTC)
- @Кирилл С1: I'm trying to collect examples of poor summarization by LLMs, and have a different experience with them than yours. Can you please give me some examples of failed summaries you've encountered? Sandizer (talk) 16:42, 17 August 2023 (UTC)
I'm pretty strongly opposed to LLM use in most if not all cases, but an outright ban is a non-starter for several reasons. First, previous discussions have shown that it's unlikely to achieve consensus. Second, such a rule would be challenged outright every time an editor comes up with a Great Idea that they think can be accomplished with LLMs. If we ban all use by default but also have a process where specific uses can be approved, we'll be protected from misuse but folks will also have a path they can follow if they think they can use the technology productively. Perhaps we could start out allowing spelling/grammar checkers and the like. –dlthewave ☎ 15:21, 30 July 2023 (UTC)
Based on discussions seen on the village pumps and other locations, I think there is a possibility of reaching a consensus on disallowing the use of programs to generate text, including text generated based on human prompts, thus disallowing copy-editing existing text. I agree that a ban on a specific technology is unlikely due to its many uses, but I also think focusing on technology is too limiting. I think the base principle of having text essentially written by human authors is what many editors will support. isaacl (talk) 16:11, 30 July 2023 (UTC)
- I agree that having a general ban on a technology with a variety of uses is not a good idea. Regarding your suggestion: I assume you want to allow LLM usage for spellcheckers. Allowing this while banning copyediting will be a difficult line to draw. Grammarly also includes some basic copyediting functions and would, presumably, also be banned in this case. Phlsph7 (talk) 14:02, 2 August 2023 (UTC)
- As I discussed at Wikipedia talk:Large language models/Archive 5 § Focus on types of uses, I consider spellcheckers and grammar checkers to be analysis tools/features, rather than text generators. Yes, with more and more software integrating text generation features (such as the ones integrated with Microsoft Word), text generated by these features would be prohibited under this principle. isaacl (talk) 14:14, 2 August 2023 (UTC)
US copyright law in the news
There was a recent ruling about AI-generated images in the US. Here are some of the news articles:
- https://www.jpost.com/business-and-innovation/all-news/article-755483
- https://www.theverge.com/2023/8/19/23838458/ai-generated-art-no-copyright-district-court
- https://www.hollywoodreporter.com/business/business-news/ai-works-not-copyrightable-studios-1235570316/
WhatamIdoing (talk) 19:39, 20 August 2023 (UTC)
- "absent any guiding human hand" is an insult to the hundreds of thousands of artists whose work was used to train the algorithm. On the other hand I don't want the copyright owned by the non-artist data engineer/compilation manager. Sandizer (talk) 01:31, 21 August 2023 (UTC)
- I think the point is that there is no human guiding the AI to decide which artists' work to emulate, or how to go about emulating them. (I don't know enough about this particular AI system to know what kinds of datasets it uses.) WhatamIdoing (talk) 03:59, 21 August 2023 (UTC)
Promoting to policy
Since the discussion has largely died down, I think it would be best to hold an RfC to see if it is ready to be promoted to policy. What do y'all think? Ca talk to me! 12:17, 26 August 2023 (UTC)
- I like that idea! Llightex (talk) 14:00, 26 August 2023 (UTC)
Problems with basic guidelines 5 & 8
For the basic guidelines, I think we need to change the following points:
- 5. You must denote that an LLM was used in the edit summary.
- 8. Do not use LLMs to write your talk page or edit summary comments.
The reason is the following: mobile users often use autocompletion features, which are usually enabled by default. Autocompletion features are sometimes based on LLMs. The two guidelines would mean that the affected mobile users would have to declare LLM-use in almost every edit and would not be able to write edit summaries or post comments on talk pages. It would basically keep them from any editing since they can't even write the edit summaries to declare their LLM use. Phlsph7 (talk) 08:41, 28 August 2023 (UTC)
- As previously discussed, this is why I think focusing on technology is the wrong approach. Users have no idea about the specific technology being used by their spellcheck/word suggestion tools, and no one objects to these tools being used to assist editing. I think it would be more effective for any new guideline or policy to address specific use cases for which truly new guidance is required, versus just repeating sections of other guidelines or policies. Providing additional guidance for other guidelines or policies in the context of specific situations can be provided with explanatory essays. isaacl (talk) 16:24, 28 August 2023 (UTC)
- That's an important observation that users are often not aware of the underlying technology they are using. Focusing on specific use cases could solve that problem. But it could be difficult to provide general rules this way, like our basic guidelines. Phlsph7 (talk) 07:20, 29 August 2023 (UTC)
- I think it will be easier to describe general principles when looking at uses, rather than technology. For example, similar to your changes to the nutshell, there could be consensus that programs should not be used to generate text submitted to Wikipedia. isaacl (talk) 16:18, 29 August 2023 (UTC)
- I don't think this is an either-or decision: we can focus on both technology and uses. One quick fix for the problem at hand that implements your idea would be to slightly restrict what we mean by LLM for the purpose of this draft. Currently, it contains the passage "LLMs power many applications, such as AI chatbots and AI search engines. They are used for a growing number of features in common applications, such as word processors, spreadsheets, etc. In this policy, the terms "LLM" and "LLM output" refer to all such programs and applications and their outputs." We could change it to something like "While LLMs power applications with many different functions, this policy covers primarily the use of chatbots and similar external tools used to create and alter text."
- I'm not sure if "chatbots and similar external tools used to create and alter text" is the best formulation to characterize those tools. The term "external" is meant to exclude applications running in the background without the user knowing it. Maybe a footnote could be added. The formulation is intentionally vague and reflects our own ignorance of what those present and future tools might be. It would cover ChatGPT while at the same time excluding mobile autocompletion features.
- I don't know what the prospects of a "technology-free" version of this draft would be since it currently focuses a lot on LLMs. It would probably require extensive revisions to most parts. Phlsph7 (talk) 17:13, 29 August 2023 (UTC)
- I feel the technology is irrelevant to what many editors and readers want: text that is essentially written by humans, not programs. It doesn't matter how any programs being used to assist are coded. isaacl (talk) 17:16, 29 August 2023 (UTC)
- One issue would be that many of the problems discussed here concern primarily LLMs, like hallucinations. Another issue is practical: it would be a lot of work to implement this idea. We might have to start a new draft from scratch. Phlsph7 (talk) 17:26, 29 August 2023 (UTC)
- Yeah, this is my deal, basically. jp×g 05:50, 1 September 2023 (UTC)
- Also: it could well be that we encounter similar problems when trying to describe exactly which uses are ok and which ones aren't. Phlsph7 (talk) 17:28, 29 August 2023 (UTC)
- It doesn't matter why programs may create incorrect facts, or the technology that leads to it. Specific explanations putting matters into context for a specific technology can still exist in explanatory essays. Yes, changes along these lines would require, as previously discussed multiple times, stripping down this proposal to a more barebones version. isaacl (talk) 17:32, 29 August 2023 (UTC)
- re "text that is essentially written by humans, not programs.": Wikipedia:Bots have been a thing since essentially forever. The same goes for tools that assist with editing and patrolling that are human supervised Wikipedia:Tools/Editing_tools. I think the sane thing is that LLMs should be human-supervised for now. --Kim Bruning (talk) 23:09, 29 August 2023 (UTC) though, seeing the progress we've seen in the past year, who knows what the technology will be capable of next year.
- Bots perform specific copy edits; they don't generate new text. If, though, there were a consensus that new text written by programs was acceptable, then again I think a policy or guideline should state this general principle, without referring to the underlying technology, which is subject to rapid change. isaacl (talk) 00:44, 30 August 2023 (UTC)
- This draft is focused on LLMs and even gets its name from them. For a draft that focuses primarily on different forms of uses without particular regard to the underlying technology, it would probably be best to start something new rather than to try to adjust this one. Phlsph7 (talk) 17:26, 30 August 2023 (UTC)
- I'm not suggesting that this page shouldn't exist. I'm only discussing why I feel it would be better to have a policy or guideline based on more general principles, rather than anchoring it to a specific technology, and thus don't support having this page as a policy or guideline. isaacl (talk) 17:31, 30 August 2023 (UTC)
How to deal with bogus AI citations
A colleague at the place where I work pointed me toward a citation in the "Pileated woodpecker" article which seems to be bogus ("Woodpecker excavations promote tree decay and carbon storage in an old forest"); this citation was added by Filippetr2 back in March. After doing a quick search for it and realizing that the DOI was misassigned and the title did not yield any hits, I removed it. We suspect this was an AI-generated citation. What is the policy for dealing with this sort of thing?--Gen. Quon[Talk] 14:08, 28 August 2023 (UTC)
- At least for citations using cite tags with a DOI or ISBN, we could probably have a bot that checks that those are valid and that the other parts of the cite matches the information the DOI / ISBN points to. Obviously a degree of fuzziness would be needed, and it couldn't just remove them because of that, but it could flag or tag cites with clear-cut issues for review (ie. totally invalid DOI / ISBN, or the data that those point to don't match at all.) That would help with this and would be nice-to-have in general; AI-generated citations would tend to use that format and wouldn't have valid values. Of course, it wouldn't help against a deliberate attacker (who would just remove the DOI / ISBN) but it would help catch people who just don't know any better or other casual problems. --Aquillion (talk) 18:39, 28 August 2023 (UTC)
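For what it's worth, the DOI half of that check is straightforward to prototype. The sketch below reflects my own assumptions rather than any existing bot: the public Crossref API as the validity source, difflib for fuzzy title matching, and an arbitrary 0.6 similarity threshold. It only flags citations for human review; it would never remove anything on its own.

```python
# Sketch of the DOI sanity check described above: resolve the cited DOI via
# the public Crossref API and compare the registered title to the title given
# in the citation. A failed lookup or a poor title match flags the citation
# for human review; nothing is removed automatically.
from difflib import SequenceMatcher

import requests

def crossref_title(doi):
    """Return the registered title for a DOI, or None if it doesn't resolve."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}",
                        headers={"User-Agent": "citation-check-sketch/0.1"})
    if resp.status_code != 200:
        return None
    titles = resp.json()["message"].get("title", [])
    return titles[0] if titles else None

def looks_suspect(cited_title, doi, threshold=0.6):
    """True if the DOI is invalid or its registered title barely matches."""
    registered = crossref_title(doi)
    if registered is None:
        return True
    similarity = SequenceMatcher(None, cited_title.lower(), registered.lower()).ratio()
    return similarity < threshold

# Hypothetical usage (this DOI is illustrative, not the one from the article):
# looks_suspect("Woodpecker excavations promote tree decay and carbon storage "
#               "in an old forest", "10.1000/example-doi")  -> True
```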
Policy versus this
It was said a few months ago -- I don't remember if by me or someone else, but I do remember agreeing with it -- that making an RfC for this whole thing to be made a policy was probably going to be a difficult process and its adoption was unlikely, since this page is extremely long and contains a lot of extremely detailed stuff. That is to say, there are about a dozen separate things that could each be subject to their own entire giant page-filling multiple-option RfC. It doesn't seem to me like many people are just going to agree to all of them being instituted as-is with no modification. jp×g 19:48, 29 August 2023 (UTC)
- Perhaps you are referring to Wikipedia talk:Large language models/Archive 5 § Circling back to getting this into a presentable state, or the earlier discussion, Wikipedia talk:Large language models/Archive 4 § Major trim. For better or worse, only a few of the participants on this page have shown an interest in having a slimmer policy. isaacl (talk) 00:56, 30 August 2023 (UTC)
Adding content when one does not know where it comes from.
Adding unverified content generated by a LLM is basically adding content when one does not know where it came from. When we add content from personal knowledge or personal research we take personal responsibility for adding that content, whether it is true, verifiable, misleading, made up or whatever, and the existing guidance and policy covers that. LLMs are just another source of text which may be verifiable, nonsense, misleading, or even occasionally true. The biggest difference is that LLMs are prolific compared to normal humans. The existing Wikipedia policies and guidance are generally effective for content added without having to specify how the editor got the content, because the editor is responsible for their edits. We accept that there is variability in interpretation, some people misunderstand or misrepresent the sources, sometimes the effort to avoid copyright infringement or condense the content distorts the information. Sometimes we just make mistakes. Professional writers also have these problems, and hope that their editors will find the mistakes. We rely on our fellow Wikipedians to edit our work. These are things that make good content creation difficult, and why not everyone is suited to content creation. We deal with it. LLMs just make the same problems more common and on a potentially larger scale. Competence is required, Wikipedia may be the encyclopedia that anyone can edit, but only as long as they follow the rules and are acceptably competent in the type of work they choose to do, and are able to learn and adapt to the environment, and develop into useful members of the community. If they fail to comply with the terms of use or do useful work, they get thrown out. Using content generated by an LLM without checking the product first is like putting a gun to your own head without checking if it is loaded. It is conclusive evidence of incompetence. Cheers, · · · Peter Southwood (talk): 05:47, 31 August 2023 (UTC)
- I like that paragraph and think it is true. jp×g 02:04, 2 September 2023 (UTC)
- I agree that this is an excellent argument against allowing such content at all, and that it does fall under WP:CIR. These programs, and that is all they are, are not competent to write an encyclopedia based on verifiable information. That should be our policy. The fact that people will try it anyway is an invalid argument, as that would also apply to policies like WP:SOCK and WP:PAID. Beeblebrox (talk) 20:23, 6 September 2023 (UTC)
RfC: Is this proposal ready to be promoted?
- The following discussion is an archived record of a request for comment. Please do not modify it. No further edits should be made to this discussion. A summary of the conclusions reached follows.
The most common and strongest rationale against promotion (articulated by 12 editors, plus 3 others outside of their !votes) was that existing P&Gs, particularly the policies against vandalism and policies like WP:V and WP:RS, already cover the issues raised in the proposals. 5 editors would ban LLMs outright. 10-ish editors believed that it was either too soon to promote or that there needed to be some form of improvement. On the one hand, several editors believed that the current proposal was too lax; on the other, some editors felt that it was too harsh, with one editor suggesting that Wikipedia should begin to integrate AI or face replacement by encyclopedias that will. (2 editors made a bet that this wouldn't happen.)
Editors who supported promoting to guideline noted that Wikipedia needs to address the use of LLMs and that the perfect should not be the enemy of the good. However, there was no general agreement on what the "perfect" looked like, and other editors pointed out that promoting would make it much harder to revise or deprecate if consensus still failed to develop.
There are several next steps based on the discussion:
- Discuss whether to mark this page as a failed proposal.
- Discuss whether to incorporate portions of the proposal into existing P&Gs.
- Discuss whether to adopt Tamzin's proposal.
- Discuss whether to begin an RfC to ban LLMs.
- Discuss whether to convert the proposal to a supplement or essay.
- Discuss whether to continue revising the proposal, and if so, whether to revise it to be more or less strict regarding the use of LLMs.
As Phlsph7 has noted:
Some of these options can be combined. For example, one could make it into an essay while at the same time adding a few sentences to an existing policy. One difficulty with any step in the direction of a policy/guideline is that the different opinions on the subject are very far apart from each other. This makes it very difficult to arrive at the high level of consensus needed for these types of changes.
voorts (talk/contributions) 19:19, 14 October 2023 (UTC)
Should this proposal in its current form be promoted into policy status?
- Option 1 Promote, preferably to a policy but if not at least to a guideline
- Option 2 Promote to guideline
- Option 3 Do not promote
Ca talk to me! 00:15, 27 August 2023 (UTC) ; modified 15:30, 27 August 2023 (UTC)
Survey
- Option 2, I am against the use of LLMs, but such a proposal on this is highly necessary, and overreliance on such models can degrade the quality of articles if people who use LLMs are not bothered to review their LLM contributions. We cannot stop the usage of LLMs since it is difficult to detect, but this proposal definitely makes clear that all contributions with LLMs must be properly reviewed before being published. — Karnataka talk 00:04, 28 August 2023 (UTC)
For the mild increase in ease for those who got here via cent: direct link to proposed policy text Nosebagbear (talk)
- Question How does one identify LLM generated text? Without a reliable way to identify it, there is no way to enforce guidance or policy that would be fair or unbiased. · · · Peter Southwood (talk): 02:44, 27 August 2023 (UTC)
- There is no bullet-proof way but various indications can be used. We may assume that many (but not all) editors are honest and will tell you if you ask them. There are some obvious phrases used in LLM responses, like "As an AI language model,...". Other indications come from editing behavior. If a new editor adds long and in-depth sections to a series of unrelated articles faster than other editors can read through them, this would raise eyebrows. There are certain content problems associated particularly with LLM output, like making up fake references or historical events. If many edits of an editor showed these problems, that could be used as one indication. There are some automatic AI detectors but they are very unreliable and can probably not be used.
- In the end, it would probably be similar to policies covering the usage of multiple accounts: there are various signs that can be used to identify potential policy violations but no absolutely certain procedure that works in all cases. It would have to be decided on a case-by-case basis. If a case is based only on ambiguous indications, there would have to be a lot of strong evidence before sanctioning someone. Phlsph7 (talk) 09:15, 27 August 2023 (UTC)
- Agreed. To add:
- See e.g. here for a good recent overview on how these detectors are extremely unreliable. (And likely to become even less reliable in the future as LLMs continue to improve. An exception might be if OpenAI and others follow through with adding watermarking to e.g. ChatGPT-generated text.)
- To expand on the fake references point: Citation issues are probably one of the most effective ways of spotting problematic ChatGPT-generated content currently. This can manifest in both made-up sources (citations to publications that don't exist) and in citations where the source exists but does not support the statement they are cited for (sometimes an issue with human editors too, but no less problematic in that case). Related to TonyBallioni's point below, these should already be well-covered by existing policies and guidelines (in particular WP:INTEGRITY).
- Regards, HaeB (talk) 07:15, 28 August 2023 (UTC)
- @Pbsouthwood, I tried an automated LLM detector on content I'd written myself, and the results were pretty much a coin-flip. WhatamIdoing (talk) 19:38, 30 August 2023 (UTC)
- Agreed. To add:
- Comment This includes everything from having ChatGPT write a new article about a living person to checking the grammar of an existing article. The former should probably be outright banned (assuming Peter’s question is addressed), while having to disclose the exact version used for the latter is overkill that waters down the seriousness of allowing LLM generated BLPs. ~ Argenti Aertheri(Chat?) 08:28, 27 August 2023 (UTC)
- Don't promote. Even with all these checks and balances it's too early to allow LLMs to write articles. They're a new technology. We should start to consider allowing them after further improvements and iterations, when academic study confirms their general reliability. By the time a human editor has checked LLM generated text with enough thoroughness they could have written the article themselves.—S Marshall T/C 09:43, 27 August 2023 (UTC)
- I fully agree with your sentiment: LLMs are terrible at writing content. However, the proposal says:
Do not publish content on Wikipedia obtained by asking LLMs to write original content or generate references. Even if such content has been heavily edited, seek other alternatives that don't use machine-generated content.
- The proposal sets out to limit exactly that - they shouldn't be used to generate articles. I am not sure where the contradiction is. Ca talk to me! 18:26, 27 August 2023 (UTC)
- Don't promote, or promote to guideline. By the time LLMs can create true and factual articles on their own, they would likely be persons and considered editors in their own right. Until then, however, there does not appear to be a reliable way to sift LLM from non-LLM content except by editor content. I would support, however, the requirement to disclose LLM editing similar to paid editing. – John M Wolfson (talk • contribs) 12:34, 27 August 2023 (UTC)
- Don't promote - Once you make something a policy it's very hard to deprecate it if it ends up being problematic, and this is still a very fresh and fast-developing topic. No objections to making it a guideline necessarily. I'd like to know what justification there is for making it either, though; I don't really see that so far. Duly signed, ⛵ WaltClipper -(talk) 13:36, 27 August 2023 (UTC)
- @WaltCip Established editor Ettrig used LLMs non-constructively, and it was evident to everyone that they did so, but they did not disclose this before the fact, refused to do so after the fact, and refused to stop using them, and were indefinitely blocked with the following rationale by Canterbury Tail:
You have been blocked from editing Wikipedia because your contributions are a net negative to the project. It is extremely clear you are using AI chatbots to add very poor quality edits to the project. These edits are resulting in many editors having to spend a lot more time than you spent clicking some buttons to clean up your edits. Because you are solely responsible for your edits, you are solely responsible for the time and effort you're costing other editors to bring your edits up to standard even when they are correct. This all amounts to your edits being a net negative to the project. So while you continue to make such poor quality edits, whether you use AI powered tools or not, and cost other editors time you will not be permitted to edit.
This is just one in a list of incidents involving LLMs. Said block rationale encapsulates this proposal; no one has objected to the block, so it seems useful to record this type of good practice on a project page with some explanatory content for those to whom things aren't very clear, for example why the edits are "very poor quality edits"; to many the issue is clear, but to many it isn't. The purpose of WP:PAGs is to "describe best practices, clarify principles, resolve conflicts, and otherwise further our goal of creating a free, reliable encyclopedia". —Alalch E. 14:01, 27 August 2023 (UTC)
- @Alalch E.: I disagree. Consider the block rationales of WP:NOTHERE, WP:CIR, and WP:NOFUCKINGNAZIS and how frequently they are used by administrators to remove editors that most reasonable people can agree are an active, substantive detriment to the project. Those rationales I mentioned are essays. We do not need to codify anything as policy in order to use them as block rationales in the manner that you are describing. I actually think the WP:PAG page is a bit outdated in that it says that essays do not reflect established widespread consensus, and indeed, for many of our most commonly-used essays, there is a strongly held consensus in favor of them. The main resistance to promoting them is because it's unnecessary or superseded by a more generalized policy. In this case, "very poor quality edits" and "the time and effort you're costing other editors" is quite clearly a WP:DE and WP:IDHT issue. Duly signed, ⛵ WaltClipper -(talk) 12:29, 28 August 2023 (UTC)
- The justification or motivation would be to have it regulated. Currently, there is no regulation. That means that there is no prohibition on any usages of LLMs (except for following other Wikipedia policies). So if people want to copy-paste LLM output into existing or new articles, nothing is stopping them and they don't have to declare it. It's a little bit like the Wild West. Not everything was bad about the Wild West. The question is if we want that. Phlsph7 (talk) 14:12, 27 August 2023 (UTC)
- LLMs are a very powerful swiss army knife. Typically you probably shouldn't directly copy/paste LLM output, I think fairly obviously? But I can think of many trivial exceptions where cut/pasting LLM output directly into a page would definitely not be a problem. It depends on what you're using the LLM for, and how you're using it.
- Of course with great power also comes great danger, as there's nothing stopping you from using an LLM as a footgun either.
- General guidance on how LLMs can be used effectively would be a good idea, but everyone is still out getting practice with them, and there are no real experts yet, I don't think. Conversely, telling people not to use them would be kind of silly. It'd be like telling people calculators or computers were banned. They're already ubiquitous and useful, so of course people will use them.
- A balanced approach is needed. One that allows people to get used to these new tools and how to use them wisely.
- Kim Bruning (talk) 17:05, 27 August 2023 (UTC) "Use your brain, or I'll replace you with a very small LLM!"
It'd be like telling people calculators or computers were banned.
You have no idea how many classrooms I've attended in my young adult life where the teacher or lecturer indeed said this very thing, even for those assignments which would necessarily warrant the use of a calculator or computer. Duly signed, ⛵ WaltClipper -(talk) 14:50, 28 August 2023 (UTC)
- That's a very good exercise! It's definitely good to learn to be able to do things without the computer, else how would you know what the computer is doing? I learned these things too: I grew up in a transitional time where computers were only just being introduced, so I actually still learned how to do everything without computers.
- Either way, professionally: normally if tools and equipment are available that make your work easier and faster, it's usually a good idea to use them. (be it eg computers, calculators, cranes, forklifts, powered saws, screwdrivers, robots, etc. ).
- First learn how the tools work, but then by all means: Use the tools, that's what they're for! --Kim Bruning (talk) 23:20, 29 August 2023 (UTC)
- Just to be sure, mere mortals were still doing much by hand in the 80's and 90's. --Kim Bruning (talk) 11:59, 30 August 2023 (UTC)
- It's too soon for me to know how to vote, but Phlsph7's response to Peter feels insufficient to me ... if that's the best answer anyone has to that question, then, even in the best case, making this page a policy page would have really awful consequences. But maybe someone has better data, I don't know. - Dank (push to talk) 13:49, 27 August 2023 (UTC)
- Question: If this page is marked as a failed proposal, how will this affect {{uw-ai1}} (which is in active use, and I would say that editors agree that it is a useful warning template), which links to this page in order to explain to users, generally new ones, what the problem is, i.e. what "Text produced by these applications is almost always unsuitable for an encyclopedia" actually means? Serious question. —Alalch E. 14:09, 27 August 2023 (UTC)
- Basically the problems are so critical and numerous that they can't all fit into, and can't be stressed strongly enough in, the template itself; a separate, explanatory page is needed. Why would that page not be this one? —Alalch E. 14:11, 27 August 2023 (UTC)
- @Alalch E., there's no requirement that a page be marked as a {{failed proposal}}. Wikipedia:BOLD, revert, discuss cycle has failed proposals at least three times so far, and it's not marked as a failed proposal. WhatamIdoing (talk) 19:58, 30 August 2023 (UTC)
- I see, thanks —Alalch E. 13:33, 1 September 2023 (UTC)
You introduced a math problem by splitting those with a "promote" viewpoint into two different possibilities. May I suggest either rewording or agreeing to interpret choice #1 as "Promote, preferably to a policy but if not at least to a guideline"? Sincerely, North8000 (talk) 15:21, 27 August 2023 (UTC)
I don't feel the proposed guidance should be given the status of a guideline or policy. I think its focus on a single technology makes it overly specific. I feel that community consensus will be more easily discernable by categorizing software programs/features into types based on use, and gauging community opinion on these types. isaacl (talk) 15:42, 27 August 2023 (UTC)
- Don't promote: LLMs shouldn't be used at all, in any circumstances. As S Marshall said, they're new technology. There are way too many problems; it's easier to just say "No". Edward-Woodrow :) [talk] 16:38, 27 August 2023 (UTC)
- Like the Internet? jp×g 09:05, 1 September 2023 (UTC)
Not ready to be promoted. A bit too premature, and rolling back stuff once it gets an official policy/guideline/essay sticker is tricky (Is there a policies for deletion process yet? :-p ). I mean, for sure, you should not take LLM output as holy writ and immediately paste it into Wikipedia as such, that'd be silly. However, it can be hugely useful as a part of the complete editing process! (I was considering using GPT-4 for translation tasks in particular, where it is vastly superior to e.g. gtranslate. It obviously can't do *all* the work, but it can definitely do a lot of the boring bits.) --Kim Bruning (talk) 16:47, 27 August 2023 (UTC)
- Even for simple translations, I had instances where ChatGPT hallucinated claims not present in the untranslated text. I wouldn't use it to translate texts unless you are familiar with both languages. At least Google Translate gives a more faithful translation, at the cost of sacrificing flow and coherence. Ca talk to me! 17:28, 27 August 2023 (UTC)
- Precisely. Like many computer-assisted tools (on and off wikipedia), LLMs should not be used unsupervised. All edits should ultimately be the responsibility of a human editor who should take care and know what they are doing. But if an edit *is* the responsibility of a human, I don't see a reason not to allow it. Kim Bruning (talk) 17:51, 27 August 2023 (UTC)
- Option 2, changed to option 2 after discussion and minor edits to improve the wording of the guideline.
- Option 2 with the note that this should still be considered open to serious revision and expansion. I can understand some of the concerns above people raise about LLMs, but that makes it more important that we have some sort of guidance, not less; and this discussion does roughly reflect the consensuses and practices in previous discussions. If people want to entirely ban LLMs in all contexts they should try and get consensus to rewrite this page to say that - it has been open for months, they've had plenty of time; there were plenty of other discussions where they had the opportunity to do so; and making this a guideline doesn't prevent them from rewriting it to ban them entirely in the future. But rejecting it for the current situation of "you have to read five different massive discussions to get a sense of consensus on LLMs" isn't workable, especially when the consensus of those discussions broadly doesn't even seem to reflect the objections people are raising here. At the very least, having a clearly-established guideline will provide a central place where further refinements or changes can be discussed and clearly implemented. I don't, however, think that it should be a policy, because too much of this is contextual, because things are developing too fast, and because we do want to be able to continue to update and refine it going forwards - this will be easier if we start with a guideline. Also, jumping straight from "nothing" to "policy" is a difficult ask; we can always promote this to policy in the future if things settle down and there's broader agreement (and if it turns out that it needs to have more teeth), no need to try and start there right away. But it has been months and we need to have at least some guideline; this is too important for us to have nothing at all, and nothing in this document has enough problems to justify total opposition. If the people who want a total ban on LLMs in all contexts think sufficient support exists for that position, they should have a simple up-or-down RFC on it (and ideally should have done so months ago to save us all this trouble); but otherwise, to continue to push for it now, some eight months in after five massive discussions that failed to produce enough support for it, is not a workable position to take. EDIT: Regarding the argument that our current (non-LLM-specific) policies are sufficient, which is one I made myself in one of the previous discussions when this first came up (and which I therefore feel I ought to comment on) - I think it's not entirely wrong, and is another reason for this to just be a guideline rather than a policy, but LLMs come up often enough, and have enough potential for long-term disruption if misused, that it's still useful to have a centralized page we can point people to. The ways in which the implications of our existing policies affect (and in some ways restrict) the use of LLMs might be obvious to experienced editors but won't necessarily be obvious to everyone; to the extent that this page is just a summary of what those implications mean according to the community, it is useful as a formal guideline. In general, I feel that making the implications of policy clear in cases like this is part of what guidelines are for. --Aquillion (talk) 18:01, 27 August 2023 (UTC)
- Option 2 - Guideline: This draft prohibits copy-pasted LLM-generated content while allowing potential constructive uses such as copyediting (with a strict human review requirement) which I believe addresses most of the concerns that have been raised. Even though we don't have a 100% accurate way of identifying LLM output, I see this working similarly to our paid editing policy: Good-faith editors who are aware of the rule will follow it, and those who aren't will often out themselves when asked or disclose in an edit summary. Simply having the rule in place will be valuable in handling these situations. –dlthewave ☎ 18:12, 27 August 2023 (UTC)
- Don't promote existing policies adequately cover everything in here. LLMs aren't as groundbreaking as people think. Surprisingly, when you have sound principles, they withstand technological updates. TonyBallioni (talk) 19:26, 27 August 2023 (UTC)
- Option 1 or 2. The LLM page has cooled off, and I believe it's time to promote it. Nothing's perfect the first time; we'll improve it as time goes on and when new problems arise. This is important guidance to have as we face this new technology, though I would much rather support a straight-up ban on LLMs per S Marshall. SWinxy (talk) 21:27, 27 August 2023 (UTC)
- Don't promote. I feel this page is a bit too lenient about LLM use. I feel LLMs have the potential to create low effort, high disruption articles and text, akin in perniciousness to WP:CCIs, and are therefore very dangerous due to the high amount of experienced editor effort it takes to clean them up. When their use is discovered, we should take a firm stance against it. I'd like to see a policy or guideline on LLMs that is short, pretty much bans their use, and allows for reversions and blocks to help quickly clean things up. I am also alarmed at the recent use of LLMs on talk pages, such as by POV pushers, and by editors who do not want to take the time to write proper unblock requests, both of which are cases I've seen recently. –Novem Linguae (talk) 07:39, 28 August 2023 (UTC)
- Don't promote; every text initially generated or substantially rewritten by LLMs (no matter whether it is subsequently edited by humans or not) should be speedily deleted, and editors warned and then blocked if they continue using this. Way too many problems, and just another example of editors preferring lazy solutions over actually doing the legwork of writing a personally sourced article (section, unblock request, ...) in their own words. Fram (talk) 12:56, 28 August 2023 (UTC)- Example of things wrong with this proposal: "All suspected LLM output must be checked for accuracy and is assumed to be fabricated until proven otherwise. LLM models are known to falsify sources such as books, journal articles and web URLs, so be sure to first check that the referenced work actually exists. All factual claims must then be verified against the provided sources." Er, no. Once it becomes clear that an edit is probably LLM-based and contains one or two examples of text not matching the sourcing, it is not necessary at all to "check all factual claims against the provided sources"; the edit(s) should simply be reverted wholesale. It is not up to others (new page patrollers, recent changes patrollers, ...) to check a long LLM addition bit by bit to find out which bits may be salvageable; this work is up to the one wanting to add such content but not wanting to put in the effort. Shifting the burden of "checking all factual claims against the provided sources" to others is just not acceptable policy- or guideline-language. Fram (talk) 13:02, 28 August 2023 (UTC)
- My reading was that patrollers start with assuming the text is fabricated until proven otherwise. It's up to the original editor to ensure the text is valid. How could this best be reworded? --Kim Bruning (talk) 13:49, 28 August 2023 (UTC)
- I think you are reading this backwards: the idea is not that patrollers must read every single reference before CSDing it/etc, but that authors must do so before publishing the article. But then again, that kind of ambiguity strongly suggests issues with implementing this page as a guideline, so you are correct to object to it. jp×g 09:09, 1 September 2023 (UTC)
- Option 3 per WP:CREEP. The vast majority of the page restates longstanding guidelines, because that's all that is actually needed to oppose harmful LLM content. Even the parts that don't currently mention an existing basis in a POG could do so (e.g, unedited LLM use on talk pages). "Demoting" it to an essay would make it obvious what parts should be cut out or downsized to leave over LLM-specific advice (such as use of the ai-generated template). (Tangential: I would like to see a CSD criterion for pages that are obviously copy/pasted directly from an LLM) Mach61 (talk) 13:22, 28 August 2023 (UTC)
- Don't promote as a policy; this has too much overlap. Basically, a policy doesn't need to say that another policy may be used for things; that other policy can already do that, and if its scope is actually changing, it needs to be changed there (it doesn't appear that this is needed in this case). — xaosflux Talk 14:20, 28 August 2023 (UTC)
- Option 3 - This proposal resembles something designed by committee. Get this down to about half its current size and I might support option 2. I am against point #8: "Do not use LLMs to write your talk page or edit summary comments." This should not be an absolute prohibition. Schierbecker (talk) 05:55, 29 August 2023 (UTC)
- Options 1 and 2 are both fine. Stifle (talk) 12:38, 29 August 2023 (UTC)
- Don't promote as per TonyBallioni. --LukeSurl t c 13:03, 29 August 2023 (UTC)
- Option 3 – don't promote per WP:CREEP. The page explains that this class of technology is increasingly embedded in commonplace tools such as spreadsheets and word processors. Defining the exact scope of the policy would therefore be problematic. Better to remain focussed on existing policies like WP:V and WP:RS. Imaginary sources implicitly violate these already, regardless of how they were generated. Andrew🐉(talk) 19:31, 29 August 2023 (UTC)
- Option 1 We prohibit COPYVIOs and we wouldn't allow anyone to have their employee/servant edit using the same account. It makes sense that this, too, would be a policy. If we don't take protectionist measures now, our hobby will be out-sourced and then how are you going to seek validation? Chris Troutman (talk) 20:09, 29 August 2023 (UTC)
- Promote to supplement: reading through, I don't really see much in here that isn't covered by existing core policies (V, NPOV, NOR, bot, copyright, CIR and so on), but I can see why it is useful to have somewhere explaining how these policies interact with LLM usage. Under these circumstances "supplement" seems the most appropriate description for the page. – Teratix ₵ 07:42, 31 August 2023 (UTC)
- It may be worth noting that my first draft of this page gave guidelines for LLM use that were exclusively, and explicitly, described in terms of other relevant policies. In fact, this was the whole thing:
Editors who use the output of large language models (LLMs) as an aid in their editing are subject to the policies of Wikipedia, including WP:NOT, WP:NPOV, WP:C, WP:CIVIL, WP:V, and WP:RS. It is a violation of these policies to not follow these policies. This applies to all editors. LLM output should be used only by competent editors who do not indiscriminately paste LLM output into the edit window and press "publish page".
jp×g 09:03, 1 September 2023 (UTC)
- Option 1 or 2. Having some basic form of regulation for problematic forms of use is better than no regulation. With no regulation, editors are free to use LLMs as they like. This may negatively affect the quality of articles. Not having a policy or a guideline to rely on makes it difficult for reviewers to react to potentially harmful LLM-assisted edits and LLM-using editors. I see this proposal as a middle way that addresses the most problematic uses while acknowledging that there are valid uses. Teratix's idea to use the draft as a supplement would also be feasible if options 1 and 2 fail. Phlsph7 (talk) 09:34, 31 August 2023 (UTC)
- Is it really true that, if we have no regulation, editors are free to use LLMs as they like? Or is it only true that editors would not be banned from using LLMs in ways that are indistinguishable from an editor working manually?
- If this doesn't get accepted, both human-based edits and LLM-based edits will still have to comply with all of the (thousands of) rules, like not putting false information into articles, not citing made-up sources, not creating more than 25–50 articles per day without express permission, not flooding talk pages with more text than others can reasonably be expected to read, etc. WhatamIdoing (talk) 19:31, 1 September 2023 (UTC)
- I guess we are trying to say the same thing. Currently, there are no LLM-specific regulations. So within the framework of current regulations, editors are free to use LLMs as they like. For example, they can ask an LLM to write a new section or a new article, manually add sources, and publish it. They can also use LLMs to argue for them on talk page content disputes or to convince an admin to lift a ban. Phlsph7 (talk) 17:57, 2 September 2023 (UTC)
- They can do these things – but only up until these things violate one of the other rules. You can ask an LLM to write a new section, but you have to make sure it's accurate. You can use LLMs to argue on a talk page, but you can't engage in Wikipedia:Disruptive editing or violate the Wikipedia:Talk page guidelines. LLMs don't get a free pass; even if LLMs don't get mentioned by name at all, the editor posting the LLM-generated content is still responsible for every bit of that content. The wording in WP:AWBRULES is good: "You are responsible for every edit made." WhatamIdoing (talk) 18:07, 8 September 2023 (UTC)
- Option 3 / Don't promote per TonyBallioni and xaosflux. Existing policies (which are repeated in WP:LLM itself) already cover the issues here. Some1 (talk) 23:15, 31 August 2023 (UTC)
- Do not promote, unless the 8 rules are removed. I wrote the initial version of this draft, and was involved for some time with the process of building it; in the last few months it has grown into something gigantic and unwieldy. For example, an entire set of rules from different proposals has been lumped into it. Specifically, the rules in the "basic guidance" section seem arbitrary — they're not really supported or explained by the rest of the text in the page. Moreover, the way it's written seems like an attempt to sound stern rather than an attempt to write a policy. For example, the phrase
The use of Wikipedia for experiments or trials is forbidden
is simply not true; we have a gigantic number of experiments and trials running in mainspace at any given time. What I proposed a few months ago was that someone make a vastly trimmed version of this and then propose that as a policy, which I had been intending to do for a while -- this RfC somewhat complicates the issue. What we have is a decent essay, and might even make a decent guideline, if not for the fact that it starts out with eight arbitrary rules (added from a different proposal) that aren't supported by the rest of the text and don't seem like they make a very reasonable or good policy. These are too strict in situations where the use of LLMs causes no problems, and they are not strict enough in situations where they cause huge problems. jp×g 09:03, 1 September 2023 (UTC) - Option 3 (do not promote). This is a confused mess of vague guidelines, attempts at rules, background information, and repetition of other policies and guidelines. Thryduulf (talk) 11:32, 1 September 2023 (UTC)
- Don't promote for now, as it could use some improvements, and most existing rules already cover it. After the page is improved, and if LLMs really become a problem, I would say promote it to a guideline. ~~2NumForIce (speak|edits) 21:03, 1 September 2023 (UTC)
- Don't promote with a second choice of promote to guideline. This doesn't feel at all like policy, so to me that's out. And while I view LLMs as a threat to the reliability of Wikipedia (they can do a fine job creating bogus articles that look good with low effort), I think the general theme is "you are responsible for what you write" and I don't see how we can enforce something like this. I do think we need some kind of a guideline; I'm just not sure this is it. Hobit (talk) 21:24, 1 September 2023 (UTC)
- Don't promote I don't think the community has really had time to come to a consensus on this and therefore do not feel this proposed policy represents what the community as a whole wants. I'm not saying it is wrong, just too soon. My personal preference would be for a very short policy that makes it clear that you need to do so much to ensure that AI-generated material is good enough that you may as well do the work yourself. I also feel that it is basically dishonest, as you are passing off work you did not generate as your own. Beeblebrox (talk) 16:39, 2 September 2023 (UTC)
- Explanatory supplement, but not policy/guideline: I would like to see this as more than an essay to show that it has some official standing. However, the best parts of this essay are not those that aim to introduce new rules, but those that explain the application of particular rules to LLMs (e.g. that raw LLM content inherently fails verifiability). Per WP:CREEP it is ideal to minimise the number of LLM-specific rules we need to introduce. — Bilorv (talk) 18:53, 2 September 2023 (UTC)
- Explanatory supplement as I agree that the page is very low on new normative substance and indeed explains existing policies and ongoing practices. I think that the "Risks and relevant policies" section is useful and makes various important points, including those concerning copyright which isn't clear to many people. It's also very natural to place a reference to the {{uw-ai1}} series of warning templates on some page, and, on the other hand, connect those templates to a page which explains the problem to new users, and it appears to me that this should be that page, so at least those two things are good. If tagged as an explanatory essay, the rule-resembling content that goes beyond what can be put on such a page would then be removed, which is fine.—Alalch E. 14:11, 3 September 2023 (UTC)
- Don't promote I know I'm likely beating the dead horse since consensus appears to be against promoting but I'm concerned that if this became policy it would be used to continually attack editors. Are you having a disagreement with an editor? Accuse them of using ChatGPT. Don't like an edit someone made? Say they used a LLM. At this point I'm satisfied that we can address LLM abuses using our current guidelines. What matters most is whether an editor is aiding or harming Wikipedia, not the tools they use.--SouthernNights (talk) 13:03, 5 September 2023 (UTC)
- Option 3. I do think that a policy regarding LLMs should exist; however, I believe that the current draft is too prohibitive on LLM use and overly pessimistic. Frostly (talk) 20:05, 5 September 2023 (UTC)
- Option 3: Do Not Promote - It's too long, it's unclear, and parts of it overlap in scope with other policies, thus reinventing those wheels. The problem is the garbage that LLMs spew, when they spew garbage. We already have well-written policies to prevent and dispose of garbage. Working on an LLM policy draft is a waste of time better spent on designing the next generation of Wikipedia, collaborating with machine learning developers and working with AI tools directly to integrate AI into Wikipedia and its various interfaces and activities: searching, browsing, editing, categorizing, scripting, discussing, helping, and beyond. Keep in mind that we are in the middle of an AI technological revolution. If Wikipedia doesn't become the go-to intelligent encyclopedia you can converse with in any language, then it will be obsoleted by a brand new AI encyclopedia, or AI app with encyclopedic scope, that will take Wikipedia's place as the world's encyclopedia. My guess is that this will happen within 6 years from now, perhaps in as little as 2. See: Accelerating change. — The Transhumanist 10:32, 6 September 2023 (UTC)
- I would bet real life money against that happening. Mach61 (talk) 23:09, 11 September 2023 (UTC)
- I'll take your money now! Duly signed, ⛵ WaltClipper -(talk) 12:30, 12 September 2023 (UTC)
- Reduce to three sentences and make guideline: The use of large language models (LLMs) for any task on Wikipedia is discouraged. Editors must take full responsibility for any edits they make with the assistance of LLMs. Misuse of LLMs may lead to blocks from editing and the deletion of pages as hoaxes or vandalism. -- Tamzin[cetacean needed] (she|they|xe) 06:36, 7 September 2023 (UTC)
- Strongly support this 3-sentence proposal, though I'd make it a policy. DFlhb (talk) 06:43, 7 September 2023 (UTC)
- Perhaps it could be genericized slightly and become part of an existing policy. It might fit into either Wikipedia:Editing policy or Wikipedia:Bot policy. The fact is that we hold editors responsible for their edits no matter what tools they are (or aren't) using, even if they're not the ones using them. WhatamIdoing (talk) 18:13, 8 September 2023 (UTC)
- I !voted for tagging as an explanatory essay, but this would also be good. —Alalch E. 16:25, 7 September 2023 (UTC)
- I'd prefer something closer to 8-10 sentences, but this works well too. Mach61 (talk) 19:25, 8 September 2023 (UTC)
- Option 3, or use the three-sentence proposal as a guideline. I think we should broadly discourage LLMs, but recognize that they offer accessibility benefits to less verbal thinkers, and those individuals will be better versed in the shortcomings and risks of LLMs than your average casual user. As such, the detailed proposal at this point seems like it will overfit and both allow too much not of benefit, and disallow too much of benefit. A short warning about not misusing a relatively new tool, and affirming the editor's responsibility, seems like the right level of guideline here. —siroχo 20:22, 18 September 2023 (UTC)
- This suggestion of Tamzin's pretty much says all that actually needs to be said. The rest is already covered by existing policy and guidance. If we keep this essay, which is fine by me, everything that is claimed as fact should be linked to the relevant policy or guidance page. · · · Peter Southwood (talk): 11:57, 12 October 2023 (UTC)
- Option 2 While the current form overlaps with existing policy, it provides a clear explanation of the risks of LLMs. Though recent cases of AI-generated content were already resolved with editing blocks under current policies, the appeals show that the offenders are often unable to recognize why such editing is disruptive in their focus on efficient content generation. I would suggest standardizing the disclosure statement, such as ending edit summaries with "LLM-assisted with GPT-3.5" so that such edits can be quickly sorted among user contributions. BluePenguin18 🐧 ( 💬 ) 19:12, 9 September 2023 (UTC)
- Clearly the offenders wouldn't recognize why such editing is disruptive even if WP:LLM as a policy explicitly outlined why it is wrong and disruptive. As WP:CREEP itself explains, there's a recurring issue on Wikipedia in that
nobody reads the directions from beginning to end
; and those same editors who don't understand why LLM-assisted editing is problematic would have just as much difficulty understanding no matter what policy you cite at them. And really, if a LLM-using editor's response to being blocked is to ask ChatGPT to write their unblock rationale for them, they have problems beyond that of just using the LLM in the first place. Duly signed, ⛵ WaltClipper -(talk) 16:11, 11 September 2023 (UTC)
- Option 2 - This is written as a guideline rather than a policy, which is probably as it should be. Yaris678 (talk) 19:25, 11 September 2023 (UTC)
- Option 3/don't promote, not as a policy, guideline, or explanatory supplement either. The reasons have been explained in detail by other voters already; in short: Wikipedia already has way too many policies/guidelines/supplements/etc. and no new ones should be created unless it's really, really necessary; this isn't necessary, as everything there is to say in terms of rules about LLMs (which boils down to the general rules, "use reliable sources" and "don't copy and paste text from another website onto Wikipedia") is already said in other existing policies and guidelines; and I disagree with a lot of the "rules" or advice in this proposed policy/guideline. I don't really think Wikipedia needs even an essay about the dangers of LLMs any more than it needs an essay about the dangers of word processors or thesauruses or any other writing tool. I could see maybe throwing in a sentence about copy/pasting from LLMs in WP:E and a sentence about LLMs not being an RS in WP:RS, but that's about all that's needed. Levivich (talk) 21:49, 13 September 2023 (UTC)
- Option 2. Policy is a bit scary/too early IMHO. But as a guideline, I think it is reasonable enough. There are some good points here that should be stressed as 'best practices', and I think that's what we call 'guidelines' around here anyway. --Piotr Konieczny aka Prokonsul Piotrus| reply here 08:34, 15 September 2023 (UTC)
- Option 2. Policy is too early and somewhat WP:TOOSOON. I see that there are good points here, as well as in Wikipedia:Wikipedia Signpost/2023-09-16/News and notes, where there is a section about an editor using ChatGPT getting indeffed. I am personally against using LLMs for writing Wikipedia articles, as their output usually counts as WP:OR, and I've also seen an editor write an apology with ChatGPT. There's a good case for making a guideline, similar to the one for AI-generated art. -- Wesoree (talk·contribs) 14:58, 18 September 2023 (UTC)
- Don't promote I was one of the contributors to this draft, and expected it to attract more contributors, maybe from WP:VPI, so that it could grow and reflect wider community views. That's not happened, and I see that as an indicator that there's insufficient interest, or insufficient consensus, to have a fleshed-out policy/guideline on LLMs. In other words, that it's too soon. I also don't like the section on "Handling suspected LLM-generated content", because raw LLM output should be removed on sight, not tagged or
verified against the provided sources
.
- Also, we're ignoring the main threat: people misusing LLMs in ways that are hard to detect. We need machine learning tools that check our text against the cited source and detect mismatches. Meta made Side AI (available here), which is okay, but we need a more systematic equivalent, reviewing every addition, whose accuracy can be trained and improved through in-the-field use. Only AI can help us fight AI effectively. DFlhb (talk) 15:45, 18 September 2023 (UTC)
- Option 3 (don't promote). I agree with the view that our existing policies and guidelines already cover the situations described by this proposed policy. This page might serve well as a kind of {{information page}} that explains how our existing policies and guidelines apply to specific situations, but it does not need to be a policy or a guideline in and of itself. (See WP:CREEP for why the creation of unnecessary policies and guidelines may be harmful.) Mz7 (talk) 01:43, 19 September 2023 (UTC)
- Option 3 I don't want to see AI text being used here, until it's been proven elsewhere... It's hard enough to patrol what we have coming in now. AI just isn't ready for primetime yet; I suppose if you want to submit a draft article and have some way to check the sources, fine... Seems like a waste of time. Just write the article yourself. Oaktree b (talk) 20:26, 20 September 2023 (UTC)
- Option 3 I think it's inappropriate to use AI text on WP, except in some situations like copyediting. My main concern is that it would be difficult to find the exact sources from which LLMs get their information. XtraJovial (talk • contribs) 22:06, 25 September 2023 (UTC)
- Don't promote. I don't think this is ready to be promoted to a guideline, let alone a policy. For instance, point 7 is very vaguely worded and would appear to forbid even experiments making read-only use of Wikipedia content, in total contradiction to the way our content is licensed (and would appear to prevent us from even attempting to find out what the capabilities of these systems are, such as the tests reported below in #LLM Experiment - Sources and unsourced information helper). Point 8 would appear to forbid non-native-English speakers getting any automated assistance with talk-page discussions. Point 5 is ambiguously worded: used where? Used to write the edit summary, or something else? Point 2 is advice for what people should think, not a rule about how to edit Wikipedia or how not to; it is totally unenforceable. Etc etc. Badly edited rules make for bad guidance. —David Eppstein (talk) 06:36, 30 September 2023 (UTC)
- Comment. I've requested a formal close at WP:CR —siroχo 04:35, 8 October 2023 (UTC)
- Don't promote, covered by existing policy and guidance. I am not persuaded that it will serve the intended purpose. · · · Peter Southwood (talk): 12:03, 12 October 2023 (UTC)
Discussion
What's the current situation? What happens if we don't promote this? A lot of people above are opposing based on the argument that it's too early to allow LLMs to be used as part of writing articles in any context, even with the requirement for specific-competence human checks. But my understanding is that the current situation is that we have no guidance and must decide on a case-by-case basis, which is not ideal either; and that this proposal would therefore tighten the (currently ad-hoc but mostly nonexistent) practices regarding LLMs. That is, my reading is that while people can reasonably argue that we should have stricter guidance, opposing this without being able to get a consensus for something stricter would, as a practical matter, result in looser restrictions on LLMs. --Aquillion (talk) 17:39, 27 August 2023 (UTC)
- If this isn't promoted, the status quo is retained, which means that the same decisions as those made in response to the LLM-related incidents linked at the top of this talk page will keep being made; and the decisions have been good so far, so we're fine. There is already a good practice going on, but it isn't recorded anywhere. This proposal is intended to capture this existing good practice as well as it can (this comment is strongly related to my first comment in the RfC, so please read both). —Alalch E. 17:47, 27 August 2023 (UTC)
- Starting an RFC prematurely is a great way to kill great essays. --Kim Bruning (talk) 17:52, 27 August 2023 (UTC)
- It has been eight months. Discussion has long since died down. LLMs are being used now, constantly, on articles, with no real guideline or policy beyond a vague framework of consensuses from previous discussions, all of which lacked a formal close; and they're going to continue to be used in an ad-hoc manner until / unless we can agree on at least some sort of guidelines. We need something. Smaller problems can still be refined going forwards. If people oppose this on a fundamental level then they need either a clear alternative that can gain consensus (and they need to actually, you know, go through the process to demonstrate a broad consensus for that alternative), or they need a reasonably specific actionable objection that can be fixed. Without one of those things, opposition amounts to "let's not even have a guideline for LLMs and let everyone do as they will", which is not workable (and doesn't even reflect what any of the opposes above actually want!) --Aquillion (talk) 18:11, 27 August 2023 (UTC)
- (and it also does not reflect on the following: how editors have been able to tell that someone is being irrational with their time through ill-advised LLM use by shifting the burden of ensuring policy compliance onto them, how recent changes patrollers have made reverts of non-compliant LLM-generated changes and posted {{uw-ai1}} warnings on user pages, how new page patrollers have been able to make a correct decision about draftifying or tagging or nominating LLM-generated pages for deletion, how administrators have so far been able to make obviously helpful and noncontroversial decisions about blocking LLM-misusers etc. All of this has been happening in real time, but many editors are not aware that it has been happening. It will keep happening irrespective of the outcome of this process; we can have a page on which these practices are recorded or we can have no such page, but it won't fundamentally change the practices) —Alalch E. 18:27, 27 August 2023 (UTC)
LLMs are being used now ... and they're going to continue to be used in an ad-hoc manner until / unless we can agree on at least some sort of guidelines.
I submit to you that those people who use LLMs in a patently disruptive fashion are not going to care one way or another if there's a policy, a guideline, a Wikimedia Foundation T&C clause, or an edict from Jimbo Wales prohibiting them from using LLMs. Duly signed, ⛵ WaltClipper -(talk) 14:48, 28 August 2023 (UTC)- Earlier this year I nominated a few LLM-generated articles for AfD or speedy deletion and there was quite a bit of confusion among experienced editors and admins over how to handle them. There still isn’t a lot of awareness of just how bad LLM content can be, especially the “vaguely plausible bullshit”, fabricated references and the fact that rewriting or copyediting isn’t a viable option. Even if we’re not actually making any new rules, it would be really helpful to have a page explaining how and why our existing policies apply so that we can address the issue in a consistent manner. –dlthewave ☎ 15:13, 28 August 2023 (UTC)
- Yeah, I remember this: nobody wanted to expand G3 to cover fake references because they weren't obvious enough. It didn't make any sense to me, but who knows. jp×g 05:51, 1 September 2023 (UTC)
- Because CSDs must be obvious. Fake references that are sufficiently obvious are already covered by G3 and/or other CSD criteria; ones that are not sufficiently obvious are unsuitable for speedy deletion. Other policies apply in the exact same way to human and LLM-generated content - if it's so bad it can't be fixed it should be deleted, if it can be fixed it should be fixed. Thryduulf (talk) 11:36, 1 September 2023 (UTC)
- Well, what I mean is this: Aluminum is harder than tungsten[1][2][3][4].
- Hoersfelder, J., et al. "Comparative Analysis of Tungsten and Aluminum Alloys for Aerospace Applications." Journal of Materials Science Advances, 2022, 45(7), 1234-1245.
- Chen, L. , et al. "Nanostructural Characterization of Tungsten-Aluminum Composite Materials." Annals of Materials Research Letters, 2023, 34(9), 789-798.
- Brown, M. R., et al. "Hardness Evaluation of Tungsten-Aluminum Alloys for High-Temperature Applications." Journal of Advanced Materials Engineering, 2021, 56(3), 567-576.
- Wang, S., et al. "Mechanical Properties and Microstructural Analysis of Tungsten and Aluminum-Based Composites." International Journal for Materials Science and Engineering, 2022, 40(8), 1123-1131.
- These are "non-obvious" in that they are formatted like real citations, but "obvious" in that literally all of them are completely made up (i.e. the journals do not even exist, let alone the articles). It seems like common sense that, if you know this, it then becomes extremely obvious that it's a hoax. There's no way to fix fictitious references to fictitious journals, so I would say this is G3, but consensus disagreed. jp×g 01:24, 2 September 2023 (UTC)
- CSD is, by design, explicitly only for "the most obvious cases", which means that in practice if someone disagrees in good faith that a given page meets the criterion then it does not. This is a Good Thing, because speedy deletion avoids all the usual checks against errors (made in good and bad faith). Generally if a reviewer can't verify a rationale from reading the article and/or with at most 1-2 clicks of provided links then it is not obvious enough for speedy deletion. Thryduulf (talk) 18:24, 2 September 2023 (UTC)
Considering that the outcome of this discussion is going to result in this essay being tagged as a "failed policy/guideline nomination", would it be better to suspend the outcome of the above RfC indefinitely until the wording of the page is changed in such a manner that might eventually yield a consensus to promote? Or am I thinking too bureaucratically here? It's evident that the effort to nominate this for WP:PAG was felt to be a bit rushed. Duly signed, ⛵ WaltClipper -(talk) 23:27, 10 September 2023 (UTC)
- There is no requirement to add “failed proposal”, see WT:BRD. Considering the amount of votes asking explicitly to convert the page to an essay, the fate of the page should be clarified after this RFC closes. Mach61 (talk) 00:51, 11 September 2023 (UTC)
Rethinking the nutshell
This page in a nutshell: Use of large language models (LLMs) must be rigorously scrutinized, and only editors with substantial prior experience in the intended task are trusted to use them constructively. Repeated LLM misuse is a form of disruptive editing. |
We’re seeing a lot of !votes from editors who think the draft would allow LLM-created articles, which couldn’t be further from the truth. I think we need to change the nutshell and intro to make it very clear that LLM content generation is prohibited and editing is only allowed under certain circumstances. Of course we shouldn’t make any big changes mid-RfC, but a rewrite of the nutshell shouldn’t be a problem. –dlthewave ☎ 14:39, 28 August 2023 (UTC)
- If you're rethinking the nutshell, you're essentially rethinking the essay, I would think. It sounds like a consensus needs to be reached through a separate discussion about what the page means before we even discuss an RfC, let alone its outcome. Duly signed, ⛵ WaltClipper -(talk) 14:51, 28 August 2023 (UTC)
- Not exactly. The nutshell is descriptive, not prescriptive; it was written to summarize an earlier version of the draft and was never updated to reflect the current body after changes were made. –dlthewave ☎ 15:27, 28 August 2023 (UTC)
- How about the following as a nutshell:
Do not use large language models (LLMs) to write original content or generate references. LLMs can be used for certain tasks (like copyediting, summarization, and paraphrasing) if the editor has substantial prior experience in the intended task and rigorously scrutinizes the results before publishing them.
Phlsph7 (talk) 07:30, 29 August 2023 (UTC)- I like that, it's a lot better. —Alalch E. 08:45, 29 August 2023 (UTC)
- That's definitely a lot better, though I still think having to disclose the version number is a bit silly. ~ Argenti Aertheri(Chat?) 09:35, 29 August 2023 (UTC)
- I implemented the suggestion. I also made the version disclosure optional. Some LLM providers may not provide this information, and even for the ones that do, many regular users may not know how to acquire it. Phlsph7 (talk) 10:47, 29 August 2023 (UTC)
- Thank you! Much better. –dlthewave ☎ 15:42, 29 August 2023 (UTC)
- Not sure where I need to put this (move if desirable) but deciding to reject the policy/guideline/whatever is still a decision, and in my opinion, a worse one than having a policy we don't all perfectly agree on. LLM-generated content is already being used on Wikipedia and that isn't acceptable for numerous reasons, so we need to move now on having a policy and tweak it later. Stifle (talk) 08:56, 31 August 2023 (UTC)
- PS most if not all LLM-generated content would be liable to deletion as a probable copyvio given that most LLM sources suck up a lot of stuff that's not CC-BY-SA compatible. Stifle (talk) 08:58, 31 August 2023 (UTC)
- This is not true. jp×g 09:12, 1 September 2023 (UTC)
- What part? Stifle (talk) 14:35, 5 September 2023 (UTC)
- You have created a false dilemma by stating that the only way to control LLM-generated content on Wikipedia is to have a policy implemented, however imperfect it is. It's evident that Wikipedia's policies and guidelines are already being used in order to strike down garbage or phony content. Creating a policy that does not reflect widespread consensus, on the other hand, would lead to even greater conflict between users trying to interpret something they may or may not agree with. Duly signed, ⛵ WaltClipper -(talk) 16:06, 2 September 2023 (UTC)
What’s to be done?
This RFC is going to be a massive pain for whoever closes it, because while there's an overwhelming consensus that the page shouldn't be promoted, and a rougher consensus that there should be something done with the WP:LLM page rather than just marking it as failed and archiving, it's not clear what people want. Normally, this would just lead to a "no consensus on what to do next, status quo remains" close, but seeing as how editors have failed to get this page to a satisfactory level for most after eight months of work, I doubt anyone would be happy with anything less than a complete rework. Thoughts? Mach61 (talk) 23:25, 11 September 2023 (UTC)
- (No insult intended to anyone who worked here) Mach61 (talk) 23:26, 11 September 2023 (UTC)
- I think the biggest mistake was making one big page that attempts to cover everything. A better approach would be to write the actual policy/guideline as a general principle instead of a set of specific rules: Something like “LLMs may not be used to write original content in article or talk space. They may only be used for editing tasks such as copyediting, summarizing and paraphrasing.” This could be a standalone policy or added to an existing policy page. Most of the content on this page could be retained as an explanatory supplement. –dlthewave ☎ 01:35, 12 September 2023 (UTC)
- There seem to be several ways forward. A few suggestions so far were:
- make it an essay
- make it an explanatory supplement
- modify an existing policy by adding a few sentences
- make certain changes to the draft or make a new draft and propose it as a policy/guideline, for example, by
- making it more strict
- making it less strict
- making it much shorter
- Some of these options can be combined. For example, one could make it into an essay while at the same time adding a few sentences to an existing policy. One difficulty with any step in the direction of a policy/guideline is that the different opinions on the subject are very far apart from each other. This makes it very difficult to arrive at the high level of consensus needed for these types of changes. Phlsph7 (talk) 08:11, 12 September 2023 (UTC)
- I think that we should perhaps come up with some simple one-sentence summaries, and hold an RFC on those. Once we have agreement on what the overall thrust of our guidelines should be, we can continue from there. Also, just starting as an essay (which doesn't require consensus) might be a good idea; many essays end up with the force of guidelines or even policy due to broad acceptance, and my feeling is that unless someone writes a competing essay with an alternative formulation, that is what would happen here. So turning this into an essay (and tweaking the wording a bit to make it clear it's not authoritative, as well as, when possible, more clearly citing and describing the implications of existing policy rather than trying to set its own) might be a good idea - anyone who has objections to this would then be goaded into writing down their alternatives in a concrete format, and we'll eventually end up with clear practices about LLMs; a year or two down the line we can then come back, figure out which essays have actually become supported practice, and consider either promoting them or using them as a guide for updating guideline / policy pages. This isn't ideal but the consensus isn't here and the people who say that our existing policies are at least holding at the moment are not wrong even if it makes it hard to answer LLM-specific questions. I'd rather have a guideline now but the process of essay -> common practice -> guideline or policy is well-established and is another way to eventually reach a consensus; if it's useful and functional as an essay people will use it until it's clear that it ought to be promoted, and if not someone will probably write an alternative (or we'll realize we don't need one.) --Aquillion (talk) 15:38, 13 September 2023 (UTC)
Paragraph with unclear purpose
Equally, raw LLM outputs must not be pasted directly into drafts or articles. Drafts are works in progress and their initial versions often fall short of the standard required for articles. Enabling editors to develop article content by starting from an unaltered LLM-outputted initial version is not one of the purposes of draft space or user space.
@Dlthewave: What is this meant to convey? I do not really understand what benefit is obtained by having this as a guideline/policy. Do any of our policies on drafts say anything about requiring specific referencing or formatting or whatever in the very first revision? This, for example, was the first revision of GPT-2, which nobody has given me any guff about in the last three years. jp×g 01:29, 2 September 2023 (UTC)
- I agreed with your removal of the copyright concerns but thought the draft space restriction was worth keeping. We don't want people copy-pasting LLM output into draft space and trying to reformat it as an article, since this is very prone to errors, but I agree that we're saying a lot more than we need to here.
- I won't lie, "A computer that's also a friend" made me smile, but it was utterly useless as a starting point for an article and probably would have been deleted on sight if someone had stumbled across it before you replaced it with sourced content. –dlthewave ☎ 03:31, 2 September 2023 (UTC)
- Well, that's what I mean, it was just random whatever because I hadn't moved it to mainspace yet. I am not a very prolific draftspace patroller, but my impression was that drafts could just have whatever garbage in them as long as it was decent by the time you moved it to mainspace (and as long as it wasn't
asldk;fjaweo;iajdsadfs piSS P[I-SS COMING FROIM My aSS
). I mean, if there's some widespread phenomenon of people posting huge amounts of raw GPT puke into draftspace, I guess we should write something to deal with that; is that happening? jp×g 05:48, 2 September 2023 (UTC)- I concur that I don't think the paragraph is appropriate as written. It's not clear what "raw LLM output" means. Nowa (talk) 01:24, 22 September 2023 (UTC)
- Does anyone object to me removing the paragraph? Nowa (talk) 00:38, 9 October 2023 (UTC)