Talk:GPT-4/Archive 1

Merging with OpenAI is stupid

GPT-4 is about to be announced so if this article gets merged with OpenAI, the GPT-4 article will just have to be remade again. So this doesn’t make sense. 2600:1008:B072:DC3E:5431:60D2:ED2D:9EA3 (talk) 17:51, 4 February 2023 (UTC)

(Please see OpenAI Talk page for discussion about merger of various related articles.) Cl3phact0 (talk) 18:43, 6 February 2023 (UTC)

Bing?

Isn't this the Microsoft Bing extension version? Lots of sources, many yay, many nay, and many maybe. Sandizer (talk) 01:21, 14 February 2023 (UTC)

That's a great case study in the media "telephone game", but there's zero chance of it being real. GPT-4 is highly anticipated; they'd never release it as a footnote to a Microsoft announcement. When it comes out, we'll know. DFlhb (talk) 03:28, 14 February 2023 (UTC)
It's really hard to believe, but this actually turned out to be wrong :)
Bing has been using GPT-4 all along. –⁠ ⁠Popo ⁠Dameron ⁠talk 18:39, 14 March 2023 (UTC)
Indeed! DFlhb (talk) 20:09, 14 March 2023 (UTC)

GPT-4 announced

GPT-4 was just officially announced, so this article is ready to be rapidly expanded. We have dozens of reliable sources talking about it now, and the capabilities are far superior to GPT-3/3.5. –⁠ ⁠Popo ⁠Dameron ⁠talk 17:53, 14 March 2023 (UTC)

It's so quiet on this article; what are we waiting for? For the article to write itself? Iluvalar (talk) 02:24, 15 March 2023 (UTC)
If you're talking to me in particular, I was focusing on another article more today, but I thought I would leave this notice here for anyone who has the page on their watchlist to consider pitching in. –⁠ ⁠Popo ⁠Dameron ⁠talk 02:37, 15 March 2023 (UTC)
I'm sorry you simply missed the joke. I'm here for the same reason as you. Never meant to target you specifically. Here is a source: "they say absolutely nothing about the model" [1]. Which should buy both of us a day or two of procrastination as we encourage everyone else to be WP:BOLD. While I'm bolder, and a menace of a WP:AUTOBIO. Iluvalar (talk) 02:27, 16 March 2023 (UTC)
The 'technical report' indeed surprisingly lacks a lot of the technical information that we would have wanted to hear about, so I think that unless OpenAI changes their mind and decides to give us more, most of this article will likely be about the model's reception (based on reliable sources), so things like what people are saying about it and how companies are integrating it into their products. PopoDameron talk 02:47, 16 March 2023 (UTC)
Disagree; the "meat" of the article will likely be its capabilities, i.e. how it performs on various benchmarks and real-world tasks. Not the opinions of non-expert journalists. DFlhb (talk) 11:14, 18 March 2023 (UTC)
Well, most of "how it performs on real-world tasks" will come from "how companies are integrating it into their products", which is what I said. What I meant was less about people's opinions and more about what people are doing with it. In any case, my point is that the article probably won't be technical at all. PopoDameron talk 18:03, 18 March 2023 (UTC)

Product page or research page?

Currently the infobox links to the product page. I think linking to the research page [2] makes more sense, as it leads directly to more things currently mentioned in the article, such as the technical report. Snowmanonahoe (talk) 18:16, 18 March 2023 (UTC)

Not sure. GPT-4 is first and foremost a product (OpenAI have made that very, very clear), so the product page does make sense as a main page of reference. PopoDameron ⁠talk 18:22, 18 March 2023 (UTC)
I prefer the link to the product page. It's more accessible to a wide audience, includes demonstrations, and links to the research page (and other related pages). DFlhb (talk) 23:48, 21 March 2023 (UTC)

This is the talk page of an encyclopedia article

about GPT-4. It is not GPT-4. This talk page is for discussing the article. It is not for interacting with GPT-4. Thanks -- Deepfriedokra (talk) 19:50, 21 March 2023 (UTC)

Or asking questions about the subject. -- Deepfriedokra (talk) 19:51, 21 March 2023 (UTC)

Full name

Hey, PopoDameron, about consistency - OpenAI's tech report on GPT-4 doesn't mention "the full name". Everybody uses the short version, so why should the article about the thing start with it? I won't revert further, I don't really care, but the argument from consistency seems strange to me. Artem.G (talk) 08:23, 23 March 2023 (UTC)

@Artem.G: If you need an official OpenAI source that confirms that the GPT in GPT-4 stands for Generative Pre-trained Transformer, then this one works. In any case, my only problem with it is the consistency between articles, not the consistency with OpenAI. GPT-2, GPT-3, and GPT-4 should all start the same way, don't you think? I don't mind either option, personally. PopoDameron ⁠talk 08:30, 23 March 2023 (UTC)
Nah, I don't mind it being consistent, and I know what the letters in gpt mean. But it's funny that your link has this sentence: We conclude that Generative Pre-trained Transformers exhibit characteristics of general-purpose technologies (GPTs) :) Artem.G (talk) 08:35, 23 March 2023 (UTC)
That is funny, I hadn't noticed.
Per DFlhb's argument below, I'll just say that I'm fine with both, consistent or not. I don't think that it's true that "using the full name seems a lot more common for the old versions". I think GPT-2 has pretty much always been primarily GPT-2 (and the same for 3), but it's true that the acronym is hardly ever expanded for GPT-4 because it is indeed no longer really the subject of research. Feel free to re-revert if you'd like to. PopoDameron ⁠talk 09:24, 23 March 2023 (UTC)
Yeah, the consistency argument seems a little weak here. Both versions contain the same info; one just places it in a footnote, the same way some articles footnote the lead sentence parentheses (pronunciation, alt names). Seems more like a case of MOS:VAR. I'm a little partial to the short version, since using the full name seems a lot more common for the old versions than the current one. It basically illustrates GPT's shift from a research project into a consumer product over time. DFlhb (talk) 09:06, 23 March 2023 (UTC)
In all honesty, it may be worth not using the expanded name: OpenAI has refused to provide any information or source code for the model, so it's entirely possible that it isn't a generative pre-trained transformer and uses some entirely different secret architecture. jp×g 09:57, 23 March 2023 (UTC)
OpenAI doesn't give us much, but they do at least say that "GPT-4 is a Transformer-style model pre-trained to predict the next token in a document" in their report. Also, they make it clear in some sources that the GPT in GPT-4 does indeed stand for what we think, like, for example, "We investigate the potential implications of Generative Pre-trained Transformer (GPT) models and related technologies on the U.S. labor market. Using a new rubric, we assess occupations based on their correspondence with GPT capabilities, incorporating both human expertise and classifications from GPT-4". PopoDameron ⁠talk 16:18, 23 March 2023 (UTC)

Adding a background for GPT-4

Hello, I plan to add perhaps one or more paragraph(s) about the background for GPT-4. MochiMinnie (talk) 01:20, 2 April 2023 (UTC)

Hi MochiMinnie. I've reverted your edit for now because it appeared to be based on original research. Make sure you find reliable sources to back up anything you add to an article in the future. The paragraph also didn't really add any information that wasn't already present elsewhere. Thanks, PopoDameron talk 02:28, 2 April 2023 (UTC)

Criticisms

I plan to expand the “Reception” section with more reviews and criticisms of GPT-4 from news sources and academic journals. Egi00002 (talk) 04:07, 4 April 2023 (UTC)

Thank you! sam1370 (talk · contribs) 21:54, 5 April 2023 (UTC)

Extension for Capabilities

It seems a bit short. It may be a good idea to expand it, but I don't exactly know how. Rockethead293 (talk) 18:51, 10 April 2023 (UTC)

A little more was added to the capabilities section, and a section for limitations was added. LJFIN2 (talk) 10:40, 4 May 2023 (UTC)

missing citations

The article has two [citation needed] footnotes, both of which I think are covered by citation 13, "Sparks of Artificial General Intelligence: Early experiments with GPT-4".

Here's the first appearance of [citation needed]:

   GPT-4 also lacks transparency in its decision-making processes. If requested, the model is able to provide an explanation as to how and why it makes its decisions but these explanations are formed post-hoc; it's impossible to verify if those explanations truly reflect the actual process. In many cases, when asked to explain its logic, GPT-4 will give explanations that directly contradict its previous statements.[citation needed]

This is covered by the source on pages 60, 62, and 93, where it says:

   The ability to explain one’s own behavior is an important aspect of intelligence, as it allows for a system to communicate with humans and other agents. Self explanation is not only a form of communication, but also a form of reasoning, requiring a good theory of mind for both yourself (the explainer) and the listener. For GPT-4, this is complicated by the fact that it does not have a single or fixed “self” that persists across different executions (in contrast to humans). Rather, as a language model, GPT-4 simulates some process given the preceding input, and can produce vastly different outputs depending on the topic, details, and even formatting of the input.
   that output consistency does not necessarily lead to process consistency, and that GPT-4 often generates explanations that contradict its own outputs for different inputs in similar contexts. For example, in Figure 6.11, the explanation in both sessions is output-consistent, but not entirely process-consistent (the translation is only consistent for three out of the four professions listed in the first session’s explanation).
   Not only does the model hallucinate, make up facts and produce inconsistent content, but it seems that the model has no way of verifying whether or not the content that it produces is consistent with the training data, or whether it’s self-consistent. While the model is often able to provide high-quality post-hoc explanations for its decisions (as demonstrated in Section 6.2), using explanations to verify the process that led to a certain decision or conclusion only works when that process is accurately modeled and a sufficiently powerful explanation process is also accurately modeled (Section 6.2). Both of these conditions are hard to verify, and when they fail there are inconsistencies between the model’s decisions and its explanations. Since the model does not have a clear sense of its own limitations it makes it hard to establish trust or collaboration with the user without extensive experimentation in a narrow domain.

The second instance of [citation needed] is:

   GPT-4 has shown to have cognitive biases such as confirmation bias, anchoring, and base-rate neglect.[citation needed]

This is backed up on page 94 of the paper, which says:

   The model seems to exhibit some of the limitations of human knowledge and reasoning, such as cognitive biases and irrationality (such as biases of confirmation, anchoring, and base-rate neglect)

Unless we agree that Microsoft's report is an unreliable source, there's no need for the article to have a [citation needed] tag. LJFIN2 (talk) 04:27, 4 May 2023 (UTC)

Thanks for the correction! sam1370 (talk · contribs) 01:01, 6 May 2023 (UTC)
Ah, I see that you are the original author. I added the citation needed tags because the source you cited initially did not support the information. Just in case you were unaware, citations are generally understood to support the text that directly precedes them (which is why the updated version replaces the tags with the citation you mentioned, rather than removing them entirely). sam1370 (talk · contribs) 01:10, 6 May 2023 (UTC)

"restricted the model's ability to express emotions." is ambiguous

So Microsoft "restricted the model's ability to express emotions"? I take it that the expression of emotions has been restricted. But the sentence is ambiguous: it can also be understood to mean that the model can express emotions but nothing more. If you restrict X to Y, then the abilities of X become a subset of the abilities of Y. 85.193.248.142 (talk) 00:39, 3 May 2023 (UTC)

Thanks; I've changed the wording to "reduced". sam1370 (talk · contribs) 01:06, 6 May 2023 (UTC)
@sam1370 Thanks, but unfortunately, we still have the same problem. It does not matter whether we restrict X to Y or we reduce X to Y. The ambiguity is caused by "to" followed by a verb. So, my proposal is to replace "to" with "of", so we can change ability to express emotions to ability of expressing emotions. What do you think of it? 85.193.248.142 (talk) 20:16, 10 May 2023 (UTC)

1000minds inclusion

I do not believe the 1000minds application is particularly notable. Its use as described seems to be an auto-suggest feature, but 1000minds' status as software "for decision-making and conjoint analysis" risks giving the impression that GPT-4 is actually used in that decision-making. I will also note that User:Paulwizard is the editor who created and (I believe) solely edited this section, and who has a close connection to the company that is not obviously declared per Wikipedia:Conflict of interest. StereoFolic (talk) 03:22, 31 May 2023 (UTC)

The app does have an article. Maybe the solution here is to create List of applications using GPT-4? Snowmanonahoe (talk · contribs · typos) 03:28, 31 May 2023 (UTC)
I support creating a separate page for applications using GPT-4, though maybe a GPT-n page would solve this problem elsewhere too. I am not convinced by the below explanation for continued inclusion on this page though, please see my comment below. StereoFolic (talk) 11:36, 31 May 2023 (UTC)
Thanks StereoFolic. The use of GPT-4 to recommend criteria for a decision, and also alternatives for decision-makers to consider, are two fundamentally important steps in a decision-making process and hence these are two important components in Multi-Criteria Decision Analysis (MCDA). Marrying these components - now, for the first time, performable using AI - with a method for determining the weights on the criteria, representing the decision-maker's preferences, for application to the alternatives in order to generate a ranking completes the MCDA process. I am the co-inventor of the PAPRIKA method for determining weights for MCDA, as implemented by 1000minds software, a type of decision-making software. This software now incorporates GPT-4 to suggest criteria and alternatives, as discussed above. Both the PAPRIKA method and 1000minds software satisfy notability, as supported by many peer-reviewed journal articles cited at the three above-mentioned articles. What I was trying to do with my latest edit to the GPT-4 article - two additional sentences, since reverted - was to succinctly clarify the role played by GPT-4 in 1000minds (as explained above). Best wishes. Paulwizard (talk) 06:10, 31 May 2023 (UTC)
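To make the process described above concrete, here is a toy sketch, in Python, of the final MCDA step: applying criteria weights to each alternative's scores to produce a ranking under a simple additive value model. The criteria, weights, scores, and alternatives are all invented for illustration; neither the PAPRIKA weight-elicitation method itself nor GPT-4's suggestion of criteria and alternatives is shown.

    # Toy additive-model MCDA ranking: given per-criterion weights (as a
    # method like PAPRIKA would elicit) and each alternative's score on
    # each criterion, rank the alternatives by weighted sum.
    # All values below are made up for illustration.
    weights = {"cost": 0.5, "quality": 0.3, "speed": 0.2}

    alternatives = {
        "option A": {"cost": 0.9, "quality": 0.4, "speed": 0.7},
        "option B": {"cost": 0.6, "quality": 0.8, "speed": 0.5},
    }

    def total_score(scores):
        """Weighted sum of an alternative's per-criterion scores."""
        return sum(weights[c] * scores[c] for c in weights)

    ranking = sorted(alternatives, key=lambda a: total_score(alternatives[a]),
                     reverse=True)
    print(ranking)  # best-first: ['option A', 'option B'] (0.71 vs 0.64)

In the workflow described above, GPT-4 would supply candidate criteria and alternatives and a PAPRIKA-style method would supply the weights; the ranking itself, assuming an additive model, is just this weighted sum.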
Thank you for explaining your rationale. I remain concerned about this inclusion's notability because the only source cited for it is what appears to be an unstated reprint of a press release, the original for which is here. Meanwhile all the other products listed in this section have substantial third party coverage, generally including in popular press. StereoFolic (talk) 11:36, 31 May 2023 (UTC)
User:Paulwizard I have removed the inclusion on the basis of WP:Primary and WP:Undue by the reasoning above. StereoFolic (talk) 18:52, 4 June 2023 (UTC)

Semianalysis leak claim

User:Sam1370, to avoid WP:edit warring, we should discuss the edit in question. As you concede, the claim is unverified and ultimately sourced only to a single blog post. The other references included are tweets that copied and reposted the blog post. Even if the bloggers have a large following, that is not a WP:Reliable Source. If the leak claims received significant coverage from secondary sources, then this could be included, though probably not with this amount of detail, which implies accuracy. Further, in your recent revert you claim that "it's unlikely that his post is a complete fabrication" - this statement needs justification. I believe we should not include anything about this until secondary source coverage warrants it. Thanks. StereoFolic (talk) 02:55, 11 July 2023 (UTC)

Hi User:StereoFolic, thanks for expressing your concerns. I didn't have quite enough space in the edit summary to fully explain my reasoning, so I'll try to do so here.

Yes, I admit it! :) It's all true -- the information is completely unverified, I'm piecing it together from the deleted thread linked (which, as the tweet from the original poster described, contains information from the original, paywalled article). And, you are **absolutely right** to be concerned about the inclusion of this content. It's people like you who keep Wikipedia safe from misinformation. Honestly :)

But, despite the information's sketchy provenance, I think it is nevertheless worthwhile to include. For context, although you may already know this: the machine learning community has suffered greatly under the closed-source approach adopted by OpenAI with GPT-3 and GPT-4 and, increasingly, by other companies at the forefront of machine learning, like Google with PaLM. Any information at all, even if the chance of it being completely true and undistorted is slim, is incredibly valuable to the machine learning community -- which is why I cited that claim by Semafor in the Training section of GPT-4, despite it, too, having a sketchy provenance.

If it turns out that this claim is true, and a mixture-of-experts approach performs well at scale, then the open-source community would benefit immensely. It was previously believed that scale could only truly be achieved by scaling dense transformers -- which is why the dense PaLM 540B was created, and it still underperforms GPT-4. But if MoE is viable, then we might be able to shift resources in that direction -- instead of creating LLaMA-65B or MPT-30B, create equivalent MoE models with the same parameter count. Because MoE is cheaper to run at inference than a dense transformer of the same size -- and, even better, can easily be distributed across multiple nodes, even operating over the Internet -- we would be one step closer to a future where the power to train, control, and use AI is not concentrated in the hands of a few megacorporations, but placed back in the hands of the individual.
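For readers unfamiliar with the technique, below is a minimal sketch of a top-1 gated mixture-of-experts layer in PyTorch. Everything in it -- the class name, the sizes, the top-1 routing -- is invented for illustration and implies nothing about GPT-4's unconfirmed architecture; it only shows why MoE inference is cheaper than a dense layer of the same total parameter count: each token activates a single expert, so most parameters sit idle on any given forward pass.

    # Minimal top-1 gated mixture-of-experts (MoE) layer. A learned gate
    # routes each token to exactly one expert MLP, so the compute per token
    # is that of one expert, not of all n_experts combined.
    import torch
    import torch.nn as nn

    class Top1MoE(nn.Module):
        def __init__(self, d_model: int, d_hidden: int, n_experts: int):
            super().__init__()
            self.gate = nn.Linear(d_model, n_experts)  # learned router
            self.experts = nn.ModuleList(
                nn.Sequential(
                    nn.Linear(d_model, d_hidden),
                    nn.GELU(),
                    nn.Linear(d_hidden, d_model),
                )
                for _ in range(n_experts)
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (n_tokens, d_model); pick the highest-scoring expert per token
            weights, idx = self.gate(x).softmax(dim=-1).max(dim=-1)
            out = torch.zeros_like(x)
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():  # only experts that won a token do any work
                    out[mask] = weights[mask, None] * expert(x[mask])
            return out

    moe = Top1MoE(d_model=64, d_hidden=256, n_experts=8)
    y = moe(torch.randn(10, 64))  # 10 tokens, each touching 1 of 8 experts

Because only routing decisions and activations need to move between experts, the experts could in principle sit on different machines, which is what makes the distributed-inference point above at least plausible.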

I want to say again that you are entirely right to be concerned about the inclusion of this mini-section in the article, and I entirely understand your fears about including false or unverified information. But if we both add the appropriate qualifiers -- making sure that the reader knows this information may be untrue or distorted -- while at the same time making the details fully available, we would be doing a great service to the machine learning community.

sam1370 (talk · contribs) 03:02, 11 July 2023 (UTC)

Thank you for explaining further, but I don't find these justifications convincing. If this turns out to be true it will be verified eventually. The information is at no risk of being lost, as it's backed up on the Internet Archive and I'm sure other archives. Wikipedia is not a place to speculate, and WP:There is no deadline. StereoFolic (talk) 03:20, 11 July 2023 (UTC)
Certainly, Wikipedia is not a place to speculate. When I wrote the mini-section in question, my only aim was to report -- because it is by reporting information, and making it available for others to learn from, that humans do good in the world. I believe that including this mini-section, which provides this potentially unreliable information while qualifying it and making clear that it is such, can benefit others while at the same time allowing the reader to make their own judgement of the information's validity.
If you disagree, then that's fine -- we will simply have to wait for other users to stumble upon this talk page, so that we might achieve a WP:Consensus. sam1370 (talk · contribs) 03:31, 11 July 2023 (UTC)
I think WP:Crystal Ball is pretty clear about this exact type of situation. StereoFolic (talk) 03:37, 11 July 2023 (UTC)
Again, my opinion is that this information is sufficiently useful to open-source machine learning researchers (if it wasn't, there wouldn't have been such a large outcry over the "GPT-4 Technical Report" being nothing close to a technical report) that it should be included, regardless of what certain guidelines might state or imply -- after all, they are guidelines, not laws.
If we disagree over this fundamental statement, then we are at an impasse, and must wait for others to share their opinions. sam1370 (talk · contribs) 03:45, 11 July 2023 (UTC)
"the information is completely unverified, I'm piecing it together from the deleted thread linked" - this is a perfect example of original research that is strictly forbidden here. The places for such speculations are blogs and Twitter, not this article. Artem.G (talk) 06:09, 11 July 2023 (UTC)
I have to say that I take issue with the tone used by yourself and the other contributor to this thread. I have made repeated attempts to discuss this in as polite and friendly a manner as possible, a wish that is clearly not reciprocated. Perhaps I should not have been surprised -- this is the Internet, after all -- but I expected that the sort of people who frequent volunteer projects would be less inclined towards condescension.
That aside, I feel compelled to correct your assumption that the information is entirely "original research". The deleted thread contains information from the original article, and this fact has been confirmed by the article's author. It does not seem like too large a logical inference to thus use the deleted thread as a source for the section's content, but I would be more than happy to mention this if you would like.
Regardless, whether the content is more or less original research is, frankly, irrelevant. The simple fact remains that because of its benefits for the open-source machine learning community, it is better for this information to be in the article than for it not to be. I place the burden of reason on the excluder: why would this content be harmful in any way to include in the article -- to be completely honest, why would it do anything less than make the article more interesting -- as long, of course, as we take the proper precautions, by informing the reader that this information is potentially unreliable?
I believed that the principles embodied by objective consideration, taking into account issues on a case-by-case basis, evaluating each situation based on its own qualities, would take precedence over the dogmatic interpretation of generalizations as law. Evidently, I was wrong. Perhaps I expected too much -- pragmatism, on the Internet. Well, well -- we're only human, after all. sam1370 (talk · contribs) 09:31, 11 July 2023 (UTC)
I never implied that you're a vandal or acting in bad faith, though I'm not sugarcoating my simple sentence about this being OR. The reason it's OR, in my opinion, is that it's unverifiable and the source is not reliable. Basically, some people on the internet wrote that this is how the model works - what makes this claim notable? I've seen dozens of Twitter threads where people claimed different things based on their experience, on conversations with ChatGPT, on insider info, etc. If we include this one, why not all the other claims? Artem.G (talk) 11:32, 11 July 2023 (UTC)
I will admit I was frustrated by this last night and I apologize for that. I can't speak for Artem, but I feel strongly about this because this feels like a very clear-cut example of OR and crystal balling, and the circumstances do not compellingly warrant WP:Ignoring all rules. "because of its benefits for the open-source machine learning community" is not convincing to me because it's unclear why those benefits are not equally achieved by people Google searching for GPT-4 architecture speculation and rumors.
"why would this content be harmful in any way to include in the article" - potential misinformation itself is harmful, even if it is qualified in the article. An article containing no technical details about the model, but containing a very detailed description of a rumored architecture, can strongly imply that the rumor is correct, or at least close to it. In my opinion, no amount of qualifying the claims can prevent this. I will reiterate as well that there is no rush; it may well turn out any day now this leak is legit, or at least gets enough coverage to discuss (but again, probably not in such detail), and we can document it here then. StereoFolic (talk) 14:22, 11 July 2023 (UTC)
All of that does not really matter; the 1.76 trillion number matches the roughly 2 trillion that was already estimated from the speed it was running at, so it is correct. The other data will have to be confirmed, in particular if source code leaks. 2A00:1370:8184:FC01:BC22:D55F:9EE7:7C0A (talk) 12:44, 12 July 2023 (UTC)
Without WP:RS we should not publish guesses. StereoFolic (talk) 16:18, 12 July 2023 (UTC)

Medical hallucinations source

0xReflektor, I noticed you just reinstated this statement about Microsoft research with a new source. I'm having difficulty verifying the specific claim that they shared "a test example where GPT-4 added fabricated details to a patient's note". I can't find anything about that in the paper. If I'm missing it, could you please update the reference with a page number and/or quote? Thanks StereoFolic (talk) 12:15, 21 September 2023 (UTC)

I'll update with the quote; I think the language I restored was too strong compared to the exact quote in the paper. 0xReflektor (talk) 17:11, 21 September 2023 (UTC)

More rumors and crystal balling

0xReflektor - To avoid warring I will not personally revert your re-addition of rumors to the article, but I think we need to discuss it. The cited blog post is not an "analysis" of other reputable research, but a combination of various rumor publications. The blog post itself cites 5 sources:

It seems clear to me that the Klu post is not a WP:RS. I will also note that you have previously tried to add material on Wikipedia about the post's author, Stephen M. Walker II. If you have a WP:COI you must disclose that. StereoFolic (talk) 12:02, 21 September 2023 (UTC)

I didn't catch the previous thread, my goal was to share the insights and lower the number of erroneous rumors floating around. Re: author, I edit many tech/ml founder pages. If you think reporting on this information is incorrect, just let me know and I will edit. 0xReflektor (talk) 17:27, 21 September 2023 (UTC)
Understood; and yes I believe we should not be including this information. As far as I can tell it amounts to a rehash of the above thread, where consensus was against inclusion. StereoFolic (talk) 17:51, 21 September 2023 (UTC)