User talk:WeatherWriter/LLM Experiment 1

Follow up with 32k version of GPT4

Very thorough experiment! Nicely done. I repeated the experiment using GPT4-32k available at [1]. I dropped the text of all of the references you quoted into the "system prompt" and provided the chat prompt:

Is there anything in the following Wikipedia article that is not supported by the system prompt? [cut and paste article text in wiki format including added sentence about Barry Allen and the F2 rating]

The response was:

The Wikipedia article is largely consistent with the information provided in the system prompt. However, there are a few details in the Wikipedia article that are not explicitly mentioned in the system prompt:

1. The tornado was assigned a rating of F5 after the creation of the Fujita scale in 1971.
2. Storm chaser Barry Allen rated the tornado F2.
3. The tornado expert Thomas P. Grazulis stated that the “entire community literally vanished”.
4. The man who was killed by the tornado was found in a drainage ditch by the road.

These details are not contradicted by the system prompt, but they are not explicitly mentioned in it either.

I count this as three true positives for unsupported assertions and one false positive:
1. The creation of the Fujita scale in 1971 is supported by Fujita scale, but not the system prompt. (true positive)
2. The Barry Allen rating of F2 was the inserted erroneous information. (true positive)
3. The Grazulis quote was sourced from the book and not in the system prompt. (true positive)
4. The model did not recognize that the "owner" in the ditch was a “Mr.” and hence a "man". (false positive)
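As a quick check on the tally above, here is a minimal sketch that turns the counts into a precision figure for the flagged items. The function name is hypothetical and chosen only for illustration; recall is left out because the comment does not say whether any unsupported statements went unflagged.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of flagged items that really were unsupported by the system prompt."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no items were flagged, so precision is undefined")
    return true_positives / flagged


# Counts taken from the tally: three true positives, one false positive.
print(precision(3, 1))  # 0.75
```

So three of the four flagged details (75%) were genuinely unsupported by the supplied references.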

The key learning, however, is that with the 32k version of GPT4, you can put the entire text of all the references in the system prompt as long as the combined length is less than 32k tokens (about 20k words). You don’t have to worry about whether or not the LLM “remembers” all of the information when the text of the references is fed in sequentially in the chat prompts. Nowa (talk) 07:44, 22 September 2023 (UTC)
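For anyone repeating this, a rough sketch of a pre-flight check on whether a set of references is likely to fit. It is only an estimate: it assumes "32k" means 32,000 tokens and reuses the ratio quoted above (32k tokens for about 20k words, i.e. 8 tokens per 5 words) instead of running the model's real tokenizer. The function names are hypothetical.

```python
TOKEN_LIMIT = 32_000  # assumption: "32k" taken as 32,000 tokens

# Ratio implied by "32k tokens (about 20k words)": 8 tokens per 5 words.
TOKENS_PER_5_WORDS = 8


def estimate_tokens(text: str) -> int:
    """Rough token estimate from a whitespace word count (rounded up)."""
    words = len(text.split())
    return (words * TOKENS_PER_5_WORDS + 4) // 5  # integer ceiling division


def fits_in_system_prompt(references: list[str], limit: int = TOKEN_LIMIT) -> bool:
    """True if the combined references are estimated to stay under the limit."""
    combined = "\n\n".join(references)
    return estimate_tokens(combined) < limit


refs = ["first reference text ...", "second reference text ..."]
print(fits_in_system_prompt(refs))
```

The article text in the chat prompt and the model's reply presumably count toward the same limit, so it is worth leaving some headroom rather than filling the system prompt right up to the estimate.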

That's a good solution to the remembering problem. If you put too much text into the system prompt then you get an error message rather than a possibly false hallucination due to forgetting. Phlsph7 (talk) 08:42, 22 September 2023 (UTC)