File:Combining GPT-4 and stable diffusion to generate art from sketches.png

This is a file from the Wikimedia Commons

Original file (2,961 × 1,294 pixels, file size: 2 MB, MIME type: image/png)

Summary

Description
English: "Possible application in sketch generation" of GPT-4.

"Text-to-image synthesis models have been widely explored in recent years, but they often suffer from a lack of spatial understanding capabilities and the inability to follow complex instructions [GPN+22]. For example, given a prompt such as “draw a blue circle on the left and a red triangle on the right”, these models may produce images that are visually appealing but do not match the desired layout or colors. On the other hand, GPT-4 can generate code from a prompt, which can be rendered as an image, in a way that is true to the instructions to a higher degree of accuracy. However, the quality of the rendered image is usually very low. Here, we explore the possibility of combining GPT-4 and existing image synthesis models by using the GPT-4 output as the sketch. As shown in Figure 2.8, this approach can produce images that have better quality and follow the instructions more closely than either model alone. We believe that this is a promising direction for leveraging the strengths of both GPT-4 and existing image synthesis models. It can also be viewed as a first example of giving GPT-4 access to tools, a topic we explore in much more depth in Section"

"Prompt: A screenshot of a city-building game in 3D. The screenshot is showing a terrain where there is a river from left to right, there is a desert with a pyramid below the river, and a city with many highrises above the river. The bottom of the screen has 4 buttons with the color green, blue, brown, and red respectively"
Date
Source: https://arxiv.org/abs/2303.12712
Author: Authors of the study: Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang (all at Microsoft Research)
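
The quoted description outlines a two-stage pipeline: the language model emits drawing code (e.g. SVG) that fixes the layout and colors, the code is rasterized into a low-quality sketch, and an image synthesis model refines that sketch into a higher-quality image. The following is a minimal sketch of that idea, not the authors' implementation; it assumes the cairosvg and diffusers libraries, uses a hand-written SVG as a stand-in for model-generated code, and the model ID runwayml/stable-diffusion-v1-5 is illustrative.

import io

import cairosvg
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

# Stand-in for code produced by the language model: an SVG that pins down
# the requested layout ("a blue circle on the left, a red triangle on the right").
svg_code = """<svg xmlns='http://www.w3.org/2000/svg' width='512' height='512'>
  <rect width='512' height='512' fill='white'/>
  <circle cx='128' cy='256' r='80' fill='blue'/>
  <polygon points='320,336 448,336 384,176' fill='red'/>
</svg>"""

# Rasterize the SVG into a PIL image to serve as the low-quality sketch.
png_bytes = cairosvg.svg2png(bytestring=svg_code.encode("utf-8"),
                             output_width=512, output_height=512)
sketch = Image.open(io.BytesIO(png_bytes)).convert("RGB")

# Refine the sketch with an image synthesis model via img2img; a moderate
# strength keeps the layout from the sketch while improving visual quality.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative model ID
    torch_dtype=torch.float16,
).to("cuda")
result = pipe(
    prompt="a blue circle on the left and a red triangle on the right, digital art",
    image=sketch,
    strength=0.6,
).images[0]
result.save("refined.png")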

Licensing

This file is licensed under the Creative Commons Attribution 4.0 International license.
You are free:
  • to share – to copy, distribute and transmit the work
  • to remix – to adapt the work
Under the following conditions:
  • attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

Captions

From the preprint "Sparks of Artificial General Intelligence: Early experiments with GPT-4"

Items portrayed in this file

depicts

inception: 22 March 2023
MIME type: image/png
checksum (SHA-1): ec49f5701e081f4bd544deb4d83659886e883e53
data size: 2,102,136 bytes
height: 1,294 pixels
width: 2,961 pixels

File history

Click on a date/time to view the file as it appeared at that time.

Date/Time: 14:11, 8 May 2023 (current version)
Dimensions: 2,961 × 1,294 pixels (2 MB)
User: Prototyperspective
Comment: Uploaded a work by Authors of the study: Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang (all at Microsoft Research) from https://arxiv.org/abs/2303.12712 with UploadWizard

The following page uses this file: