1. In the last week, both @stabilityai and @openai have released major updates to their flagship offerings.
In this thread, we use OpenAI's new #ChatGPT model to talk us through generating prompts for StabilityAI's new #stablediffusion v2 model.
2. ChatGPT is a new GPT-3-powered model focused on conversing with users.
As with InstructGPT (the current default GPT-3 model), @openai leveraged Reinforcement Learning from Human Feedback (RLHF) to boost the reliability and desirability of its responses.
3. If you create an @openai account, you can currently test ChatGPT for free. We tried it out with the following prompt and were immediately impressed:
"I'm looking to create images of cats using a text-to-image model, similar to DALL-E. Can you suggest some prompts?"
4. Next we decided to clarify our request to see how ChatGPT adapted:
"I was actually thinking it should be an image of a cartoon cat"
Once again its response was both helpful and polished. It immediately provided updated prompts, then added additional context below.
5. Since ChatGPT was holding up its end of the bargain, we hopped over to dreamstudio.ai to try Stable Diffusion v2 with a few of our favorite suggestions.
Results were promising, but as often happens with SD, they would benefit from additional iteration.
6. Even though ChatGPT isn't directly connected to Stable Diffusion or DALL-E, we decided to simulate having a unified interface by manually reporting back on our results.
When we explained the issue we were having, ChatGPT suggested a fix👇
7. Despite having no direct knowledge of Stable Diffusion (it was released in August 2022, after ChatGPT's training data was collected), ChatGPT was able to suggest more detailed prompts that really did result in improved images.
Had ChatGPT been given domain-specific knowledge, it would likely have done even better.
8. Stable Diffusion v2 has befuddled some users because it requires different prompts than v1.5.
Longer, more descriptive prompts work well, but the network also benefits from negative prompts, which tell it what image characteristics to avoid.
When asked, ChatGPT understood the assignment.
9. To try negative prompts in dreamstudio.ai, you add them using brackets👇
Appending the negative prompt "[A realistic cat without any fur or whiskers]" to our previous prompt resulted in more painterly images.
TIP: Try using multiple short negative prompts
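DreamStudio handles the bracket syntax for you, but if you'd rather script this, the diffusers library exposes the same idea via a negative_prompt argument. A minimal sketch, assuming a CUDA GPU and the SD v2 checkpoint on the Hugging Face Hub (the prompts here are just illustrative):

```python
# Sketch: negative prompts with Stable Diffusion v2 via the diffusers library.
# Assumes `pip install diffusers transformers accelerate torch`.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2",  # SD v2 checkpoint on the Hugging Face Hub
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="a whimsical watercolor painting of a cartoon cat",
    negative_prompt="a realistic cat, fur, whiskers",  # characteristics to avoid
    num_inference_steps=50,
).images[0]
image.save("cartoon_cat.png")
```

Note that the negative_prompt above follows the tip: several short, comma-separated negatives rather than one long sentence.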
10. The purpose of this thread isn't to present a polished workflow for creating images.
Instead, we wanted to showcase how two independently created language-aware models can work together.
Conversational interfaces to AI-powered products are about to become a new standard UX.
11. Read this entire thread? Interested in the latest AI novelties?
Follow @aifunhouse for more tutorials, explorations, and #AI fun.
If you enjoyed this thread, please like and share.👇
1. For the good of the community, we used language embeddings to help @elonmusk and his team get a bird's-eye view of the 45k+ complaints he recently solicited👇
2. How do you categorize 45k+ replies to an @elonmusk tweet?
STEP 1: Use Twitter's API to pull all 45k replies, sorted by # of likes.
STEP 2: Show GPT-3 a sampling of top replies and ask it to suggest some categories.
STEP 3: Use text embeddings to categorize all the replies.
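Here's a hedged sketch of STEP 3 using OpenAI's embeddings endpoint. The category names below are placeholders, not the ones GPT-3 actually proposed:

```python
# Sketch: bucket replies by cosine similarity to category-label embeddings.
# Assumes `pip install openai numpy` and OPENAI_API_KEY set in the environment.
import numpy as np
import openai

# Hypothetical categories -- in practice, use the ones GPT-3 suggests in STEP 2.
CATEGORIES = ["verification", "content moderation", "bots and spam", "pricing"]

def embed(texts):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return np.array([d["embedding"] for d in resp["data"]])

def categorize(replies):
    cat_vecs = embed(CATEGORIES)
    reply_vecs = embed(replies)
    # Normalize so dot products are cosine similarities.
    cat_vecs /= np.linalg.norm(cat_vecs, axis=1, keepdims=True)
    reply_vecs /= np.linalg.norm(reply_vecs, axis=1, keepdims=True)
    best = (reply_vecs @ cat_vecs.T).argmax(axis=1)
    return [CATEGORIES[i] for i in best]

print(categorize(["Bring back my blue check!", "Too many bot accounts in my mentions"]))
```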
3. To get the replies to the tweet, we used Tweepy, a Python wrapper for Twitter's v2 API.
The API has three tiers; only the academic tier can pull replies more than 7 days old or tweets more than 30 days old (one more complaint for Elon!).
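A sketch of STEP 1 with Tweepy, assuming a v2 bearer token (the recent-search endpoint is shown; academic access would use search_all_tweets instead):

```python
# Sketch: pull replies to a tweet with Tweepy's v2 client, then sort by likes.
# Assumes `pip install tweepy`; TWEET_ID and the bearer token are placeholders.
import tweepy

client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")
TWEET_ID = 1234567890  # hypothetical: the conversation's root tweet

replies = []
for page in tweepy.Paginator(
    client.search_recent_tweets,          # academic tier: client.search_all_tweets
    query=f"conversation_id:{TWEET_ID}",
    tweet_fields=["public_metrics"],
    max_results=100,
):
    replies.extend(page.data or [])

# The API returns replies in reverse-chronological order; sort by likes ourselves.
replies.sort(key=lambda t: t.public_metrics["like_count"], reverse=True)
```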
2. When you prompt GPT-3 to perform a task (e.g. "write a series of tweets about elephants"), it may be more reliable to perform the task in multiple steps:
1: Write a short paragraph about elephants
2: Divide the following paragraph into a series of ~250 character tweets
3. When you do this, the output from the first call to GPT-3 is inserted into the prompt for the second call, hence the term "chained prompts".
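A minimal sketch of this two-step chain against the completions API (the model choice and token limit are just illustrative):

```python
# Sketch: chained prompts -- feed the first completion into the second prompt.
# Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
import openai

def complete(prompt, max_tokens=400):
    resp = openai.Completion.create(
        model="text-davinci-003", prompt=prompt, max_tokens=max_tokens
    )
    return resp["choices"][0]["text"].strip()

# Step 1: write the source material.
paragraph = complete("Write a short paragraph about elephants.")

# Step 2: the step-1 output is inserted into the step-2 prompt.
tweets = complete(
    "Divide the following paragraph into a series of ~250 character tweets:\n\n"
    + paragraph
)
print(tweets)
```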
It makes sense that multiple calls to GPT-3 can outperform a single call, but that's only half the story.
1. In 2022, text-to-image tech has improved dramatically.
Heading into 2023, text-to-mesh, text-to-video, and text-to-audio models have all been demonstrated.
Today we play fortuneteller and explain how in 2023 you'll likely be able to create full 3D characters from text.
🧵
2. To create a 3D character from text, you'll need to combine a collection of building blocks.
1) You need to create a rigged 3D mesh.
2) You need to define the appearance of the character.
3) You need to define movements/animations.
4) You'll likely want some kind of voice.
3. There are multiple ways to create rigged meshes using AI.
The most physically accurate solution is to fit SMPL, a parametric human body model, to image data.
These images can be real photos, or be generated from text (e.g. "a tall, dreamy AI-enthusiast").
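Fitting SMPL to images is its own pipeline (e.g. SMPLify), but here's a hedged sketch of just instantiating and posing the SMPL body model with the smplx package; the model files must be downloaded separately:

```python
# Sketch: generate a posed SMPL body mesh with the smplx package.
# Assumes `pip install smplx torch` and SMPL model files downloaded to ./models
# (they require a free registration at https://smpl.is.tue.mpg.de).
import torch
import smplx

model = smplx.create("./models", model_type="smpl", gender="neutral")

# Shape (betas) and pose parameters would normally come from fitting to images;
# random values here just to show the interface.
betas = torch.randn(1, 10) * 0.5
body_pose = torch.zeros(1, model.NUM_BODY_JOINTS * 3)  # axis-angle per joint

output = model(betas=betas, body_pose=body_pose, return_verts=True)
vertices = output.vertices[0]  # (6890, 3) mesh vertices
faces = model.faces            # fixed triangulation, ready for rigging
print(vertices.shape)
```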