aifunhouse
Nov 30 · 11 tweets · 7 min read
🧵AI-assisted image prompts

1. In the last week, both @stabilityai and @openai have released major updates to their flagship offerings.

In this thread, we use OpenAI's new #ChatGPT model to help talk us through generating prompts for StabilityAI's new #stablediffusion v2 model.
2. ChatGPT is a new GPT-3.5-based model focused on conversing with users.

As with InstructGPT (the current default GPT-3 model), @openai leveraged Reinforcement Learning from Human Feedback (RLHF) to boost the reliability and desirability of responses.

openai.com/blog/chatgpt/
3. If you create an @openai account, you can currently test ChatGPT for free. We tried it out with the following prompt and were immediately impressed:

"I'm looking to create images of cats using a text-to-image model, similar to DALL-E. Can you suggest some prompts?"
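ChatGPT itself is a web interface, but the same conversational request can be sketched in code. A minimal sketch, assuming the `openai` Python package and an API key; the actual API call is left commented out so only the message-building helper runs:

```python
# Sketch: asking a chat model for text-to-image prompt ideas.
# The message format mirrors a ChatGPT-style conversation.

def build_messages(subject: str, style_hint: str = "") -> list[dict]:
    """Build a chat-style message list asking for image prompts."""
    request = (
        f"I'm looking to create images of {subject} using a "
        "text-to-image model, similar to DALL-E. Can you suggest some prompts?"
    )
    if style_hint:
        request += f" I was thinking it should be {style_hint}."
    return [{"role": "user", "content": request}]

messages = build_messages("cats", "an image of a cartoon cat")

# from openai import OpenAI
# client = OpenAI()  # reads OPENAI_API_KEY from the environment
# reply = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
# print(reply.choices[0].message.content)
```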
4. Next we decided to clarify our request to see how ChatGPT adapted:

"I was actually thinking it should be an image of a cartoon cat"

Once again its response was both helpful and polished. It immediately provided updated prompts, then added additional context below.
5. Since ChatGPT was holding up its end of the bargain, we hopped over to dreamstudio.ai to try Stable Diffusion v2 with a few of our favorite suggestions.

Results were promising, but as often happens with SD, they would benefit from additional iteration.
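DreamStudio is a web UI, but for reference, the same v2 model can be driven locally. A rough sketch using Hugging Face's `diffusers` library, assuming `diffusers`, `torch`, and a CUDA GPU are available (the heavy imports live inside the function so nothing downloads on import):

```python
def generate(prompt: str, seed: int = 0):
    """Sketch: render one image with Stable Diffusion v2 via `diffusers`
    (assumes `pip install diffusers torch` and a GPU)."""
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2", torch_dtype=torch.float16
    ).to("cuda")
    generator = torch.Generator("cuda").manual_seed(seed)
    return pipe(prompt, generator=generator).images[0]

# generate("A whimsical cartoon cat wearing a top hat, digital art")
```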
6. Even though ChatGPT isn't directly connected to Stable Diffusion or DALL-E, we decided to simulate having a unified interface by manually reporting back on our results.

When we explained the issue we were having, ChatGPT suggested a fix👇
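Because ChatGPT keeps conversational state, "manually reporting back" amounts to appending each observation to a running message history that gets resent with every request. A minimal sketch of that loop (the model call itself is omitted):

```python
# Sketch: simulating a unified ChatGPT <-> image-model loop by hand.
# Each round, we append our feedback on the generated images so the
# model sees the full conversation when suggesting the next prompt.

def add_feedback(history: list[dict], feedback: str) -> list[dict]:
    """Append user feedback to the running conversation history."""
    history.append({"role": "user", "content": feedback})
    return history

history = [
    {"role": "user", "content": "Suggest a prompt for a cartoon cat."},
    {"role": "assistant", "content": "A whimsical cartoon cat wearing a top hat."},
]
add_feedback(history, "The images came out too realistic. How can I fix that?")
# `history` would now be sent back to the model for an updated suggestion.
```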
7. Despite having no direct knowledge of Stable Diffusion (it was released in August 2022, after ChatGPT's training data was collected), ChatGPT was able to suggest more detailed prompts that really did result in improved images.

Had ChatGPT been given domain-specific knowledge, it would likely have done even better.
8. Stable Diffusion v2 has befuddled some users because it requires different prompts than v1.5.

Longer, more descriptive prompts work well, but the network also benefits from negative prompts, which tell it which image characteristics to avoid.

When asked, ChatGPT understood the assignment.
9. To try negative prompts in dreamstudio.ai, you add them using brackets👇

Appending the negative prompt "[A realistic cat without any fur or whiskers]" to our previous prompt resulted in more painterly images.

TIP: Try using multiple short negative prompts
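Following the tip above, the bracket syntax can be applied mechanically. A small helper (the bracket convention is as described in this thread; how DreamStudio weights each bracketed phrase is its own implementation detail):

```python
def with_negatives(prompt: str, negatives: list[str]) -> str:
    """Append DreamStudio-style bracketed negative prompts to a base prompt."""
    brackets = " ".join(f"[{n}]" for n in negatives)
    return f"{prompt} {brackets}" if brackets else prompt

combined = with_negatives(
    "An oil painting of a cartoon cat",
    ["photorealistic", "fur detail", "whiskers"],
)
# -> "An oil painting of a cartoon cat [photorealistic] [fur detail] [whiskers]"
```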
10. The purpose of this thread isn't to present a polished workflow for creating images.

Instead, we wanted to showcase how two independently created language-aware models can work together.

Conversational interfaces to AI-powered products are about to become a new standard UX.
11. Read our entire exposition? Interested in the newest AI novelty?

Follow @aifunhouse for more tutorials, explorations, and #AI fun.

If you enjoyed this thread, please like and share.👇


