aifunhouse
Oct 19 · 13 tweets · 9 min read
1. Text-based image editing is coming in hot!

Between #DALLE, #midjourney, and #stablediffusion, text-to-image generation is all the rage, but what if your images are off target?

Usually you try new prompts, then use inpainting to make edits... until now.

2. This month multiple papers have come out demonstrating how language can be used to edit an image.

Equally exciting, these papers are being implemented on top of #stablediffusion since it's now openly available (Thanks @EMostaque!)

Let's review some of the ⚡️fast progress...
3. Last week, @FerranDeLaTorre and Chen Henry Wu released both a paper and code for CycleDiffusion.

CycleDiffusion allows existing text-to-image generation networks (e.g. #stablediffusion) to be used as text-guided image editors.


github.com/ChenWu98/cycle…
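The core trick can be sketched in miniature. What follows is a drastically simplified toy, not the paper's actual DPM-Encoder: CycleDiffusion recovers the latent noise that "explains" the source image, then reuses that exact noise while sampling with the edited prompt, so everything the text doesn't change stays put. The `denoise_step` below is a stand-in scalar function, not a real diffusion network.

```python
import random

def denoise_step(x, prompt_target, noise, lr=0.3, noise_scale=0.05):
    """Stand-in for one reverse-diffusion step: nudge x toward the
    prompt's "target" value and inject the given noise."""
    return x + lr * (prompt_target - x) + noise_scale * noise

def sample(x_T, prompt_target, noises):
    """Run the full reverse process with a fixed noise sequence."""
    x = x_T
    for n in noises:
        x = denoise_step(x, prompt_target, n)
    return x

# Fix one latent code (the initial x_T plus a per-step noise sequence)...
rng = random.Random(0)
x_T = rng.gauss(0, 1)
noises = [rng.gauss(0, 1) for _ in range(20)]

# ...then vary only the "prompt". Because the noise is shared, the two
# outputs differ only where the text differs -- the essence of using a
# plain text-to-image sampler as a text-guided editor.
out_src = sample(x_T, prompt_target=0.0, noises=noises)   # "source prompt"
out_edit = sample(x_T, prompt_target=1.0, noises=noises)  # "edited prompt"
```

With the noise held fixed, the gap between the two outputs is driven entirely by the prompt change, which is why unedited regions stay recognizable.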
4. 6 days later, Google released Imagic, another text-based editing solution built on top of Imagen, Google's unreleased text-to-image model.

Amazingly, @Buntworthy was able to implement Imagic on top of #stablediffusion 1 day later (i.e. yesterday)!

5. To showcase Imagic in action, @Buntworthy edited a photo of Barack Obama using the phrase "A photo of Barack Obama smiling big grin".👇

Imagic takes ~5 minutes to run on an A100 GPU.
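Per the Imagic paper, the recipe has three stages: (1) optimize a text embedding so the frozen model reconstructs the input photo, (2) fine-tune the diffusion model around that embedding, and (3) linearly interpolate between the optimized embedding and the target prompt's embedding, then generate. Stage 3 is just a lerp; the tiny 4-dim vectors below are made up for illustration (real text embeddings are far larger).

```python
def interpolate_embeddings(e_tgt, e_opt, alpha):
    """Stage 3 of Imagic: linear interpolation between the optimized
    embedding (which reconstructs the input photo) and the target-prompt
    embedding. alpha near 0 keeps the original image; pushing alpha
    toward 1 applies the edit more strongly."""
    return [alpha * t + (1.0 - alpha) * o for t, o in zip(e_tgt, e_opt)]

# Toy "embeddings" for illustration only.
e_opt = [0.2, -0.1, 0.5, 0.0]   # optimized to reconstruct the photo
e_tgt = [1.0, 0.3, -0.2, 0.4]   # embedding of the edit prompt

mid = interpolate_embeddings(e_tgt, e_opt, alpha=0.7)
```

In practice alpha is the knob you sweep to trade off faithfulness to the photo against strength of the edit.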

6. Encouragingly, @andrewb10687674 was able to produce a similar edit using CycleDiffusion👇, which ran in ~1 minute.

A baseline img2img result is also included on the right to remind us what we used to be impressed by way back in September.
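For context, the img2img baseline works by noising the source image partway along the forward diffusion process and then denoising with the new prompt; a "strength" parameter controls how far in you start. The arithmetic below mirrors how common Stable Diffusion img2img implementations map strength to a step count — treat the exact rounding as an assumption.

```python
def img2img_steps(num_inference_steps, strength):
    """Map an img2img "strength" in [0, 1] to the number of denoising
    steps actually run. strength=0.0 leaves the input untouched;
    strength=1.0 re-generates from (almost) pure noise."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return min(int(num_inference_steps * strength), num_inference_steps)
```

At a typical strength of 0.8 with 50 scheduler steps, only the last 40 denoising steps run, which is why img2img keeps the rough layout of the input but repaints the details.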
7. Both the Imagic and CycleDiffusion results look solid, but the Imagic code had the disadvantage of needing >24GB of GPU memory.

Horrible, right? Fortunately, last night @ShivamShrirao got Imagic working well under the critical 16GB threshold. Hooray!
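Why does fine-tuning blow past 24GB in the first place? Full-precision training keeps weights, gradients, and two Adam moment buffers per parameter, before counting activations; the usual fixes are fp16 weights, 8-bit optimizers, and gradient checkpointing. Here's a back-of-the-envelope calculator — a rough floor under our own assumptions (it ignores activations entirely, so real usage is higher).

```python
def finetune_memory_gb(n_params, weight_bytes=4, grad_bytes=4, optim_bytes=8):
    """Rough per-parameter training memory: weights + gradients + Adam's
    two moment buffers. Activations are ignored, so treat this as a floor."""
    return n_params * (weight_bytes + grad_bytes + optim_bytes) / 1024**3

UNET_PARAMS = 860_000_000  # ~860M params in the Stable Diffusion v1 U-Net

full = finetune_memory_gb(UNET_PARAMS)                  # fp32 everything
lean = finetune_memory_gb(UNET_PARAMS, weight_bytes=2,  # fp16 weights,
                          grad_bytes=2, optim_bytes=2)  # 8-bit-style Adam
```

The fp32 floor alone is ~13GB before activations, while the lean configuration drops that to under 5GB — which is the kind of accounting that gets a fine-tuning notebook under a 16GB card's budget.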

8. As a quick test of @ShivamShrirao's notebook, we were able to breathe a bit of life into @eerac's infamous #deadeyes photo from our first tutorial way back in September.

This required a bit of trial and error. We'll follow up with tips once we actually know what we're doing.
9. Finally, if you had concerns that progress was about to stall, yet *another* text-based image paper was released yesterday by @GoogleAI

Naturally, the results look amazing, but at this point, we're too jaded to care.

10. So where is this all headed?

First, we're clearly about to see an explosion of image manipulation operations being implemented using language.

In addition to editing photos, for example, it's also possible to select regions of a photo using text.
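Text-driven region selection (in the style of models like CLIPSeg) boils down to scoring each image patch's embedding against the text embedding with cosine similarity and thresholding the scores into a mask. A toy sketch with made-up 2-dim "embeddings" — real models use learned, high-dimensional ones:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def text_select_mask(patch_embeddings, text_embedding, threshold=0.5):
    """Return a binary mask: 1 where a patch "matches" the text."""
    return [1 if cosine(p, text_embedding) >= threshold else 0
            for p in patch_embeddings]

# Two patches roughly aligned with the text direction, one that isn't.
text = [1.0, 0.0]
patches = [[0.9, 0.1], [1.0, 0.05], [-0.2, 1.0]]
mask = text_select_mask(patches, text)  # -> [1, 1, 0]
```

A mask like this is exactly what you'd feed to an inpainting pipeline, which is how "select by text, then edit by text" workflows chain together.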

11. To string image operations together, we're keeping an eye on startups like @krea_ai and @runwayml. Both have hinted at building conversational UIs for editing images and videos.

In 2021 this all seemed impossible, but a year later it's within reach!

12. So that's where this train is going.

The rapid pace of progress is well-aligned with @EMostaque's prediction earlier this week that AI-powered image generation will more-or-less be "solved" in 2023.

13. Like this thread? Looking to spend more time talking to your computer and/or photo collection? We got you!

Follow @aifunhouse for more tutorials, explainers, explorations, and #AI fun.

If you enjoyed this thread, please like and share.👇

