8. As a quick test of @shivamshiraro's notebook, we were able to breathe a bit of life into @eerac's infamous #deadeyes photo from our first tutorial way back in September.
This required a bit of trial and error. We'll follow up with tips once we actually know what we're doing.
9. Finally, if you had concerns that progress was about to stall, yet *another* text-based image paper was released yesterday by @GoogleAI
Naturally, the results look amazing, but at this point, we're too jaded to care.
11. To string image operations together, we're keeping an eye on startups like @krea_ai and @runwayml. Both have hinted at building conversational UIs for editing images and videos.
In 2021 this all seemed impossible, but a year later it's within reach!
The rapid pace of progress is well-aligned with @EMostaque's prediction earlier this week that AI-powered image generation will more-or-less be "solved" in 2023.
1. In 2022, text-to-image tech has improved dramatically.
Heading into 2023, text-to-mesh, text-to-video, and text-to-audio models have all been demonstrated.
Today we play fortuneteller and explain how in 2023 you'll likely be able to create full 3D characters from text.
🧵
2. To create a 3D character from text, you'll need to combine a collection of building blocks.
1) You need to create a rigged 3D mesh. 2) You need to define the appearance of the character. 3) You need to define movements/animations. 4) You'll likely want some kind of voice.
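One rough way to picture those four building blocks is as a simple checklist in code. This is only an illustrative sketch: `CharacterSpec`, its field names, and `missing_blocks` are hypothetical names made up for this example, not part of any real tool, and we assume the voice is optional since the list only says you'll "likely" want one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CharacterSpec:
    """Hypothetical container for the four building blocks listed above."""
    rigged_mesh: Optional[str] = None  # 1) reference to a rigged 3D mesh
    appearance: Optional[str] = None   # 2) description of how the character looks
    animations: Optional[str] = None   # 3) movements/animations
    voice: Optional[str] = None        # 4) optional voice (assumed optional here)


def missing_blocks(spec: CharacterSpec) -> List[str]:
    """Return the names of required blocks that have not been provided yet."""
    required = ("rigged_mesh", "appearance", "animations")
    return [name for name in required if getattr(spec, name) is None]


# Usage: only the mesh has been created so far.
spec = CharacterSpec(rigged_mesh="character.glb")
print(missing_blocks(spec))
```

Each field could be filled by a different model (text-to-mesh, text-to-image, text-to-audio, and so on), which is why the pieces are kept separate in this sketch.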
3. There are multiple ways to create rigged meshes using AI.
The most physically accurate solution is to fit SMPL, a parametric human body model, to image data.
These images can be real photos or generated from text (e.g. "a tall, dreamy AI-enthusiast").
1. Last week @openai finally removed the waitlist for DALL-E!
Of the AI image creation tools out there, DALL-E is arguably the most accessible.
Today's thread provides a hands-on tutorial for new DALL-E users looking to generate visually consistent assets.
Sticker Time!
2. If you've never used DALL-E, head on over and sign up now!
First announced in April, DALL-E is the OG text-to-image ML model. For months, access was extremely limited, but now everyone, including you, can log in and take this dream machine for a spin.
DreamBooth, Instant NeRF, Make a Video, and more ... let's get in!
🤖🧵👇🏽
1. First up: DreamBooth, a technique from Google Research. It was originally applied to their Imagen model but generalizes to other models. It fine-tunes text-to-image networks so they can generate consistent characters across contexts and styles. dreambooth.github.io