aifunhouse
Nov 6 11 tweets 10 min read
🧵Automated image generation with DALL-E

1. A few days ago @OpenAI announced that #DALLE users could create images programmatically via API.

To celebrate, we took a vintage shot of @eerac, circa 2012, and threw in a healthy assortment of fully automated AI whimsy.
2. Until two days ago, DALL-E users have only ever been able to generate images through @OpenAI's website, labs.openai.com

DALL-E delivers great image quality, but without programmatic access it's cumbersome to incorporate into automated workflows.
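
For readers who want to see what "programmatically" looks like, here is a minimal sketch that uses only Python's standard library and OpenAI's documented image generation endpoint. The function names are our own for illustration, not part of any OpenAI library, and the API key is assumed to be supplied by the caller.

```python
import json
import urllib.request

# Image generation endpoint from OpenAI's API reference.
API_URL = "https://api.openai.com/v1/images/generations"


def build_generation_request(prompt, api_key, n=1, size="1024x1024"):
    """Build (but do not send) a POST request asking DALL-E for n images."""
    if size not in {"256x256", "512x512", "1024x1024"}:
        raise ValueError(f"unsupported size: {size}")
    body = json.dumps({"prompt": prompt, "n": n, "size": size}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate_image_urls(prompt, api_key, n=1, size="1024x1024"):
    """Send the request and return the image URLs from the JSON response."""
    request = build_generation_request(prompt, api_key, n=n, size=size)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [item["url"] for item in payload["data"]]
```

Calling generate_image_urls with a prompt and a valid key should return one URL per requested image; each call is billed, so it's worth testing with n=1 first.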
3. In August, @StabilityAI's open release of #stablediffusion led to an explosion in 3rd party tools built on text-to-image AI.

While @OpenAI hasn't released their network, they have introduced a way to incorporate DALL-E into workflows and plugins.

4. We love #stablediffusion, but there's no denying the appeal of #DALLE's ease-of-use and improved visual fidelity on many prompts.

Right out of the box, DALL-E turned "A cartoon cat on roller skates from a 1987 Saturday morning cartoon" into an image that's purr-fect for TV.

SD's cat? Misssss
5. To take full advantage of DALL-E's new API, we combined GPT-3 generated image prompts with face detection + outpainting.

To do this, we set up a REPL on @Replit to load a photo, crop out the face, and then call DALL-E using one or more autogenerated prompts.
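
The crop-and-outpaint step can be sketched roughly as follows. This is not the code from our REPL; it's a hypothetical, standard-library-only illustration of the geometry, assuming a face bounding box has already been produced by some face detector. The idea is to cut a square around the face, then paste it onto a larger transparent canvas so that DALL-E's edit endpoint fills in the transparent area around it. The names Box, square_crop, and canvas_placement, and the default margin and face_fraction values, are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Pixel rectangle: left/top inclusive, right/bottom exclusive."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def width(self):
        return self.right - self.left

    @property
    def height(self):
        return self.bottom - self.top


def square_crop(face, image_width, image_height, margin=0.25):
    """Grow the face box by `margin` per side, make it square, clamp to the image."""
    side = int(max(face.width, face.height) * (1 + 2 * margin))
    side = min(side, image_width, image_height)
    center_x = (face.left + face.right) // 2
    center_y = (face.top + face.bottom) // 2
    # Shift the square back inside the image if it would spill over an edge.
    left = min(max(center_x - side // 2, 0), image_width - side)
    top = min(max(center_y - side // 2, 0), image_height - side)
    return Box(left, top, left + side, top + side)


def canvas_placement(canvas_size=1024, face_fraction=0.4):
    """Centered square on the canvas where the resized face crop is pasted.

    Everything outside this box stays transparent for DALL-E to fill in.
    """
    side = int(canvas_size * face_fraction)
    offset = (canvas_size - side) // 2
    return Box(offset, offset, offset + side, offset + side)
```

With these two boxes in hand, an imaging library can crop the photo, resize the crop to the placement box, paste it onto a transparent PNG, and send that PNG plus a prompt to the edit endpoint.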
6. If you'd like to try the REPL yourself, have at it!

To use it, go to @Replit👇, clone the REPL, enter your @OpenAI API key into the REPL's secrets tab, then click "run".

(Each 1024 x 1024 image costs ~$0.02, so releasing free demos gets pricey fast.)

replit.com/@eerac/FaceOut…
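
To make the "pricey fast" point concrete, here is a back-of-the-envelope sketch built on the ~$0.02 per-image figure above. The function name and the visitor numbers are made up for illustration.

```python
# Approximate price of one 1024 x 1024 image, as quoted in this thread.
PRICE_PER_1024_IMAGE_USD = 0.02


def estimate_cost_usd(visitors, images_per_visitor,
                      price_per_image=PRICE_PER_1024_IMAGE_USD):
    """Rough cost of a free demo: every generated image is billed to the host."""
    return visitors * images_per_visitor * price_per_image
```

For example, 1,000 visitors generating 4 images each comes to about $80, paid by whoever owns the API key.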
7. If you're fed up with our airy hijinks and on the lookout for practical tools, check out @NicolayMausz's new #DALLE Photoshop plugin (which also works with #stablediffusion).

We expect to see more integrations like this in the near future.

8. Also, if you're thinking of making your own tools, consider following @yoheinakajima for inspiration.

9. We're pumped @OpenAI continues to extend #DALLE.

There are a few features we'd love to see added to the API to keep up with #stablediffusion.

First, it'd be great if image masks could have partial opacity, as with DreamStudio's inpainting interface.
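
To illustrate what's missing: OpenAI's documentation describes the edit mask in terms of fully transparent pixels marking the region to regenerate, so as far as we can tell a soft, feathered mask has to be flattened to all-or-nothing before it's uploaded. A minimal sketch of that flattening, assuming the mask's alpha channel is available as rows of 0-255 values; the function name and the default threshold are hypothetical.

```python
def binarize_alpha(alpha_rows, threshold=128):
    """Flatten a soft alpha mask to 0 (regenerate) or 255 (keep) per pixel.

    Alpha values below `threshold` become fully transparent; all others
    become fully opaque, so any gradual blend in the mask is lost.
    """
    return [[0 if alpha < threshold else 255 for alpha in row]
            for row in alpha_rows]
```

With partial opacity support, the intermediate values could instead control how strongly each pixel is repainted, which is what makes soft-edged inpainting blend so well.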

10. Second, under the hood DALL-E is capable of impressive text-guided interpolation, but this feature isn't exposed.

Why not make it available through the API?

11. And so concludes our thread on the new DALL-E API. Stay tuned for future updates and demos.

Since you've made it this far, why not follow @aifunhouse for more tutorials, explorations, and #AI fun.

If you enjoyed this thread, please like and share.👇

More from @aifunhouse

Oct 19
1. Text-based image editing is coming in hot!

Between #DALLE, #midjourney and #stablediffusion text-to-image generation is all the rage, but what if your images are off target?

Usually you try new prompts, then use inpainting to make edits... until now.

2. This month multiple papers have come out demonstrating how language can be used to edit an image.

Equally exciting, these papers are being implemented on top of #stablediffusion since it's now openly available (Thanks @EMostaque!)

Let's review some of the ⚡️fast progress...
3. Last week, @FerranDeLaTorre and Chen Henry Wu released both a paper and code for CycleDiffusion.

CycleDiffusion allows existing text-to-image generation networks (e.g. #stablediffusion) to be used as text-guided image editors.


github.com/ChenWu98/cycle…
Oct 17
1. In 2022, text-to-image tech has improved dramatically.

Heading into 2023, text-to-mesh, text-to-video, and text-to-audio models have all been demonstrated.

Today we play fortuneteller and explain how in 2023 you'll likely be able to create full 3D characters from text.

🧵
2. To create a 3D character from text, you'll need to combine a collection of building blocks.

1) You need to create a rigged 3D mesh.
2) You need to define the appearance of the character.
3) You need to define movements/animations.
4) You'll likely want some kind of voice.
3. There are multiple ways to create rigged meshes using AI.

The most physically accurate solution is to use SMPL fit to image data.

These images can be real photos, or be generated from text (e.g. "a tall, dreamy AI-enthusiast").

Oct 2
1. Hey there young Spielbergs!

Curious about how AI can be used for film making?

It's still early days, but between text-to-image, text-to-audio and AI-driven animation, building blocks are starting to appear.

Today's thread provides an overview.
🧵👇
2. First off, some highlights!

Way back in July @mrjonfinger used @midjourney to produce a coherent short film.

Very solid visuals, but the voices and animation are a bit stilted. We had to rewatch to grok the plot, but it's 100% there once you get it.

3. Another early AI success story is @adampickard's use of DALL-E to recreate the famous short film "Powers of Ten" by Ray and Charles Eames.

There's no dialog here, but the narrative of the original definitely comes through.

Oct 1
1. Last week @openai finally removed the waitlist for DALL-E!

Of the AI image creation tools out there, DALL-E is arguably the most accessible.

Today's thread provides a hands-on tutorial for new DALL-E users looking to generate visually consistent assets.

Sticker Time!
2. If you've never used DALL-E, head on over and sign up now!

First announced in April, DALL-E is the OG text-to-image ML model. For months, access was extremely limited, but now everyone, including you, can log in and take this dream machine for a spin.

labs.openai.com
3. If you need inspiration, OpenAI's Instagram is filled with images that a mere 6 months ago would have knocked our collective socks off.

Since then of course, we've all become jaded as new mind-bending tech comes out weekly.

Ah to be young again.

instagram.com/openaidalle/?h…
Sep 30
It's Friday and that means it's time for the @aifunhouse Week in Review!

As always, it's been a wild week in #AI!

DreamBooth, Instant NeRF, Make a Video, and more ... let's get in!

🤖🧵👇🏽
1. First up: DreamBooth, a technique from Google Research. Originally applied to their Imagen model but generalizable to others, it fine-tunes text-to-image networks so they can generate consistent characters across contexts and styles. dreambooth.github.io
2. The results are wild - take a look:
Sep 27
1. Last week @OpenAI released Whisper, an open source model for transcribing audio.

Let’s see how you can use Whisper + GPT-3 to quickly summarize text-heavy YouTube vids.

If you're new to ML this is a great tutorial to get hands on and play along.

openai.com/blog/whisper/

🧵
2. First we'll use Whisper to transcribe a video we want to summarize.

Whisper is open source and there are already multiple UIs on the web that allow you to run it on video and audio.

For YouTube videos, @jeffistyping created a super simple UI.

huggingface.co/spaces/jeffist…
3. To transcribe a video, just paste in its YouTube URL and wait for the title and preview image to appear.

In this example, we chose a 10-minute video of @garrytan discussing Adobe's recent $20B acquisition of Figma.
