Tanishq Mathew Abraham, Ph.D.
PhD at 19 | Founder and CEO at @MedARC_AI | Research Director at @StabilityAI | @kaggle Notebooks GM | Biomed. engineer @ 14 | TEDx talk➡https://t.co/xPxwKTpz0D

Jun 13, 2022, 16 tweets

You may have seen surreal and absurd AI-generated images like these ones...

These are all generated with an AI tool known as DALL·E mini

Let's talk about the history of #dallemini, and also *how* it works! ↓↓↓🧵

First, let's clarify the different AI tools that people often confuse:

- DALL·E was an @OpenAI-developed AI project from Jan 2021
- DALL·E mini is a community-created project inspired by DALL·E
- DALL·E 2 is another @OpenAI-developed tool released in April 2022 (2/16)

DALL·E mini was originally developed about a year ago, back in July 2021.

During a programming competition organized by @huggingface (an AI company), @borisdayma & some community folks (including myself!) developed a neural network inspired by DALL·E & studied it (3/16)

It was a great experience, and we even won that competition!

Boris has continued developing DALL·E mini since then, training larger neural networks on even more data!

But how does it work?? (4/16)

At the core of DALL·E mini are two components:
- a language model
- an image decoder

DALL·E mini learns from *millions* of image-text caption pairs sourced from the Internet. (5/16)

The first component is a neural language model. You may already be familiar with neural language models, like the famous GPT-3, which takes in text and produces more text.

DALL·E mini uses another type of neural language model known as "BART" (6/16)

But the BART model takes in text and produces images! How's that possible?

It's worth realizing that language models don't actually work with text directly; they represent the text as a sequence of discrete values that map to text (this is known as "tokenization"). (7/16)
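To make this concrete, here's a minimal sketch of tokenization. The vocabulary below is made up purely for illustration; the real BART tokenizer uses subword units and a vocabulary of tens of thousands of entries.

```python
# Toy tokenizer: map each word to a discrete integer ID.
# This vocabulary is hypothetical, just to show the idea.
vocab = {"a": 0, "cat": 1, "riding": 2, "skateboard": 3, "<unk>": 4}

def tokenize(text):
    """Turn text into a sequence of discrete values (token IDs)."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

print(tokenize("a cat riding a skateboard"))  # [0, 1, 2, 0, 3]
```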

In fact, this is why BART is technically considered a "sequence-to-sequence" neural network: it can take in any discrete sequence and output a corresponding discrete sequence, depending on the task it is trained on. (8/16)

So what if we also represent images as a sequence of discrete values? 🤔

While we could treat each pixel as a separate discrete value, this is inefficient & doesn't scale well.

Instead, we use another neural network to *learn* a mapping from an image to a sequence. (9/16)
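A rough sketch of the idea: each image patch vector gets replaced by the index of its nearest "codebook" entry, yielding a discrete sequence. The codebook values here are made up; the real VQGAN *learns* its codebook and operates on convolutional features, not raw 2-D points.

```python
# Toy vector quantization: map each patch vector to the index of its
# nearest codebook entry. This codebook is hypothetical.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

def quantize(patch_vectors):
    """Map patch vectors to a sequence of discrete codebook indices."""
    def nearest(v):
        return min(range(len(codebook)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)))
    return [nearest(v) for v in patch_vectors]

patches = [(0.1, 0.1), (0.9, 0.2), (0.8, 0.9)]
print(quantize(patches))  # [0, 1, 3]
```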

This neural network is known as VQGAN, which you may recognize from the VQGAN+CLIP technique used by another viral AI art tool (10/16)

The VQGAN model is trained on millions of images to learn a good mapping: one that can go from the sequence back to a full image with minimal error. (11/16)

As a separate note, you might have noticed that many of the #dallemini artworks have messed up faces 😄

This is mainly because the VQGAN hasn't learned a good mapping that can easily represent faces as a sequence of discrete values. (12/16)

So to summarize: we use BART, a sequence-to-sequence neural network, to map our text prompt (represented as a discrete sequence) to another discrete sequence, which is then mapped to an actual image by the VQGAN. (13/16)
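The whole pipeline can be sketched with stub functions. All function names and bodies here are hypothetical stand-ins for the real models; only the shape of the data flow matches the description above.

```python
# End-to-end sketch: text -> text tokens -> (BART) -> image tokens -> (VQGAN) -> pixels.
# Each function body is a trivial stand-in for a real neural network.
def tokenize(text):
    """Text -> discrete token IDs (stand-in for the real tokenizer)."""
    return [len(word) % 1000 for word in text.split()]

def bart_generate(text_tokens):
    """Seq2seq model: text tokens -> image tokens (stand-in for BART)."""
    return [(t * 7) % 256 for t in text_tokens]

def vqgan_decode(image_tokens):
    """Image tokens -> pixel values (stand-in for the VQGAN decoder)."""
    return [[code / 255.0] * 4 for code in image_tokens]

def generate(prompt):
    return vqgan_decode(bart_generate(tokenize(prompt)))

image = generate("a cat riding a skateboard")
print(len(image))  # one row of fake "pixels" per image token
```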

Millions of images and their corresponding captions were available as datasets for training DALL·E mini. During training, the BART model is given a caption and adjusted to reduce the difference between the generated image and the actual corresponding image. (14/16)
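That training idea (adjust the model so its output gets closer to the target) can be sketched with a single-parameter toy model. The real BART has hundreds of millions of weights, but they are adjusted in the same basic way: following the gradient of an error.

```python
# Toy "training loop": one scalar parameter stands in for the model's weights,
# and one scalar target stands in for the actual corresponding image.
target = 3.0   # the "real image" the model should reproduce
param = 0.0    # the model's "weights", initialized arbitrarily
lr = 0.1       # learning rate: how big each adjustment is

for step in range(100):
    prediction = param                 # the "generated image"
    grad = 2 * (prediction - target)   # gradient of squared error (prediction - target)**2
    param -= lr * grad                 # adjust weights to reduce the difference

print(round(param, 3))  # converges to 3.0
```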

It's that simple!

Well, that's an oversimplification, obviously; there are many challenges when scaling up these huge models and training on millions of images, but the basic concept is simple. (15/16)

Hope this thread was educational!

If you like this thread, please share!

Consider following me (@iScienceLuvr) for AI/ML-related content! 🙂

Also consider following the main DALL·E mini developer, @borisdayma! (16/16, end of thread)
