How to create art with Machine Learning 🎨

You've probably seen these strangely beautiful AI-generated images on Twitter. Have you wondered how they are created?

In this thread, I'll tell you about a method for generating art with ML known as VQGAN+CLIP.

Let's jump in πŸ‘‡
Short History πŸ“œ

In January 2021, @OpenAI publicly released CLIP, a model that can match text to images.

Just days after that, people like @advadnoun, @RiversHaveWings, and @quasimondo started experimenting with using CLIP to guide the output of a GAN with text.

πŸ‘‡
Together with CLIP, OpenAI also presented an image generation model called DALL-E, but without releasing the full code or the pre-trained models.

The results from guiding StyleGAN2 or BigGAN with CLIP aren't as accurate as DALL-E's, but they are weirdly artistic.



πŸ‘‡
When CLIP was paired with VQGAN, a model released by the University of Heidelberg in December 2020, the quality of the generated images improved dramatically.

VQGAN+CLIP is one of the most commonly used methods today.



πŸ‘‡
If you are interested in more details about the history of these projects, check out this fantastic article by @sea_snell!

ml.berkeley.edu/blog/posts/cli…

We are now going to explore how this method works and how you can use it yourself πŸ‘‡
For the experiments below I'm using this Colab notebook by @jbusted1, which itself is based on one of the original notebooks by @RiversHaveWings.

I only made a few modifications on my side to speed up running different experiments.

colab.research.google.com/drive/1gFn9u3o…

πŸ‘‡
The text prompt

To start, you need to specify what the final image should look like via a text prompt. You can use natural language for this, and you can also define several prompts.

Let's try the following prompt:

"a car driving on a beautiful mountain road"

Not bad!

πŸ‘‡
However, let's say we want to see some sky at the top and we want the car to look nicer. We can add two more prompts that describe the scene better.

"a car driving on a beautiful mountain road"
"sky on the top mountains on the sides"
"a beautiful sports car"

Much better!

πŸ‘‡
You can now apply some tricks to improve the visual style of the image. For example, you can use the so-called "unreal engine trick" found by @arankomatsuzaki. You just add another prompt saying "unreal engine" and you will get a more realistic-looking image!

Nice! πŸ‘‡
The process of tuning the text description is called prompt engineering. CLIP is a very powerful model, but you need to ask it in the right way to get the best results. Try out different things!

Here is an example of the styles you can achieve with different prompts.

πŸ‘‡
Start image

By default, the generation starts from a random noise image. However, you can provide your own image to start the process. The final image will then resemble your start image much more closely, so this is another way to guide the model.

Example with the same text prompt

πŸ‘‡
Now, let's look a bit under the hood. What happens is the following:

1️⃣ VQGAN generates an image
2️⃣ CLIP evaluates how well the image fits the text prompt
3️⃣ The error is backpropagated through the (frozen) VQGAN to update its latent code
4️⃣ Go back to 1️⃣

πŸ‘‡
CLIP

The really cool thing about CLIP is that it can take an image or some text and encode them into an intermediate (latent) space. This space is shared by both, so you can compare how similar an image is to a sentence. This alone is a 🤯 achievement.

πŸ‘‡
VQGANs

VQGAN is also an interesting method: it was the first to make the powerful vision transformers efficient on high-resolution images, by decomposing the image into an ordered list of entries from a learned codebook.

πŸ‘‡
So, we are essentially optimizing the VQGAN's latent code until it produces images that fit the text prompts. CLIP is used as a powerful judge to guide the GAN in the right direction in order to achieve good results.

The expressive power comes from the vast knowledge encoded in CLIP.

πŸ‘‡
Unfortunately, this also means that the whole process is rather slow and needs a powerful GPU with lots of memory. The experiments above were done in a Google Colab Pro notebook with an Nvidia P100 GPU, and each image (1000 iterations) took about 15 minutes to create.

πŸ‘‡
Another interesting feature is that you can take all the intermediate images and create a cool-looking video!

Check out this video of my Halloween pumpkin using the following prompts:

"a scary orange jack-o-lantern"
"red fire background"

πŸ‘‡
So, that's it for now. In the next thread, I'll tell you how I used this method to create an NFT collection and earn more than $3000 in 2 weeks.

So, follow me @haltakov and stay tuned!

β€’ β€’ β€’

Missing some Tweet in this thread? You can try to force a refresh
γ€€

Keep Current with Vladimir Haltakov

Vladimir Haltakov Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!

PDF

Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

More from @haltakov

2 Nov
Creators only get badges πŸ…

There is a problem with how value is distributed in online communities today. It seems we take the status quo for granted and don't discuss it much.

The people that create most of the value, get none of the money! Only badges...

Thread πŸ‘‡
Online communities

I'm talking about platforms like Twitter, Reddit, Stack Overflow etc. They're wonderful places, where you can discuss interesting topics, get help with a problem, or read the latest news.

However, the people that make them truly valuable receive nothing πŸ‘‡
It usually looks like this:

β–ͺ️ Company creates a web 2.0 platform
β–ͺ️ Users create content and increase the value
β–ͺ️ Company aggregates the demand
β–ͺ️ Company monetizes with ads and subscriptions
β–ͺ️ Company gets lots of money
β–ͺ️ Creators get badges, karma and virtual gold

πŸ‘‡ Image
Read 25 tweets
13 Oct
Machine Learning Formulas Explained! πŸ‘¨β€πŸ«

This is the formula for the Binary Cross Entropy Loss. This loss function is commonly used for binary classification problems.

It may look super confusing, but I promise you that it is actually quite simple!

Let's go step by step πŸ‘‡
The Cross-Entropy Loss function is one of the most used losses for classification problems. It tells us how well a machine learning model classifies a dataset compared to the ground truth labels.

The Binary Cross-Entropy Loss is a special case when we have only 2 classes.

πŸ‘‡
The most important part to understand is this one - this is the core of the whole formula!

Here, Y denotes the ground-truth label, while ΕΆ is the predicted probability of the classifier.

Let's look at a simple example before we talk about the logarithm... πŸ‘‡
Read 15 tweets
21 Sep
There are two problems with ROC curves

❌ They don't work for imbalanced datasets
❌ They don't work for object detection problems

So what do we do to evaluate our machine learning models properly in these cases?

We use a Precision-Recall curve.

Another one of my threads πŸ‘‡
Last week I wrote another detailed thread on ROC curves. I recommend that you read it first if you don't know what they are.



Then go on πŸ‘‡
❌ Problem 1 - Imbalanced Data

ROC curves measure the True Positive Rate (also known as Accuracy). So, if you have an imbalanced dataset, the ROC curve will not tell you if your classifier completely ignores the underrepresented class.

More details:

πŸ‘‡
Read 19 tweets
20 Sep
How to spot fake images of faces generated by a GAN? Look at the eyes! πŸ‘οΈ

This is an interesting paper that shows how fake images of faces can be easily detected by looking at the shape of the pupil.

The pupils in GAN-generated images are usually not round - see the image!

πŸ‘‡
Here is the actual paper. The authors propose a way to automatically identify fake images by analyzing the pupil's shape.

arxiv.org/abs/2109.00162
The bad thing is, GANs will probably quickly catch up and include an additional constraint for pupils to be round...
Read 5 tweets
15 Sep
Did you ever want to learn how to read ROC curves? πŸ“ˆπŸ€”

This is something you will encounter a lot when analyzing the performance of machine learning models.

Let me help you understand them πŸ‘‡
What does ROC mean?

ROC stands for Receiver Operating Characteristic but just forget about it. This is a military term from the 1940s and doesn't make much sense today.

Think about these curves as True Positive Rate vs. False Positive Rate plots.

Now, let's dive in πŸ‘‡
The ROC curve visualizes the trade-offs that a binary classifier makes between True Positives and False Positives.

This may sound too abstract for you so let's look at an example. After that, I encourage you to come back and read the previous sentence again!

Now the example πŸ‘‡
Read 21 tweets
14 Sep
Most people seem to use matplotlib as a Python plotting library, but is it really the best choice? πŸ€”

We are going to compare 5 free and popular libraries:
β–ͺ️ Matplotlib
β–ͺ️ Seaborn
β–ͺ️ Plotly
β–ͺ️ Bokeh
β–ͺ️ Altair

Which one is the best? Find out below πŸ‘‡
In a survey I did the other day, matplotlib had the most users by a large margin. This was quite surprising to me since I don't really like it...



But let's first look at each library πŸ‘‡
Matplotlib πŸ“ˆ

Matplotlib is one of the most popular libraries out there.

βœ… Supports many types of plots
βœ… Lots of customization options

❌ Plots look ugly
❌ Limited interactivity
❌ Not very intuitive to use
Read 11 tweets

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Too expensive? Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal Become our Patreon

Thank you for your support!

Follow Us on Twitter!

:(