You've probably seen these strangely beautiful AI-generated images on Twitter. Have you wondered how they are created?
In this thread, I'll tell you about a method for generating art with ML known as VQGAN+CLIP.
Let's jump in 👇
Short History 👇
In January 2021, @OpenAI publicly released CLIP, a model that can match images to text descriptions.
Just days after that, some people like @advadnoun, @RiversHaveWings, and @quasimondo started experimenting using CLIP to guide the output of a GAN using text.
👇
Together with CLIP, OpenAI also announced DALL-E, an image generation model, but without releasing the full code or the pre-trained weights.
The results from guiding StyleGAN2 or BigGAN with CLIP aren't as accurate as DALL-E, but they are weirdly artistic.
When CLIP was paired with a recent model released by the University of Heidelberg in December 2020 called VQGAN, the quality of the generated images improved dramatically.
VQGAN+CLIP is one of the most commonly used methods today.
To start, you need to specify what the final image should look like - the text prompt. You can use natural language for this, and you can define several prompts as well.
Let's try the following prompt:
"a car driving on a beautiful mountain road"
Not bad!
👇
However, let's say we want to see some sky on the top and we want the car to be nicer. We can add 2 more prompts describing the scene better.
"a car driving on a beautiful mountain road"
"sky on the top mountains on the sides"
"a beautiful sports car"
Much better!
👇
You can now apply some tricks to improve the visual style of the image. For example, you can use the so-called "unreal engine trick" found by @arankomatsuzaki. You just add another prompt saying "unreal engine" and you will get a more realistic-looking image!
Nice! 👇
The process of tuning the text description is called prompt engineering. CLIP is a very powerful model, but you need to ask it in the right way to get the best results. Try out different things!
Here is an example of the styles you can achieve with different prompts.
👇
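If you want to try this yourself, here is a hypothetical prompt configuration. The exact syntax depends on the notebook or script you use - many of the public VQGAN+CLIP notebooks accept several prompts in a single string separated by "|", sometimes with an optional ":weight" per prompt.

```python
# Hypothetical prompt configuration - exact syntax depends on the notebook you use.
# Many public VQGAN+CLIP notebooks take several prompts separated by "|".
prompts = (
    "a car driving on a beautiful mountain road | "
    "sky on the top mountains on the sides | "
    "a beautiful sports car | "
    "unreal engine"
)

# Some notebooks also accept per-prompt weights, e.g. "text:weight",
# to emphasize the subject over the style hint:
weighted_prompts = "a beautiful sports car:1.5 | unreal engine:0.5"
```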
Start image
By default, the generation starts from a random noise image. However, you can provide your own image to start the process. The final image will then resemble it much more closely, so this is another way to guide the model.
Example with the same text prompt
👇
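As a minimal sketch (my own simplification, not the exact notebook code), starting from your own image just means initializing the tensor that gets optimized from that image instead of from random noise:

```python
# Minimal sketch: load a start image as the tensor that will be optimized,
# instead of starting from random noise. The file name is just an example.
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),           # float tensor in [0, 1], shape (3, H, W)
])

init = Image.open("mountain_photo.jpg").convert("RGB")   # hypothetical file
image = preprocess(init).unsqueeze(0)                    # shape (1, 3, 224, 224)
image.requires_grad_(True)                               # this tensor gets optimized
```

In the actual VQGAN+CLIP notebooks, the start image is first encoded by the VQGAN into its latent codes, and those codes are what gets optimized.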
Now, let's look a bit under the hood. What happens is the following:
1️⃣ VQGAN generates an image
2️⃣ CLIP evaluates how well the image fits the text prompt
3️⃣ The error is backpropagated to update the VQGAN's latent codes
4️⃣ Go back to 1️⃣ (see the sketch below)
👇
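Here is a minimal sketch of that loop, assuming the openai CLIP package is installed (pip install git+https://github.com/openai/CLIP). To keep it self-contained, it optimizes raw pixels directly; the real method optimizes the VQGAN latent codes and decodes them into an image in step 1️⃣.

```python
# Minimal sketch of the CLIP-guided optimization loop. For simplicity the
# "generator" is a raw pixel tensor; in the real VQGAN+CLIP setup the VQGAN
# latent codes are optimized and decoded into an image at every step.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()  # keep everything in fp32 so gradients flow cleanly

# Encode the text prompt once - it stays fixed during the optimization.
tokens = clip.tokenize(["a car driving on a beautiful mountain road"]).to(device)
with torch.no_grad():
    text_features = model.encode_text(tokens)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

# Stand-in for step 1: a learnable 224x224 RGB image (could also be
# initialized from a start image instead of random noise, as shown earlier).
image = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

# CLIP's expected input normalization.
mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

for step in range(300):
    optimizer.zero_grad()
    # Step 2: CLIP scores how well the current image matches the prompt.
    normalized = (image.clamp(0, 1) - mean) / std
    image_features = model.encode_image(normalized)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    loss = 1 - (image_features * text_features).sum()  # 1 - cosine similarity
    # Step 3: backpropagate and update the image (the VQGAN latents in the real method).
    loss.backward()
    optimizer.step()
```

Swapping the raw pixel tensor for VQGAN latent codes (decoded to an image each step) gives you the actual VQGAN+CLIP setup.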
CLIP
The really cool thing about CLIP is that it can take an image or some text and encode them into a shared intermediate (latent) space. Because this space is the same for both, you can directly compare how similar an image is to a sentence. This by itself is a 🤯 achievement.
👇
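To make this concrete, here is a small example using OpenAI's clip package (the image file name is just a placeholder): encode one image and a few sentences, then compare them with cosine similarity.

```python
# Encode one image and several captions with CLIP and compare them.
# Assumes the openai CLIP package (pip install git+https://github.com/openai/CLIP).
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("car.jpg")).unsqueeze(0).to(device)   # hypothetical file
texts = clip.tokenize([
    "a car driving on a mountain road",
    "a bowl of fruit on a table",
]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(texts)

    # Normalize and compare - higher cosine similarity means a better match.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    similarity = (image_features @ text_features.T).squeeze(0)

print(similarity)  # the first caption should score higher for a car photo
```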
VQGANs
VQGAN is also an interesting method, because it allows powerful transformers to be used efficiently on high-resolution images for the first time, by decomposing the image into an ordered sequence of entries from a learned codebook.
👇
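A toy illustration of the codebook idea (not the real VQGAN code): every feature vector produced by the encoder is snapped to its nearest entry in a learned codebook, so the whole image becomes a short sequence of integer indices that a transformer can model.

```python
# Toy illustration of vector quantization with a codebook.
import torch

codebook = torch.randn(1024, 256)        # 1024 learned entries of dimension 256
features = torch.randn(16 * 16, 256)     # encoder output for a 16x16 latent grid

# Nearest-neighbour lookup: distance from each feature to every codebook entry.
distances = torch.cdist(features, codebook)      # shape (256, 1024)
indices = distances.argmin(dim=1)                # one codebook index per position

quantized = codebook[indices]            # the discretized representation
print(indices.shape)                     # 256 tokens describing the whole image
```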
So, we are essentially optimizing the VQGAN's latent codes to produce images that fit the text prompts (the pre-trained VQGAN and CLIP weights stay frozen). CLIP is used as a powerful judge that guides the generation in the right direction.
The expressive power comes from the vast knowledge CLIP contains and can use.
👇
Unfortunately, this also means that the whole process is rather slow and needs a powerful GPU with lots of RAM. The experiments above were done in a Google Colab Pro notebook with an Nvidia P100 GPU and each image (1000 iterations) takes about 15 minutes to create.
👇
Another interesting feature is that you can take all the intermediate images and create a cool-looking video!
Check out this video of my Halloween pumpkin using the following prompts:
"a scary orange jack-o-lantern"
"red fire background"
👇
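A small sketch for stitching the saved frames into a video, assuming you wrote one PNG per iteration into a steps/ folder (a hypothetical layout) and have imageio plus imageio-ffmpeg installed:

```python
# Stitch the intermediate frames into a video.
import glob
import imageio

frames = [imageio.imread(path) for path in sorted(glob.glob("steps/*.png"))]
imageio.mimsave("progress.mp4", frames, fps=30)
```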
So, that's it for now. In the next thread, I'll tell you how I used this method to create an NFT collection and earn more than $3000 in 2 weeks.
There is a problem with how value is distributed in online communities today. It seems we take the status quo for granted and don't discuss it much.
The people that create most of the value get none of the money! Only badges...
Thread 👇
Online communities
I'm talking about platforms like Twitter, Reddit, Stack Overflow etc. They're wonderful places, where you can discuss interesting topics, get help with a problem, or read the latest news.
However, the people that make them truly valuable receive nothing 👇
It usually looks like this:
▪️ Company creates a web 2.0 platform
▪️ Users create content and increase the value
▪️ Company aggregates the demand
▪️ Company monetizes with ads and subscriptions
▪️ Company gets lots of money
▪️ Creators get badges, karma and virtual gold
This is the formula for the Binary Cross Entropy Loss. This loss function is commonly used for binary classification problems.
It may look super confusing, but I promise you that it is actually quite simple!
Let's go step by step 👇
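For reference, the standard form of the Binary Cross-Entropy Loss over N samples is:

BCE = -(1/N) · Σᵢ [ Yᵢ · log(Ŷᵢ) + (1 - Yᵢ) · log(1 - Ŷᵢ) ]

where Yᵢ is the ground-truth label (0 or 1) and Ŷᵢ is the predicted probability.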
The Cross-Entropy Loss function is one of the most used losses for classification problems. It tells us how well a machine learning model classifies a dataset compared to the ground truth labels.
The Binary Cross-Entropy Loss is a special case when we have only 2 classes.
👇
The most important part to understand is the term inside the sum - it is the core of the whole formula!
Here, Y denotes the ground-truth label, while Ŷ is the predicted probability of the classifier.
Let's look at a simple example before we talk about the logarithm... 👇
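Here is a tiny worked example (my own illustration) of that core term, Y·log(Ŷ) + (1-Y)·log(1-Ŷ), and of the averaged loss:

```python
# A tiny worked example of the Binary Cross-Entropy Loss.
import math

y_true = [1, 0, 1, 1]          # ground-truth labels
y_pred = [0.9, 0.2, 0.7, 0.4]  # predicted probabilities from the classifier

losses = [
    -(y * math.log(p) + (1 - y) * math.log(1 - p))
    for y, p in zip(y_true, y_pred)
]
bce = sum(losses) / len(losses)

print([round(l, 3) for l in losses])  # confident, correct predictions -> small loss
print(round(bce, 3))                  # the Binary Cross-Entropy is the average
```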
ROC curves plot the True Positive Rate (also known as Recall or Sensitivity) against the False Positive Rate. So, if you have an imbalanced dataset, the ROC curve can still look good even if your classifier largely ignores the underrepresented class.
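A small demo of that point (my own illustration, using scikit-learn): with 1000 negatives and only 10 positives, the ROC-AUC can look excellent even though most of the flagged examples are false positives.

```python
# Why ROC-AUC can look great on an imbalanced dataset while the rare
# positive class is handled poorly.
import numpy as np
from sklearn.metrics import roc_auc_score, precision_score

rng = np.random.default_rng(0)

# 1000 negatives, only 10 positives.
y_true = np.concatenate([np.zeros(1000, dtype=int), np.ones(10, dtype=int)])

# Scores: the positives rank high, but 50 negatives also get high scores.
neg_scores = np.concatenate([rng.uniform(0.0, 0.5, 950), rng.uniform(0.6, 0.95, 50)])
pos_scores = rng.uniform(0.6, 1.0, 10)
y_score = np.concatenate([neg_scores, pos_scores])

print("ROC-AUC:   ", roc_auc_score(y_true, y_score))                         # close to 1.0
print("Precision: ", precision_score(y_true, (y_score >= 0.6).astype(int)))  # far lower
```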