Tanishq Mathew Abraham

7 Dec, 12 tweets, 4 min read

Have you been seeing artwork like this on your timeline and wondered how it was created?

Let's learn about one of the most popular algorithms for AI-generated art! ⬇ ⬇ ⬇

The technique used is known as "VQGAN+CLIP" and is actually a combination of two deep learning models/algorithms both released earlier this year 2/11

VQGAN - Vector-Quantized Generative Adversarial Network

VQGAN is a type of GAN, which is a class of *generative* neural networks that have been used for #deepfakes and other AI-generated art techniques

You pass a vector/code and VQGAN generates an image. 3/11

VQGAN, like many of the cutting-edge GANs, has a continuous, traversable latent space, which means that codes with similar values will generate similar images, and following a smooth path from one code to another will lead to a smooth interpolation from one image to another 4/11
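To make the "smooth path" idea concrete, here is a minimal sketch of walking between two codes with plain linear interpolation. The 3-number codes and the helper names (`lerp`, `interpolation_path`) are made up for illustration; a real VQGAN latent is much larger, and each intermediate code would be decoded into an image.

```python
def lerp(code_a, code_b, t):
    """Linearly interpolate between two latent codes, with t in [0, 1]."""
    return [(1 - t) * a + t * b for a, b in zip(code_a, code_b)]


def interpolation_path(code_a, code_b, steps):
    """Return `steps` evenly spaced codes from code_a to code_b (steps >= 2)."""
    return [lerp(code_a, code_b, i / (steps - 1)) for i in range(steps)]


# Two toy 3-dimensional codes (hypothetical values, not real VQGAN latents).
start = [0.0, 1.0, -1.0]
end = [1.0, 0.0, 1.0]

# Each entry of `path` is a code that would be passed to the generator in turn.
path = interpolation_path(start, end, steps=5)
```

Decoding the codes in `path` one after another is what would produce the gradual morph from the first image to the second.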

CLIP - Contrastive Language-Image Pretraining

CLIP is a model released by @OpenAI (the same company that developed GPT-3!).

It can be used to measure the similarity between an input image and text. 5/11
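A rough sketch of what "similarity" means here: CLIP turns the image and the text into embedding vectors in a shared space, and the two are compared with cosine similarity. The tiny 3-number embeddings below are invented for illustration; real CLIP embeddings come from its image and text encoders and have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings, standing in for the outputs of CLIP's encoders.
image_embedding = [0.9, 0.1, 0.0]
matching_text_embedding = [1.0, 0.0, 0.0]
unrelated_text_embedding = [0.0, 0.0, 1.0]

match_score = cosine_similarity(image_embedding, matching_text_embedding)
mismatch_score = cosine_similarity(image_embedding, unrelated_text_embedding)
```

A caption that fits the image should score higher than one that does not, which is exactly the signal used in the next steps.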

VQGAN+CLIP:
1. Start w/ an init. image generated by VQGAN w/ a random code & input text provided by user.

2. CLIP provides a similarity measure for the image and text.

3. Through optimization (gradient ascent), iteratively adjust the code (and therefore the image) to maximize the CLIP similarity. 6/11

That's all there is to it!

Essentially, CLIP guides a search through the latent space of VQGAN to find the vectors that map to images which fit a given sequence of words. 7/11
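The three steps can be sketched as a toy loop. Everything here is a stand-in: `toy_generator` plays the role of VQGAN, `toy_similarity` plays the role of CLIP, and the gradient is estimated with finite differences instead of the automatic differentiation that real tools rely on. It only illustrates the shape of the search, not an actual implementation.

```python
import random


def toy_generator(code):
    """Stand-in for VQGAN: maps a latent code to a tiny 'image' (list of values)."""
    return [2.0 * c for c in code]


def toy_similarity(image, text_embedding):
    """Stand-in for CLIP: higher (closer to 0) when the image matches the text."""
    return -sum((p - t) ** 2 for p, t in zip(image, text_embedding))


def score(code, text_embedding):
    """Similarity of the image generated from `code` to the text."""
    return toy_similarity(toy_generator(code), text_embedding)


def numerical_gradient(code, text_embedding, eps=1e-5):
    """Central finite-difference estimate of d(score)/d(code)."""
    grad = []
    for i in range(len(code)):
        up = code[:i] + [code[i] + eps] + code[i + 1:]
        down = code[:i] + [code[i] - eps] + code[i + 1:]
        diff = score(up, text_embedding) - score(down, text_embedding)
        grad.append(diff / (2 * eps))
    return grad


def optimize(text_embedding, steps=200, lr=0.05, seed=0):
    rng = random.Random(seed)
    # Step 1: start from a random code.
    code = [rng.uniform(-1, 1) for _ in text_embedding]
    for _ in range(steps):
        # Step 2: measure how the similarity changes around the current code.
        grad = numerical_gradient(code, text_embedding)
        # Step 3: gradient ascent, moving the code toward higher similarity.
        code = [c + lr * g for c, g in zip(code, grad)]
    return code


# Hypothetical "text embedding" that the generated image should match.
text_embedding = [0.4, -0.2, 1.0]
best_code = optimize(text_embedding)
best_image = toy_generator(best_code)
```

In this toy setup the loop settles on the code whose generated image equals the text embedding; with the real models the same loop shape searches VQGAN's latent space under CLIP's guidance.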

https://twitter.com/arankomatsuzaki/status/1399471244760649729

VQGAN+CLIP sometimes has unexpected behaviors based on different statistical properties learned during training. This includes the infamous "unreal engine" trick. 8/11

It's fascinating to see how rewording your prompt or adding extra terms can lead to a wide diversity of results! (This is known as prompt engineering.) 9/11
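As a small illustration of prompt engineering, one way to explore variations is to generate several versions of the same prompt with different modifiers appended, then run each through the tool. The helper name `build_prompts`, the comma-joined format, and the "oil painting" modifier are assumptions made for this sketch; "unreal engine" is the trick mentioned above.

```python
def build_prompts(subject, modifiers):
    """Return the plain subject followed by one variant per modifier."""
    return [subject] + [f"{subject}, {modifier}" for modifier in modifiers]


# Each resulting prompt would be fed to the generation tool separately.
prompts = build_prompts("a castle on a hill", ["unreal engine", "oil painting"])
```

Comparing the images produced from each entry in `prompts` shows how much a single added term can shift the result.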

While @WOMBO doesn't specify the algorithm used, & it might not be exactly the same as existing VQGAN+CLIP tools, the underlying models & principles likely remain the same, perhaps tweaked a bit in terms of the prompt (especially for the style selection) & the optimization process. 10/11

Hope this short thread helps!

I recommend reading @sea_snell's blog post if you want to dive deeper:
ml.berkeley.edu/blog/posts/cli…

If you like this thread, please share!

Consider following me for AI/ML-related content! 🙂

Oh and adding a note that @advadnoun and @RiversHaveWings were the original pioneers of the VQGAN+CLIP technique (described in more detail in the above blog post)... Check out their work and give them a follow!
