Have you been seeing artwork like this on your timeline and wondered how it was created?

Let's learn about one of the most popular algorithms for AI-generated art! ⬇ ⬇ ⬇
The technique used is known as "VQGAN+CLIP" and is actually a combination of two deep learning models/algorithms, both released earlier this year. 2/11
VQGAN - Vector-Quantized Generative Adversarial Network

VQGAN is a type of GAN, a class of *generative* neural networks that has been used for #deepfakes and other AI-generated art techniques.

You pass it a vector (a "code"), and VQGAN generates an image from it. 3/11
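
To make that concrete, here's a toy sketch of the interface in PyTorch. The decoder below is just a stand-in (a real VQGAN decoder is loaded from a pretrained checkpoint), and the shapes are illustrative: a 16x16 grid of latents decoded into a 256x256 image.

```python
import torch

# Stand-in for a pretrained VQGAN decoder: any module mapping a grid of
# latent codes to RGB pixels illustrates the interface. (A real VQGAN
# decoder is loaded from a checkpoint, not defined like this!)
decoder = torch.nn.Sequential(
    torch.nn.ConvTranspose2d(256, 3, kernel_size=16, stride=16),
    torch.nn.Sigmoid(),  # squash outputs into the [0, 1] pixel range
)

# A "code": here, a 16x16 grid of 256-dimensional latent vectors.
z = torch.randn(1, 256, 16, 16)

image = decoder(z)  # -> tensor of shape (1, 3, 256, 256): one RGB image
```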
VQGAN, like many cutting-edge GANs, has a continuous, traversable latent space: codes with similar values generate similar images, and following a smooth path from one code to another produces a smooth interpolation between the corresponding images. 4/11
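
A quick sketch of that idea (the commented `vqgan.decode` call is a hypothetical handle to a real pretrained model, not an actual API):

```python
import torch

z_a = torch.randn(1, 256, 16, 16)  # code for image A
z_b = torch.randn(1, 256, 16, 16)  # code for image B

# Walk a straight line through latent space. Because the space is
# continuous and traversable, decoding each intermediate code yields
# a plausible in-between image, i.e. a smooth morph from A to B.
codes = [(1 - t) * z_a + t * z_b for t in torch.linspace(0, 1, steps=8)]
# frames = [vqgan.decode(z_t) for z_t in codes]  # hypothetical decode call
```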
CLIP - Contrastive Language-Image Pretraining

CLIP is a model released by @OpenAI (the same company that developed GPT-3!).

It can be used to measure the similarity between an input image and text. 5/11
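
For example, with OpenAI's `clip` package (installable from github.com/openai/CLIP), scoring an image against a couple of captions looks roughly like this ("photo.jpg" is just a placeholder file name):

```python
import torch
import clip
from PIL import Image

model, preprocess = clip.load("ViT-B/32", device="cpu")

image = preprocess(Image.open("photo.jpg")).unsqueeze(0)  # placeholder image
text = clip.tokenize(["a photo of a dog", "a watercolor of a city"])

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)

# Cosine similarity: normalize both embeddings, then take dot products.
img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
print(img_feat @ txt_feat.T)  # higher score = image and text match better
```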
VQGAN+CLIP:
1. Start with an initial image generated by VQGAN from a random code, plus input text provided by the user.

2. CLIP provides a similarity measure for the image and text.

3. Through optimization (gradient ascent on the code), iteratively adjust the latent code, and hence the image, to maximize the CLIP similarity (see the sketch after this list). 6/11
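
Here's a minimal sketch of that loop, reusing the toy stand-in decoder from earlier in place of a real pretrained VQGAN. Real implementations also normalize images CLIP-style and average over random "cutouts", which I omit here; and minimizing negative similarity is the same as the gradient ascent described above.

```python
import torch
import torch.nn.functional as F
import clip

device = "cpu"  # keeps the sketch simple; real runs want a GPU
perceptor, _ = clip.load("ViT-B/32", device=device)
for p in perceptor.parameters():
    p.requires_grad_(False)  # CLIP stays frozen; only the code is optimized

# Stand-in decoder; a real implementation loads a pretrained VQGAN here.
decoder = torch.nn.Sequential(
    torch.nn.ConvTranspose2d(256, 3, kernel_size=16, stride=16),
    torch.nn.Sigmoid(),
).to(device)

prompt = clip.tokenize(["a watercolor painting of a fox"]).to(device)
with torch.no_grad():
    txt_feat = F.normalize(perceptor.encode_text(prompt), dim=-1)

# Step 1: start from a random latent code.
z = torch.randn(1, 256, 16, 16, device=device, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.05)

for step in range(200):
    image = decoder(z)                                       # code -> image
    image = F.interpolate(image, size=224, mode="bilinear")  # CLIP's input size
    img_feat = F.normalize(perceptor.encode_image(image), dim=-1)
    loss = -(img_feat @ txt_feat.T).mean()  # steps 2-3: raise CLIP similarity
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Real VQGAN+CLIP notebooks add a few tricks on top of this skeleton, such as quantizing the code through the VQGAN codebook each step and averaging CLIP scores over many augmented crops, which noticeably improves results.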
That's all there is to it!

Essentially, CLIP guides a search through the latent space of VQGAN to find the vectors that map to images matching a given sequence of words. 7/11
VQGAN+CLIP sometimes shows unexpected behaviors that stem from statistical properties learned during training. This includes the infamous "unreal engine" trick: appending "unreal engine" to a prompt tends to produce sharper, more detailed images, because CLIP learned to associate that phrase with high-quality game-engine renders. 8/11
It's very interesting to see how rewording your prompt or including additional terms can lead to a surprising diversity of results! (This is known as prompt engineering.) 9/11
While @WOMBO doesn't specify the algorithm used, and it might not be exactly the same as existing VQGAN+CLIP tools, the underlying models & principles remain the same, perhaps tweaked a bit in terms of the prompt (especially for the style selection) & the optimization process. 10/11
Hope this short thread helps!

I recommend reading @sea_snell's blog post if you want to dive deeper:
ml.berkeley.edu/blog/posts/cli…

If you like this thread, please share!

Consider following me for AI/ML-related content! 🙂
Oh, and adding a note that @advadnoun and @RiversHaveWings were the original pioneers of the VQGAN+CLIP technique (described in more detail in the blog post above)... Check out their work and give them a follow!

