VQGAN is a type of GAN, which is a class of *generative* neural networks that have been used for #deepfakes and other AI-generated art techniques.
You pass a vector/code and VQGAN generates an image. 3/11
VQGAN, like many of the cutting-edge GANs, has a continuous, traversable latent space, which means that codes with similar values will generate similar images, and following a smooth path from one code to another will lead to a smooth interpolation from one image to another 4/11
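To make the "traversable latent space" idea concrete, here is a minimal sketch of walking a straight line between two codes. The `decode` function is a hypothetical stand-in for a real VQGAN decoder (it just maps a 2-number code to two "pixel" values), so only the interpolation logic would carry over to an actual model.

```python
import math

def lerp(z0, z1, t):
    """Linearly interpolate between two latent codes, with t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]

def interpolation_path(z0, z1, steps):
    """Return `steps` codes evenly spaced on the straight line from z0 to z1."""
    return [lerp(z0, z1, i / (steps - 1)) for i in range(steps)]

def decode(z):
    """Hypothetical stand-in for a decoder: a smooth map from code to 'pixels'."""
    return [math.tanh(z[0] + z[1]), math.tanh(z[0] - z[1])]

z_start = [0.0, 1.0]
z_end = [2.0, -1.0]
path = interpolation_path(z_start, z_end, steps=5)

# Because decode is smooth, neighbouring frames differ only slightly.
frames = [decode(z) for z in path]
```

With a real generator, each entry of `frames` would be an image, and playing them in order would give the smooth morph described above.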
CLIP - Contrastive Language-Image Pretraining
CLIP is a model released by @OpenAI (the same company that developed GPT-3!).
It can be used to measure the similarity between an input image and text. 5/11
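CLIP scores an image against a piece of text by embedding both into the same vector space and comparing the two vectors with cosine similarity. The sketch below shows only that comparison step. The embedding values are made up for illustration; a real CLIP model would produce them from an actual image and caption.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 = same direction, -1 = opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings (assumed values, not real CLIP outputs)
image_embedding = [0.9, 0.1, 0.3]
matching_text = [0.8, 0.2, 0.4]
unrelated_text = [-0.5, 0.9, -0.1]

score_match = cosine_similarity(image_embedding, matching_text)
score_unrelated = cosine_similarity(image_embedding, unrelated_text)
```

Here `score_match` comes out higher than `score_unrelated`, which is the signal the rest of the pipeline relies on.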
VQGAN+CLIP:
1. Start w/ an initial image generated by VQGAN from a random code, plus input text provided by the user.
2. CLIP provides a similarity measure for the image and text.
3. Through optimization (gradient ascent), iteratively adjust the image to maximize the CLIP similarity. 6/11
That's all there is to it!
Essentially, CLIP guides a search through the latent space of VQGAN to find the vectors that map to images which fit a given sequence of words. 7/11
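The search can be sketched as a toy gradient-ascent loop. Everything here is a simplification under stated assumptions: `generate` is a hypothetical linear map standing in for VQGAN plus the CLIP image encoder, `text_embedding` is a made-up target vector, the starting code is fixed rather than random so the run is reproducible, and the gradient is estimated with finite differences instead of the backpropagation a real implementation would use.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical linear "generator": maps a 2-D code to a 3-D image embedding.
W = [[1.0, 0.5], [-0.5, 1.0], [0.3, 0.2]]

def generate(z):
    return [sum(w * x for w, x in zip(row, z)) for row in W]

def score(z, text_embedding):
    """Similarity between the generated 'image' and the text (step 2)."""
    return cosine_similarity(generate(z), text_embedding)

def numerical_gradient(z, text_embedding, eps=1e-5):
    """Central finite-difference estimate of d(score)/dz (assumption: no autograd)."""
    grad = []
    for i in range(len(z)):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[i] += eps
        z_minus[i] -= eps
        diff = score(z_plus, text_embedding) - score(z_minus, text_embedding)
        grad.append(diff / (2 * eps))
    return grad

def optimize(z, text_embedding, lr=0.2, steps=300):
    """Gradient ascent: repeatedly nudge the code uphill on the score (step 3)."""
    for _ in range(steps):
        grad = numerical_gradient(z, text_embedding)
        z = [x + lr * g for x, g in zip(z, grad)]
    return z

text_embedding = [2.0, 1.5, 0.7]  # made-up stand-in for the encoded prompt
z_init = [1.0, -1.0]              # step 1: starting code (fixed here, random in practice)

z_final = optimize(z_init, text_embedding)
initial_score = score(z_init, text_embedding)
final_score = score(z_final, text_embedding)
```

In this toy setup the score climbs from a negative value to close to 1. With real models the loop has the same shape, but each step decodes an image, encodes it with CLIP, and backpropagates through both networks.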
VQGAN+CLIP sometimes has unexpected behaviors based on different statistical properties learned during training. This includes the infamous "unreal engine" trick. 8/11
It's very interesting to see how rewording your prompt or including additional terms can lead to an interesting diversity of results! (known as prompt engineering) 9/11
While @WOMBO doesn't specify the algorithm used, and it might not be exactly the same as existing VQGAN+CLIP tools, the underlying models & principles remain the same, though maybe tweaked a bit in terms of the prompt (especially for the style selection) & optimization process. 10/11
Consider following me for AI/ML-related content! 🙂
Oh and adding a note that @advadnoun and @RiversHaveWings were the original pioneers of the VQGAN+CLIP technique (described in more detail in the above blog post)... Check out their work and give them a follow!
Before "AlphaFold-multimer", people discovered that AlphaFold can predict complexes if you connect them with a long linker (this tweet was cited in the above paper!) 2/4
The new model, which had various adjustments to handle the larger protein complex structures, shows improved performance over this linker approach, along with other approaches 3/4
In my blog post about GitHub Copilot/Codex (tmabraham.github.io/blog/github_co…), I pointed out its lack of knowledge of newer libraries like @fastdotai v2. Testing @OpenAI Codex yesterday, it provided an almost working (regex was off by one character 😛) example of fastai v2 code.
A few observations:
1. You have to specifically ask for fastai v2 code, but then the import needs to be changed: "fastai2.vision.all" → "fastai.vision.all"
2. It has understanding of the differences between the fastai v1 and v2 APIs (correct use of ImageDataLoaders, the fine_tune function new to v2, use of item_tfms to resize before batching)
The Tesla team discussed how they are using AI to crack Full Self Driving (FSD) at their Tesla AI Day event.
They introduced many cool things:
- HydraNets
- Dojo Processing Units
- Tesla bots
- So much more...
Here's a quick summary 🧵:
They introduced their single deep learning model architecture ("HydraNet") for feature extraction and transforming into a "vector space"
This includes multi-scale features from each of the 8 cameras, integrated with a transformer to attend to important features, incorporating kinematic features, and processing in a spatiotemporal manner using a feature queue and spatial RNNs, all trained with multi-task learning.
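The multi-task part follows a general pattern: one shared feature extractor feeding several task-specific heads, trained against a combined loss. The sketch below shows that generic pattern only. It is not Tesla's architecture (no cameras, transformer, feature queue, or RNNs), and the task names, weights, and targets are hypothetical.

```python
def backbone(inputs):
    """Shared feature extractor (toy): computed once and reused by every head."""
    total = sum(inputs)
    return [total, total / len(inputs), max(inputs)]

def make_linear_head(weights):
    """Build a task-specific head that reads the shared features."""
    def head(features):
        return sum(w * f for w, f in zip(weights, features))
    return head

# Hypothetical task names and weights, chosen only for illustration.
heads = {
    "task_a": make_linear_head([0.1, 0.5, -0.2]),
    "task_b": make_linear_head([-0.3, 0.2, 0.4]),
}

def forward(inputs):
    features = backbone(inputs)  # one shared pass for all tasks
    return {name: head(features) for name, head in heads.items()}

def multi_task_loss(predictions, targets):
    """Sum of per-task squared errors, so all heads share one training objective."""
    return sum((predictions[name] - targets[name]) ** 2 for name in targets)

predictions = forward([1.0, 2.0, 3.0])
loss = multi_task_loss(predictions, {"task_a": 1.0, "task_b": 0.0})
```

The appeal of this layout is that the expensive feature extraction runs once per input, while each head stays small and can be added or retrained separately.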
I find it very interesting that Twitter recommends relevant tweets to me, but the topic suggestion is completely off. It looks to me like the recommendation and topic selection algorithms are completely different.
While the tweet recommendation algo is more sophisticated and likely takes into consideration the semantic content of the tweet, the topic selection algo seems to be a simple algorithm that heavily weighs the presence of keywords.
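To illustrate why a keyword-heavy approach can misfire, here is a toy topic scorer. This is purely a guess at the kind of algorithm described above, not Twitter's actual system; the topics, keywords, and weights are all invented.

```python
def keyword_topic_scores(tweet, topic_keywords):
    """Score each topic by summing the weights of its keywords found in the tweet."""
    words = set(tweet.lower().split())
    return {
        topic: sum(weight for keyword, weight in keywords.items() if keyword in words)
        for topic, keywords in topic_keywords.items()
    }

# Hypothetical topics and keyword weights, for illustration only.
topic_keywords = {
    "Reptiles": {"python": 3.0, "snake": 2.0},
    "Machine learning": {"model": 1.0, "training": 1.0, "neural": 1.0},
}

tweet = "Training my model with a new Python library today"
scores = keyword_topic_scores(tweet, topic_keywords)
best_topic = max(scores, key=scores.get)
```

One heavily weighted keyword ("python") outvotes the two weaker but more relevant ones, so `best_topic` ends up as "Reptiles" even though the tweet is clearly about machine learning.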