Artsiom Sanakoyeu
Research Scientist, PhD in Computer Vision @ Heidelberg University, @kaggle Master (Top 50)
Apr 8, 2021 9 tweets 3 min read
ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement🔥

This paper proposes an improved way to project real images into the StyleGAN latent space (which is required for further image manipulations).

🌀 yuval-alaluf.github.io/restyle-encode…

Thread 👇 1/ Instead of directly predicting the latent code of a given real image in a single pass, the encoder is tasked with predicting a residual with respect to the current estimate. The initial estimate is simply the average latent code across the dataset. ...
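A minimal PyTorch sketch of this refinement loop (the `encoder`, `generator`, and latent shapes are placeholders, not the official ReStyle code):

```python
import torch

def restyle_invert(encoder, generator, x, w_avg, n_iters=5):
    """Iterative residual inversion, sketched from the description above.

    encoder:   maps (input image, current reconstruction) -> residual latent code
    generator: pretrained StyleGAN-like generator, latent -> image
    x:         batch of real images; w_avg: average latent over the dataset, shape (1, L, 512)
    """
    w = w_avg.expand(x.size(0), -1, -1).clone()    # initial estimate = average latent code
    y = generator(w)                               # reconstruction of the initial estimate
    for _ in range(n_iters):
        delta = encoder(torch.cat([x, y], dim=1))  # residual w.r.t. the current estimate
        w = w + delta                              # refine the latent code
        y = generator(w)                           # re-synthesize for the next step
    return w, y
```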
Apr 6, 2021 4 tweets 2 min read
Spectacular Image Stylization using CLIP and DALL-E

As a Style Transfer Dude, I can say that this is super cool. The statue of David by Michelangelo was used as the input image. Then it was morphed towards the styles of different famous artists by steering the latent code towards...
👇 1/ ...towards the embeddings of a textual description in CLIP space.

I especially like Picasso's Cubism, where it created a half-bull, half-human portrait, one of Picasso's typical subjects. The Rene Magritte stylization is my second favorite.

🤙Colab colab.research.google.com/drive/1oA1fZP7…
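The steering objective can be sketched roughly like this (hypothetical `generator` and latent `z`; the actual Colab pipeline may differ):

```python
import torch
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def clip_style_loss(image_batch, prompt):
    """image_batch: (B, 3, 224, 224) images from a differentiable generator,
    already resized/normalized the way CLIP expects."""
    with torch.no_grad():
        text_feat = model.encode_text(clip.tokenize([prompt]).to(device))
    img_feat = model.encode_image(image_batch)
    return 1 - torch.cosine_similarity(img_feat, text_feat).mean()

# e.g. loss = clip_style_loss(generator(z), "a portrait in the style of Picasso's Cubism"),
# then back-propagate into the latent code z to morph the image towards the prompt.
```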
Apr 4, 2021 9 tweets 3 min read
Self-supervised Learning for Medical images

Due to fixed imaging procedures, medical images such as X-rays or CT scans are usually well aligned.
This alignment can be exploited to automatically mine similar pairs of images for training.

arxiv.org/abs/2102.10680 1/
The basic idea is to fix K random locations in the unlabeled medical images (the same K locations for every image) and crop image patches at these locations across different images (which correspond to scans of different patients).
...
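A rough sketch of that mining step (function and parameter names are my own, not from the paper):

```python
import numpy as np

def mine_aligned_patches(images, k=16, patch=64, seed=0):
    """Mine pseudo-positive patch pairs from roughly aligned medical scans.

    images: array of shape (N, H, W) -- unlabeled scans, assumed spatially aligned.
    Returns a dict location_id -> list of patches (one per scan); patches sharing a
    location_id across different patients can be treated as similar pairs for training.
    """
    n, h, w = images.shape
    rng = np.random.default_rng(seed)
    ys = rng.integers(0, h - patch, size=k)   # K random top-left corners,
    xs = rng.integers(0, w - patch, size=k)   # shared by every image
    return {
        loc_id: [img[y:y + patch, x:x + patch] for img in images]
        for loc_id, (y, x) in enumerate(zip(ys, xs))
    }
```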
Apr 1, 2021 9 tweets 4 min read
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery 🔥

Uses the CLIP model to guide image editing in StyleGAN with text queries.

📝Paper arxiv.org/abs/2103.17249
⚙️ code github.com/orpatashnik/St…

Thread 👇 1/
🛠️How?
1. Take pretrained CLIP, pretrained StyleGAN, and pretrained ArcFace network for face recognition.
2. Project the input image into the StyleGAN latent space to obtain a latent vector w_s.
...
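One of the strategies is direct latent optimization: push w_s so that the generated image matches the text in CLIP space, while an identity loss (ArcFace) and an L2 term keep the person and the rest of the image intact. A rough sketch with placeholder modules and illustrative loss weights (not the official code):

```python
import torch
import torch.nn.functional as F
import clip  # https://github.com/openai/CLIP

def text_driven_edit(G, arcface, w_s, prompt, steps=100, lr=0.1,
                     lambda_id=0.5, lambda_l2=0.8, device="cuda"):
    """Latent-optimization edit in the spirit of StyleCLIP (illustrative sketch).

    G: pretrained StyleGAN generator (latent -> image), arcface: face-identity embedder,
    w_s: latent of the input image from step 2. CLIP input normalization omitted for brevity.
    """
    model, _ = clip.load("ViT-B/32", device=device)
    text = model.encode_text(clip.tokenize([prompt]).to(device)).detach()
    id_src = arcface(G(w_s)).detach()              # identity features of the original image
    w = w_s.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        img = G(w)
        clip_loss = 1 - torch.cosine_similarity(
            model.encode_image(F.interpolate(img, size=224)), text).mean()
        id_loss = 1 - torch.cosine_similarity(arcface(img), id_src).mean()
        l2_loss = ((w - w_s) ** 2).mean()          # stay close to the original latent
        loss = clip_loss + lambda_id * id_loss + lambda_l2 * l2_loss
        opt.zero_grad(); loss.backward(); opt.step()
    return w.detach()
```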
Mar 29, 2021 10 tweets 3 min read
Swin Transformer: New SOTA backbone for Computer Vision🔥

👉 What?
A new vision Transformer architecture called Swin Transformer that can serve as a backbone for computer vision instead of CNNs.

📝 arxiv.org/abs/2103.14030
⚒ Code (soon) github.com/microsoft/Swin…

Thread 👇 2/
❓Why?
There are two main problems with using Transformers for computer vision.
1. Existing Transformer-based models use tokens of a fixed scale. However, in contrast to word tokens, visual elements can vary in scale (e.g., objects of different sizes in an image).
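Swin tackles this with a hierarchical feature map (patch merging between stages) and computes self-attention inside shifted local windows, keeping the cost linear in image size. A sketch of the window-partition step (shapes assume H and W are divisible by the window size; not the official code):

```python
import torch

def window_partition(x, window_size=7):
    """Split a (B, H, W, C) feature map into non-overlapping local windows.

    Self-attention is then computed inside each window_size x window_size window,
    which keeps the cost linear in image size instead of quadratic.
    """
    b, h, w, c = x.shape
    x = x.view(b, h // window_size, window_size, w // window_size, window_size, c)
    windows = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, c)
    return windows  # (num_windows * B, window_size, window_size, C)
```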
Mar 23, 2021 16 tweets 6 min read
🔥New DALL-E? Paint by Word 🔥

Edit a generated image by painting a mask at any location of the image and specifying any text description. Or generate a full image just from a textual input.

📝arxiv.org/abs/2103.10951
1/ 2/ Point to a location in a synthesized image and apply an arbitrary new concept such as “rustic” or “opulent” or “happy dog.”
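A hedged sketch of what such a masked, text-guided objective could look like (my illustration, not the paper's exact loss; `clip_model` and `text_feat` would come from OpenAI's CLIP):

```python
import torch
import torch.nn.functional as F

def masked_edit_losses(clip_model, text_feat, x, x0, mask):
    """Two loss terms for a "paint a mask + describe it" edit (illustrative only).

    x:    current generated image (B, 3, H, W), differentiable w.r.t. the latent being optimized
    x0:   the original image we started from
    mask: (1, 1, H, W) tensor, 1 inside the painted region, 0 elsewhere
    """
    # 1) the painted region should match the text description in CLIP space
    region = F.interpolate(x * mask, size=224)
    clip_loss = 1 - torch.cosine_similarity(clip_model.encode_image(region), text_feat).mean()
    # 2) everything outside the mask should stay as it was
    keep_loss = (((x - x0) * (1 - mask)) ** 2).mean()
    return clip_loss, keep_loss
```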
Mar 23, 2021 5 tweets 3 min read
Meta-DETR: Few-Shot Object Detection via Unified Image-Level Meta-Learning

❓How?
Eliminate region-wise prediction and instead meta-learn object localization and classification at image level in a unified and complementary manner.

🛠️arxiv.org/abs/2103.11731

1/K ...👇 Specifically, the Meta-DETR first encodes both support and query images into category-specific features and then feeds them into a category-agnostic decoder to directly generate predictions for specific categories. ...
2/K
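A high-level sketch of that forward pass (all module names and shapes are placeholders; the real architecture has more moving parts):

```python
import torch
import torch.nn as nn

class MetaDetrSketch(nn.Module):
    """Illustrative skeleton of a Meta-DETR-style model, not the authors' implementation."""

    def __init__(self, backbone, encoder, decoder, num_queries=100, d_model=256):
        super().__init__()
        self.backbone = backbone      # shared feature extractor for query and support images
        self.encoder = encoder        # fuses query features with the support-class features
        self.decoder = decoder        # category-agnostic, DETR-style decoder
        self.queries = nn.Embedding(num_queries, d_model)

    def forward(self, query_img, support_imgs):
        q_feat = self.backbone(query_img)                               # query-image features
        s_feat = self.backbone(support_imgs).mean(dim=0, keepdim=True)  # class prototype from supports
        fused = self.encoder(q_feat, s_feat)                            # category-specific encoding
        # the decoder predicts boxes plus "is this the support category?" scores at image level,
        # with no region proposals or RoI pooling involved
        return self.decoder(fused, self.queries.weight)
```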
Mar 23, 2021 16 tweets 4 min read
An open-source 2.7-billion-parameter replication of GPT-3 was released

github.com/EleutherAI/gpt…

As you probably know, OpenAI has not released the source code or pre-trained weights for its 175-billion-parameter language model GPT-3.

A thread 👇 1/ Instead, OpenAI decided to create a commercial product and exclusively license GPT-3 to Microsoft.

But open-source enthusiasts from eleuther.ai have open-sourced the weights of the 1.3B- and 2.7B-parameter models of their GPT-3 replication

🛠️github.com/EleutherAI/gpt…
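If the released checkpoints are mirrored on the Hugging Face Hub (the 2.7B model is commonly available there as "EleutherAI/gpt-neo-2.7B"; the repo above targets the original mesh-tensorflow codebase), trying them can look like this:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-neo-2.7B"          # assumed Hub id of the released 2.7B checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Open-source language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```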
Mar 21, 2021 11 tweets 3 min read
⚔️ FastNeRF vs NeX ⚔️

Great ideas rarely occur to just one person. FastNeRF builds on the same idea as NeX, but with a slightly different implementation. Which one is faster?

Nex nex-mpi.github.io
FastNeRF arxiv.org/abs/2103.10380

To learn about the differences between the two -> thread 👇 1/ The main idea is to factorize the voxel color representation into two independent components: one that depends only on the position p=(x,y,z) of the voxel and one that depends only on the ray direction v.
Essentially, you predict K different (R,G,B) values for every voxel...
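In code, the factorization amounts to something like this (both networks are placeholders; caching their outputs on grids is what makes rendering fast, since neither depends on the other's input):

```python
import torch

def factorized_color(pos_net, dir_net, p, v):
    """View-dependent color via a position/direction factorization (sketch only).

    pos_net(p) -> (B, K, 3): K candidate RGB values that depend only on position p = (x, y, z)
    dir_net(v) -> (B, K):    K mixing weights that depend only on the ray direction v
    """
    rgb_components = pos_net(p)                    # position-only part, can be cached on a 3D grid
    weights = dir_net(v).unsqueeze(-1)             # direction-only part, can be cached separately
    return (weights * rgb_components).sum(dim=1)   # (B, 3) color = weighted sum of K components
```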
Mar 19, 2021 10 tweets 4 min read
How to easily edit and compose images like in Photoshop using GANs🔥

❓What?
Given an incomplete image or a collage of images, generate a realistic image

📌How?
1. Train a regressor to predict the StyleGAN latent code even from an incomplete image
2. Embed the collage and feed it to the GAN

Using latent space regression to analyze and leverage compositionality in GANs

🔶Method
Given a fixed pretrained generator (e.g., StyleGAN), they train...

📝arxiv.org/abs/2103.10426
🧿Project page chail.github.io/latent-composi…
🛠️chail.github.io/latent-composi…
📔colab: colab.research.google.com/drive/1p-L2dPM…
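A sketch of how the trained regressor would be used at test time (module names are placeholders, and feeding the validity mask as an extra channel is an assumption on my part):

```python
import torch

def compose_with_gan(regressor, G, collage, mask):
    """Turn a rough collage into a realistic image via latent regression (illustrative sketch).

    regressor: encoder trained to predict a StyleGAN latent from partial/masked inputs
    G:         fixed pretrained generator
    collage:   (B, 3, H, W) rough paste-up of image parts; mask: (B, 1, H, W) valid-pixel mask
    """
    with torch.no_grad():
        w = regressor(torch.cat([collage, mask], dim=1))  # latent code that explains the collage
        return G(w)  # the generator projects it back onto the manifold of realistic images
```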