Tweet

https://twitter.com/metasemantic/status/1368713208429764616

https://twitter.com/artsiom_s/status/1374248622183829504?s=20

https://twitter.com/artsiom_s/status/1374248622183829504?s=20

More from @artsiom_s

Artsiom Sanakoyeu

@artsiom_s

4 Apr

Self-supervised Learning for Medical images

Due to fixed imaging procedures, medical images like X-ray or CT scans are usually well aligned.
This gives an opportunity to utilize such an alignment to automatically mine similar pairs of images for training

arxiv.org/abs/2102.10680

1/
The basic idea is to fix K random locations in the unlabeled medical images (K locations are the same for every image) and crop image patches across different images (which correspond to scans of different patients).
...

2/
Now we create a surrogate classification task by assigning a unique pseudo-label to every location 1...K.
Authors combine the surrogate classification task with image restoration using a denoising autoencoder: they randomly perturb the cropped patches (color jittering, ...

Read 9 tweets

Artsiom Sanakoyeu

@artsiom_s

29 Mar

Swin Transformer: New SOTA backbone for Computer Vision🔥

👉 What?
New vision Transformer architecture called Swin Transformer that can serve as a backbone in computer vision instead of CNNs.

📝 arxiv.org/abs/2103.14030
⚒ Code (soon) github.com/microsoft/Swin…

Thread 👇

2/
❓Why?
There are two main problems with the usage of Transformers for computer vision.
1. Existing Transformer-based models have tokens of a fixed scale. However, in contrast to the word tokens, visual elements can be different in scale (e.g. objects of varying sizes in img)

3/
2. Regular self-attention requires quadratic of the image size number of operations, limiting applications in computer vision where high resolution is necessary (e.g., instance segmentation).
...

Read 10 tweets

Artsiom Sanakoyeu

@artsiom_s

23 Mar

🔥New DALL-E? Paint by Word 🔥

Edit a generated image by painting a mask atany location of the image and specifying any text description. Or generate a full image just based on textual input.

📝arxiv.org/abs/2103.10951
1/

2/ Point to a location in a synthesized image and apply an arbitrary new concept such as “rustic” or “opulent” or “happy dog.”

3/
🛠️Two nets:
(1) a semantic similarity network C(x, t) that scores the semantic consistency between an image x and a text description t. It consists of two subnetworks: C_i(x) which embeds images and C_t(t) which embeds text.
(2) generative network G(z) that is trained to ...

Read 16 tweets

Artsiom Sanakoyeu

@artsiom_s

23 Mar

Meta-DETR: Few-Shot Object Detection via Unified Image-Level Meta-Learning

❓How?
Eliminate region-wise prediction and instead meta-learn object localization and classification at image level in a unified and complementary manner.

🛠️arxiv.org/abs/2103.11731

1/K ...👇

Specifically, the Meta-DETR first encodes both support and query images into category-specific
features and then feeds them into a category-agnostic decoder to directly generate predictions for specific categories. ...
2/K

Authors propose a Semantic Alignment Mechanism (SAM), which aligns high-level and low-level feature semantics to improve the generalization of meta-learned representations. ...
3/K

Read 5 tweets

Artsiom Sanakoyeu

@artsiom_s

23 Mar

Open source 2.7 billion parameter GPT-3 model was released

github.com/EleutherAI/gpt…

As you probably know OpenAI has not released source code or pre-trained weights for their 175 billion language model GPT-3.

A thread 👇

1/ Instead, OpenAI decided to create a commercial product and exclusively license GPT-3 to Microsoft.

But open-source enthusiasts from eleuther.ai have open-sourced the weights of 1.3B and 2.7B param models of their replication of GPT-3

🛠️github.com/EleutherAI/gpt…

2/ It is the largest (afaik) publicly available GPT-3 replica. The primary goal of this project is to replicate a full-sized GPT-3 model and open source it to the public, for free.
The models were trained on an open-source dataset The Pile pile.eleuther.ai which ...

Read 16 tweets

Artsiom Sanakoyeu

@artsiom_s

21 Mar

⚔️ FastNeRF vs NeX ⚔️

Smart ideas do not come in the only head. FastNeRF has the same idea as in NeX, but a bit different implementation. Which one is Faster?

Nex nex-mpi.github.io
FastNeRF arxiv.org/abs/2103.10380

To learn about differences between the two -> thread 👇

1/ The main idea is to factorize the voxel color representation into two independent components: one that depends only on positions p=(x,y,z) of the voxel and one that depends only on the ray directions v.
Essentially you predict K different (R,G,B) values for ever voxel...

https://twitter.com/artsiom_s/status/1373464655935471616?s=20

2/ Essentially you predict K different (R,G,B) values for ever voxel and K weighting scalars H_i(v) for each of them:
color(x,y,z) = RGB_1 * H_1 + RGB_2 * H_2 + ... + RGB_K * H_K. This is inspired by the rendering equation.
...

https://twitter.com/artsiom_s/status/1373464655935471616?s=20

Read 11 tweets

Share this page!

Artsiom Sanakoyeu

Try unrolling a thread yourself!

More from @artsiom_s

Artsiom Sanakoyeu

Artsiom Sanakoyeu

Artsiom Sanakoyeu

Artsiom Sanakoyeu

Artsiom Sanakoyeu

Artsiom Sanakoyeu

Did Thread Reader help you today?

Like this author's thread?