The self-supervised task here is to heavily mask the input image (up to 75% of the patches) and reconstruct it with a ViT-based encoder and a lightweight decoder.
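A minimal sketch of that masking-and-reconstruction setup, assuming 224x224 RGB inputs, 16x16 patches, and a 75% mask ratio; module names and hyper-parameters here are illustrative placeholders, not the paper's code.

```python
# Sketch of masked-image reconstruction: encode only the visible patches,
# then reconstruct pixels for the masked ones with a light decoder.
import torch
import torch.nn as nn

class MaskedReconstruction(nn.Module):
    def __init__(self, img_size=224, patch=16, dim=256, mask_ratio=0.75):
        super().__init__()
        self.patch = patch
        self.num_patches = (img_size // patch) ** 2
        patch_dim = 3 * patch * patch
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(patch_dim, dim)
        self.pos = nn.Parameter(torch.zeros(1, self.num_patches, dim))
        # "ViT-based" encoder: a small Transformer over the visible patches only.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=6)
        # Lightweight decoder: a shallower Transformer that maps back to pixel patches.
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=2)
        self.to_pixels = nn.Linear(dim, patch_dim)

    def patchify(self, imgs):
        B, C, H, W = imgs.shape
        p = self.patch
        x = imgs.unfold(2, p, p).unfold(3, p, p)            # B, C, H/p, W/p, p, p
        return x.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * p * p)

    def forward(self, imgs):
        patches = self.patchify(imgs)                        # B, N, 3*p*p
        B, N, _ = patches.shape
        tokens = self.embed(patches) + self.pos              # B, N, dim
        # Randomly keep 25% of the patches; the rest are masked out.
        keep = int(N * (1 - self.mask_ratio))
        idx = torch.rand(B, N, device=imgs.device).argsort(dim=1)
        visible_idx, masked_idx = idx[:, :keep], idx[:, keep:]
        visible = torch.gather(
            tokens, 1, visible_idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
        encoded = self.encoder(visible)
        # Re-insert a learnable mask token at the masked positions, then decode all patches.
        full = self.mask_token.expand(B, N, -1).clone()
        full.scatter_(1, visible_idx.unsqueeze(-1).expand(-1, -1, encoded.size(-1)), encoded)
        recon = self.to_pixels(self.decoder(full + self.pos))
        # Reconstruction loss is computed on the masked patches only.
        target = torch.gather(patches, 1, masked_idx.unsqueeze(-1).expand(-1, -1, patches.size(-1)))
        pred = torch.gather(recon, 1, masked_idx.unsqueeze(-1).expand(-1, -1, recon.size(-1)))
        return nn.functional.mse_loss(pred, target)
```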
Sep 28, 2021 • 4 tweets • 2 min read
PASS: a new dataset without any humans to replace ImageNet for self-supervised pre-training.
No humans means fewer headaches with ethical and legal matters. In addition, PASS images are all under a CC-BY license, which is great for both research and commercial purposes 1/
Some interesting findings:
- pre-training on PASS vs pre-training on IN-1k leads to performance in the same ballpark for various downstream tasks
- performance on human-centered downstream tasks, e.g. human dense pose prediction, is on par with IN-1k models 2/
Feb 28, 2020 • 5 tweets • 4 min read
Introducing new #cvpr2020 work with S. Gidaris and team on a new self-supervised task: Learning Representations by Predicting Bags of Visual Words arxiv.org/abs/2002.12247 1/

Inspired by NLP approaches, our method builds upon features from a self-supervised CNN (e.g. RotNet), which are used for computing a codebook of visual words and image-level Bag-of-Words (BoW) representations 2/
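A minimal sketch of that pipeline, assuming a frozen self-supervised feature extractor (e.g. RotNet) and a plain K-means codebook; function names and the soft cross-entropy target below are illustrative assumptions, not the paper's implementation.

```python
# Build a visual-word codebook from dense CNN features, turn each image's
# feature map into a Bag-of-Words target, and train a network to predict it.
import torch
import torch.nn.functional as F

@torch.no_grad()
def build_codebook(feature_maps, num_words=2048, iters=10):
    """K-means over densely sampled feature vectors -> codebook of visual words."""
    # feature_maps: (N, C, H, W) from the frozen base network.
    feats = feature_maps.permute(0, 2, 3, 1).reshape(-1, feature_maps.size(1))
    codebook = feats[torch.randperm(feats.size(0))[:num_words]].clone()
    for _ in range(iters):
        assign = torch.cdist(feats, codebook).argmin(dim=1)   # nearest word per vector
        for k in range(num_words):
            members = feats[assign == k]
            if members.numel() > 0:
                codebook[k] = members.mean(dim=0)
    return codebook                                            # (K, C)

@torch.no_grad()
def bow_target(feature_map, codebook):
    """Quantize each spatial feature to its nearest word and histogram the assignments."""
    C = feature_map.size(1)
    feats = feature_map.permute(0, 2, 3, 1).reshape(feature_map.size(0), -1, C)
    dists = torch.cdist(feats, codebook.unsqueeze(0).expand(feats.size(0), -1, -1))
    assign = dists.argmin(dim=2)                               # (N, H*W)
    hist = F.one_hot(assign, codebook.size(0)).float().sum(dim=1)
    return hist / hist.sum(dim=1, keepdim=True)                # image-level BoW distribution

def bow_prediction_loss(student_logits, target_bow):
    """Cross-entropy between the predicted word distribution and the BoW target."""
    return -(target_bow * F.log_softmax(student_logits, dim=1)).sum(dim=1).mean()
```

In training, the BoW target would come from the frozen network on one view of the image, while the student predicts it from a perturbed view, so the loss pushes the student to recognize which visual words are present regardless of the perturbation.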