📢 New paper alert! How does one adapt a pre-trained visual model to novel downstream tasks without task-specific finetuning or any model modification? Inspired by #prompting in NLP, our new paper investigates Visual Prompting. (1/5)
Given input-output image example(s) of a new task and a new input image, the goal is to produce the output image consistent with the given examples. We pose this problem as simple image inpainting: literally just filling in a hole in a concatenated, grid-like visual prompt image. (2/5)
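To make the idea concrete, here is a minimal sketch of how such a grid prompt could be assembled (a single example pair assumed; the layout and mask value are illustrative, and `make_visual_prompt` is a hypothetical helper, not code from the paper):

```python
import numpy as np

def make_visual_prompt(example_input, example_output, query_input, mask_value=0):
    """Assemble a 2x2 grid 'visual prompt' image.

    Assumed layout (all images HxWx3, same size):
        [ example_input | example_output ]
        [ query_input   | hole (masked)  ]
    An inpainting model fills the bottom-right hole with the predicted output.
    """
    hole = np.full_like(example_input, mask_value)              # masked quadrant to be inpainted
    top = np.concatenate([example_input, example_output], axis=1)
    bottom = np.concatenate([query_input, hole], axis=1)
    return np.concatenate([top, bottom], axis=0)
```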
The secret ingredient to get this to work is the training data. To obtain image data that better resembles our visual prompts, we curated 88k unlabeled figures from paper sources on arXiv. (3/5)
We then trained an MAE (masked autoencoder) to predict the VQGAN tokens of randomly masked image patches. (4/5)
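A rough sketch of one training step under these assumptions: the image is encoded into discrete VQGAN codebook indices, a random subset of token positions is masked, and the MAE is trained with cross-entropy to predict the indices at the masked positions. The `vqgan` and `mae` handles and their methods are hypothetical placeholders, not the paper's actual API:

```python
import torch
import torch.nn.functional as F

def mae_vqgan_training_step(images, vqgan, mae, mask_ratio=0.75):
    """One illustrative step of MAE training over VQGAN tokens.

    Assumed interfaces (hypothetical):
      - vqgan.encode_to_indices(images) -> LongTensor [B, N] of codebook ids
      - mae(images, keep_mask)          -> logits [B, N, vocab_size]
    """
    with torch.no_grad():
        target_ids = vqgan.encode_to_indices(images)               # ground-truth discrete tokens, [B, N]

    B, N = target_ids.shape
    mask = torch.rand(B, N, device=images.device) < mask_ratio     # True = masked position to predict

    logits = mae(images, keep_mask=~mask)                          # predict token logits at every position
    loss = F.cross_entropy(logits[mask], target_ids[mask])         # supervise only the masked positions
    return loss
```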