📢 New paper alert! How does one adapt a pre-trained visual model to novel downstream tasks without task-specific finetuning or any model modification? Inspired by #prompting in NLP, our new paper investigates Visual Prompting. (1/5)
Given input-output image example(s) of a new task and a new input image, the goal is to produce an output image consistent with the given examples. We pose this problem as simple image inpainting: literally just filling in a hole in a concatenated, grid-like visual prompt image. (2/5)
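A minimal sketch of how such a grid-like visual prompt could be assembled (this is an illustration, not the released code; the exact layout, cell size, and padding here are assumptions): the example input/output pair goes on the top row, the query image on the bottom-left, and the bottom-right cell is left blank as the hole the inpainting model fills.

```python
import numpy as np

def make_visual_prompt(example_in, example_out, query_in, cell=111, pad=1):
    """Build a 2x2 grid-like visual prompt image.

    Each input is assumed to be an (cell, cell, 3) uint8 array,
    i.e. already resized to the cell size.
    """
    h = w = cell
    canvas = np.zeros((2 * h + 3 * pad, 2 * w + 3 * pad, 3), dtype=np.uint8)
    canvas[pad:pad + h, pad:pad + w] = example_in                       # top-left: example input
    canvas[pad:pad + h, 2 * pad + w:2 * pad + 2 * w] = example_out      # top-right: example output
    canvas[2 * pad + h:2 * pad + 2 * h, pad:pad + w] = query_in         # bottom-left: new query input
    # bottom-right cell stays zeroed: the "hole" the inpainting model fills
    return canvas
```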
The secret ingredient that makes this work is the training data. To obtain images that better resemble our visual prompts, we curated 88k unlabeled figures from paper sources on arXiv. (3/5)
We then trained an MAE to predict the VQGAN tokens of randomly masked image patches. (4/5)
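A rough sketch of that training objective, under assumed module names and shapes (encoder/decoder signatures, the `encode_to_indices` helper, and `embed_dim` are all hypothetical, not the released implementation): an MAE-style encoder sees only the visible patches, and the decoder predicts a discrete VQGAN codebook index for each masked patch, trained with cross-entropy against the tokens produced by a frozen, pre-trained VQGAN.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MAEVQGAN(nn.Module):
    def __init__(self, encoder, decoder, vqgan, codebook_size=1024):
        super().__init__()
        self.encoder = encoder      # ViT-style encoder over visible patches (assumed interface)
        self.decoder = decoder      # lightweight transformer decoder (assumed interface)
        self.vqgan = vqgan.eval()   # frozen, pre-trained VQGAN used only to produce targets
        self.head = nn.Linear(decoder.embed_dim, codebook_size)

    def forward(self, images, mask):
        # Targets: one VQGAN codebook index per patch, computed without gradients.
        with torch.no_grad():
            target_tokens = self.vqgan.encode_to_indices(images)   # (B, N) -- assumed helper
        latent = self.encoder(images, mask)        # encode only the visible patches
        decoded = self.decoder(latent, mask)       # decode all patch positions
        logits = self.head(decoded)                # (B, N, codebook_size)
        # Cross-entropy only on the masked positions.
        loss = F.cross_entropy(logits[mask], target_tokens[mask])
        return loss
```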
Paper link: arxiv.org/abs/2209.00647. We will release the dataset & code soon on the project page: yossigandelsman.github.io/visual_prompt/
Joint work with collaborators @YGandelsman, @trevordarrell, @amirgloberson, and Alyosha Efros from @berkeley_ai & @TelAvivUni. (5/5)
We released the dataset here -
github.com/amirbar/visual…
Code/models/etc. coming soon.