Conversational flavor of data: RedCaps data is created with the specific intent of human interaction on social media. Reddit users have an incentive (upvotes) to upload high-quality data that is sometimes witty or emotional, unlike HTML alt-text. 2/8
Subreddits: We collect data from 350 manually chosen subreddits. The largest subreddits show that Reddit users like to share pets, hobbies, and photography! These subreddits let us steer the data distribution, and provide image labels even when captions don’t mention the objects in the image. 3/8
Dataset size: One of the largest public image-text datasets in 2021! RedCaps contains data from the past 13 years (2008–2020). However, it is not static by design! It will continue to grow in the future as more data gets uploaded to Reddit. 4/8
Vision pre-training: In controlled settings, models trained on RedCaps outperform those trained on existing public image-text datasets (SBU, CC-3M) on 10/11 downstream tasks. See zero-shot classification⬇️, more in the paper. 5/8
Image captioning: We showed captions predicted by models trained on RedCaps vs CC-3M (alt-text dataset) to human evaluators. They preferred captions from the RedCaps-trained model (underlined ⬇️) for 633/1000 images! 6/8
Subreddit-controlled captioning: Since we trained models with subreddits, we can *prompt* them with different subreddits at test-time. This gives linguistically diverse and amusing captions! Try out our demo: huggingface.co/spaces/umichVi…
Tweet your captions at us with #redcaps! 7/8
Dataset available at redcaps.xyz!
Pre-trained models coming soon.
To add a bit more: I want to give a special shoutout to (1) our anonymous reviewers, who provided constructive feedback and thoroughly engaged with us — see discussion at openreview.net/forum?id=VjJxB… — peer reviewing at its finest! And (2) >> 9/11
(2) Authors of arxiv.org/abs/2006.16923 (@Abebab, @vinayprabhu) — this paper appeared on arxiv ~1 month after I started working on this project. The paper uncovers problematic trends in large image datasets, which I would not have accounted for, had I not read this paper. >> 10/11
(1,2) have affected our design choices for the better. We avoided subreddits with a lot of images of people, filtered data using NSFW/face detectors, and added a form on redcaps.xyz for anyone to request image removal from RedCaps. Not perfect, but a positive step. 11/11