📢New dataset!📢 RedCaps: 12M image-text pairs from Reddit for vision and vision-and-language applications.
Website: redcaps.xyz
Paper: arxiv.org/abs/2111.11431

Check out captions from a RedCaps-trained model!⬇️
Try more here: huggingface.co/spaces/umichVi…
What's new?🧵1/8
Conversational flavor of data: RedCaps data is created with a specific intent of human interaction on social media. Reddit users have an incentive (upvotes) to upload high-quality data — sometimes witty or emotional, unlike HTML alt-text. 2/8
Subreddits: We collect data from 350 manually chosen subreddits. The largest subreddits show that Reddit users like to share pets, hobbies, and photography! These subreddits let us steer the data distribution, and they provide image labels even when captions don’t mention the objects in the image. 3/8
Dataset size: One of the largest public image-text datasets in 2021! RedCaps contains data from the past 13 years (2008–2020). However, it is not static by design! It will continue to grow in the future as more data gets uploaded to Reddit. 4/8
Vision pre-training: In controlled settings, models trained on RedCaps outperform those trained on existing public image-text datasets (SBU, CC-3M) on 10/11 downstream tasks. See zero-shot classification⬇️, more in the paper. 5/8
Image captioning: We showed captions predicted by models trained on RedCaps vs CC-3M (alt-text dataset) to human evaluators. They preferred captions from the RedCaps-trained model (underlined ⬇️) for 633/1000 images! 6/8
Subreddit-controlled captioning: Since we trained models with subreddits, we can *prompt* them with different subreddits at test-time. This gives linguistically diverse and amusing captions! Try out our demo: huggingface.co/spaces/umichVi…
Tweet your captions at us with #redcaps! 7/8
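To make the prompting idea concrete, here is a minimal sketch of how a subreddit-conditioned prompt might be put together before decoding. The thread does not spell out the exact format, so the `[SEP]` separator, the lower-casing, the helper name `make_prompt`, and the example subreddit names are all assumptions for illustration, not the RedCaps preprocessing itself.

```python
# Hypothetical sketch: build a text prefix that conditions a captioning
# model on a subreddit. The "[SEP]" token and lower-casing are assumptions,
# not the confirmed RedCaps format.
SEP = "[SEP]"


def make_prompt(subreddit: str) -> str:
    """Return a prefix such as 'cats [SEP]' for the decoder to continue."""
    name = subreddit.strip().lower()
    if name.startswith("r/"):
        name = name[2:]  # drop the 'r/' prefix if the caller included it
    return f"{name} {SEP}"


def prompts_for(subreddits):
    """Map each subreddit name to its prompt prefix."""
    return {s: make_prompt(s) for s in subreddits}


if __name__ == "__main__":
    # Same image, different subreddit prefixes: a captioning model would be
    # asked to continue each prefix, giving one caption per subreddit.
    for sub, prompt in prompts_for(["r/itookapicture", "cats", "food"]).items():
        print(f"{sub!r} -> {prompt!r}")
```

A captioning model would then continue each prefix, which is one way the same image could get a differently flavored caption per subreddit.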
Dataset available at redcaps.xyz!
Pre-trained models coming soon.

(w/ @gauravkaul7, @zubinaysola, @jcjohnss)
To appear in the NeurIPS 2021 Datasets and Benchmarks track.
8/8 Fin.
To add a bit more: I want to give a special shoutout to (1) our anonymous reviewers, who provided constructive feedback and thoroughly engaged with us — see discussion at openreview.net/forum?id=VjJxB… — peer reviewing at its finest! And (2) >> 9/11
(2) Authors of arxiv.org/abs/2006.16923 (@Abebab, @vinayprabhu): this paper appeared on arXiv ~1 month after I started working on this project. It uncovers problematic trends in large image datasets, which I would not have accounted for had I not read it. >> 10/11
Both (1) and (2) changed our design choices for the better. We avoided subreddits with many images of people, filtered data using NSFW/face detectors, and added a form on redcaps.xyz for anyone to request image removal from RedCaps. Not perfect, but a positive step. 11/11
