Discover and read the best of Twitter Threads about #visionandlanguage

Presenting our new V+L pretraining work: “Unifying Vision-and-Language Tasks via Text Generation”,
a single unified generative framework (VL-T5 / VL-BART) for diverse multimodal tasks!

Arxiv: arxiv.org/abs/2102.02779

Work done w/ @jayleicn @HaoTan5 @mohitban47 (@uncnlp)

🧵1/n
Existing methods for V+L learning typically require designing task-specific architectures and objectives for each task:
for example, a multi-label answer classifier for VQA, a region scorer for referring expression comprehension, and a language decoder for image captioning.
To alleviate these hassles, we propose a unified framework that learns different tasks in a single architecture with the same language modeling objective, i.e., multimodal conditional text generation, where our models learn to generate labels in text based on the V+L inputs.
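The unified formulation can be sketched as mapping each task's label to a text target behind a task prefix, so one encoder-decoder handles all of them. The prefixes and field names below are illustrative assumptions for the sketch, not the paper's exact strings:

```python
# Sketch: casting diverse V+L tasks into one text-generation format.
# Visual features would be fed to the encoder separately; here we only
# show how labels become text targets.

def to_text2text(task, example):
    """Return a (source_text, target_text) pair for a unified
    text-generation model. Prefixes are illustrative assumptions."""
    if task == "vqa":
        # Multi-label answer classification becomes answer generation.
        return f"vqa: {example['question']}", example["answer"]
    if task == "refexp":
        # Region scoring becomes generating the region's id as text.
        return f"visual grounding: {example['phrase']}", str(example["region_id"])
    if task == "caption":
        # Captioning is already conditional text generation.
        return "caption:", example["caption"]
    raise ValueError(f"unknown task: {task}")

pairs = [
    to_text2text("vqa", {"question": "What is the cat doing?", "answer": "sleeping"}),
    to_text2text("refexp", {"phrase": "the red mug", "region_id": 7}),
    to_text2text("caption", {"caption": "A cat sleeping on a sofa."}),
]
for src, tgt in pairs:
    print(src, "->", tgt)
```

With every label expressed as text, a single language-modeling loss covers all tasks, which is the point of the unified framework.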
Thanks @_KarenHao for this fun article in MIT @TechReview (with cats😺) covering @HaoTan5's "Vokenization" work at @UNC, upcoming at #emnlp2020!

(also features kind words from the awesome @Thom_Wolf/@huggingface🤗)

Paper: arxiv.org/abs/2010.06775
Try it: github.com/airsplay/voken…
And here is the original summary thread by Hao for more info -->
