Discover and read the best of Twitter Threads about #visionandlanguage

Presenting our new V+L pretraining work: “Unifying Vision-and-Language Tasks via Text Generation”,
a single unified generative framework (VL-T5 / VL-BART) for diverse multimodal tasks!

Arxiv: arxiv.org/abs/2102.02779

Work done w/ @jayleicn @HaoTan5 @mohitban47 (@uncnlp)

🧵1/n
Existing methods for V+L learning typically require designing task-specific architectures and objectives for each task:
for example, a multi-label answer classifier for VQA, a region scorer for referring expression comprehension, and a language decoder for image captioning.
To alleviate these hassles, we propose a unified framework that learns different tasks in a single architecture with the same language modeling objective, i.e., multimodal conditional text generation, where our models learn to generate labels in text based on the V+L inputs.
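The unified formulation can be sketched as mapping each task's label to a text target behind a task prefix, so one encoder-decoder handles all of them. The prefixes and field names below are illustrative assumptions for the sketch, not the paper's exact strings:

```python
# Sketch: casting diverse V+L tasks into one text-generation format.
# Visual features would be fed to the encoder separately; here we only
# show how labels become text targets.

def to_text2text(task, example):
    """Return a (source_text, target_text) pair for a unified
    text-generation model. Prefixes are illustrative assumptions."""
    if task == "vqa":
        # Multi-label answer classification becomes answer generation.
        return f"vqa: {example['question']}", example["answer"]
    if task == "refexp":
        # Region scoring becomes generating the region's id as text.
        return f"visual grounding: {example['phrase']}", str(example["region_id"])
    if task == "caption":
        # Captioning is already conditional text generation.
        return "caption:", example["caption"]
    raise ValueError(f"unknown task: {task}")

pairs = [
    to_text2text("vqa", {"question": "What is the cat doing?", "answer": "sleeping"}),
    to_text2text("refexp", {"phrase": "the red mug", "region_id": 7}),
    to_text2text("caption", {"caption": "A cat sleeping on a sofa."}),
]
for src, tgt in pairs:
    print(src, "->", tgt)
```

With every label expressed as text, a single language-modeling loss covers all tasks, which is the point of the unified framework.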
Thanks @_KarenHao for this fun article in MIT @TechReview (with cats😺) covering @HaoTan5's "Vokenization" work at @UNC, upcoming at #emnlp2020!

(also features kind words from the awesome @Thom_Wolf/@huggingface🤗)

Paper: arxiv.org/abs/2010.06775
Try it: github.com/airsplay/voken…
And here is the original summary thread by Hao for more info -->
