Sergios Karagiannakos

14 Jan, 5 tweets, 2 min read

Visual-language models and multimodal learning have grown rapidly over the past 2-3 years. We've seen some very exciting architectures, such as CLIP, ALIGN, DALL-E, and SimVLM.

I'm currently writing a survey on the topic, so I thought I'd share some very good resources I found:

🧵⬇️

1) Reading List for Topics in Multimodal Machine Learning by @pliang279

github.com/pliang279/awes…

2) Multimodal Research in Vision and Language: A Review of Current and Emerging Trends

arxiv.org/pdf/2010.09522…

3) From VQA to VLN: Recent Advances in Vision-and-Language Research, a tutorial at CVPR 2021

vqa2vln-tutorial.github.io

4) Multimodal playlist by @AICoffeeBreak
