Visual-language models and multimodal learning have grown rapidly over the past 2-3 years. We've seen some very exciting architectures, such as CLIP, ALIGN, DALL-E, and SimVLM.
I'm currently writing a survey on the topic, so I thought I'd share some very good resources I found:
🧵⬇️
1) Reading List for Topics in Multimodal Machine Learning by @pliang279