I’m happy to share the published version of our ConVIRT algorithm, appearing in #MLHC2022 (PMLR 182). Back in 2020, this was pioneering work on contrastive learning of visual representations from naturally occurring paired text. Unfortunately, things took a winding path from there. 🧵👇
The paper (Contrastive Learning of Medical Visual Representations from Paired Images and Text, @yuhaozhangx @hjian42 Yasuhide Miura @chrmanning & @curtlanglotz) shows that learning from paired text gives much better unsupervised visual representations than image-only contrastive methods (SimCLR, MoCo v2).
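For anyone curious about the core idea, here is a minimal sketch of the symmetric image-text contrastive objective that this style of training uses. It is an illustrative example only, not our exact implementation: the image and text encoders, projection heads, and hyperparameters (batch size, temperature, loss weighting) are all omitted or assumed.

```python
import torch
import torch.nn.functional as F

def paired_contrastive_loss(image_emb, text_emb, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of matched image/text embeddings.

    image_emb, text_emb: (batch, dim) outputs of the two encoders'
    projection heads, where row i of each tensor comes from the same pair.
    """
    # Normalize so dot products are cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Similarity matrix: entry (i, j) compares image i with text j.
    logits = image_emb @ text_emb.t() / temperature

    # Matched pairs sit on the diagonal.
    targets = torch.arange(image_emb.size(0), device=image_emb.device)

    # Image-to-text and text-to-image cross-entropy, averaged.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```

Each image is pulled toward the embedding of its own paired text and pushed away from the other texts in the batch, and vice versa; the image encoder trained this way is then reused for downstream vision tasks.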
However, sometimes you don’t get lucky with conference reviewing, even at a highly privileged institution. We couldn’t interest reviewers at ICLR 2021 or ICCV 2021. I suspect that showing gains in radiology (x-rays) rather than in general vision dampened interest…
Luckily, some people read the paper and liked the idea! @AlecRad & colleagues at @OpenAI saw the virtue of the approach and showed the great power of a simplified version of ConVIRT run at much larger scale on general images, leading to CLIP (ICML 2021)
openai.com/blog/clip/
And that led to a lot of other vision work exploiting paired text and images to do contrastive learning of visual representations, such as the ALIGN model from Chao Jia et al. at Google (ICML 2021)
Meanwhile, colleagues at Stanford further extended and improved ConVIRT, leading to GLoRIA by Shih-Cheng Huang, @syeung10 et al. (ICCV 2021) and CheXzero by Ekin Tiu, @pranavrajpurkar et al. (Nature Biomedical Engineering, 2022)
And @ESL_Sarah, @gdm3000 & @Prof_Meinel completed the circle by training a version of CLIP on PubMed: arxiv.org/abs/2112.13906.
At any rate, here’s our paper with the original method, slightly updated to reflect all the work that has gone on since:
arxiv.org/abs/2010.00747