I’m happy to share the published version of our ConVIRT algorithm, appearing in #MLHC2022 (PMLR 182). In 2020, this was a pioneering work in contrastive learning of perception by using naturally occurring paired text. Unfortunately, things took a winding path from there. 🧵👇
The paper (Contrastive Learning of Medical Visual Representations from Paired Images and Text, @yuhaozhangx @hjian42 Yasuhide Miura @chrmanning & @curtlanglotz) shows much better unsupervised visual representation learning using paired text versus vision alone (SimCLR, MoCo v2).
However, sometimes you don’t get lucky with conference reviewing—even when at a highly privileged institution. We couldn’t interest reviewers at ICLR2020 or ICCV2021. I think the fact that we showed gains in radiology (x-rays) not general vision seemed to dampen interest….
Luckily, some people read the paper and liked the idea! @AlecRad & colleagues at @OpenAI saw the virtue of the approach and showed the great power of using a simplified version of ConVIRT at a much larger scale on general images leading to CLIP (ICML2021)
And that led to a lot of other vision work exploiting paired text and images to do contrastive learning of visual representations, such as the ALIGN model from Chao Jia et al. at Google (ICML 2021)
Meanwhile, colleagues at Stanford further extended and improved ConVIRT, leading to the approach GLoRIA by Shih-Cheng Huang, @syeung10 et al. at ICCV2021 and CheXzero by Ekin Tiu, @pranavrajpurkar et al. in Nature Biomedical Engineering 2022
I would suggest that this thread errs by over-representing the proportion of the time in which human “reasoning” is actually anything akin to mathematical reasoning, such as the example of solving SAT instances. 1/
To start with a Go analogy: A moderately skilled player can exhaustively read out a 6–10 move life-and-death problem or end game sequence—or usually work it out more quickly using pattern-based shortcuts! But for longer, more complex things such as fuseki (opening) sequences, 2/
they “reason” about moves—“it would be better for me to approach here than to extend on the other side, because then they would jump and I could then approach strengthening my corner while attacking”—but really this is pattern matching! It’s not like solving a SAT problem. 3/
It’s great getting to read my colleagues @robreich, @mehran_sahami & Jeremy Weinstein’s book, System Error. Building a broad understanding of the problems with big tech and techno-utopianism is such an important topic for this decade. harpercollins.com/products/syste…
Some thoughts below. 🧵👇
The authors rightly stress many key problems that have emerged: deficiencies of simplistic metrics, dangers of tech monopolies, balancing innovation against status as a public utility, and the question of what should become of privacy and free speech in a world of corporate-owned public squares.
However, they end up questioning an “optimization mindset” in general, and I’m not sure that’s right. There are many ways that optimization can go wrong, which they discuss: people can adopt simplistic uni-dimensional metrics (e.g., “connecting more people is good for all”) or …
@yoavgo @ChrisGPotts I take primary blame for advocating the anonymity period. It was an honest attempt at a compromise middle ground. With the passage of time, I admit that it seems a bit flawed, as more people aim for “the anonymity period deadline”, but the real question is: what would be better?
@yoavgo @ChrisGPotts Your suggestion, @yoavgo, is to move the dial all the way to the left or to the right, but such extremist positions seldom are optimal in a complex and varied world. We could survey again, but I suspect the situation is similar to where it was 3 years ago: one large group, ...
@yoavgo @ChrisGPotts which seemed to center on white, male Americans, was all for preprints (mostly killing anonymous reviewing), while another large group (mainly people outside the above group) believed strongly in preserving anonymous reviewing. Should we just fully go with one group? ...