Antoine Yang
PhD Student @Inria and @ENS_ULM, working on learning visual language models for video understanding.
Mar 1, 2023
Introducing Vid2Seq, a new visual language model for dense video captioning. To appear at #CVPR2023.

Work done @Google w/ @NagraniArsha P.H. Seo @antoine77340 @jponttuset I. Laptev J. Sivic @CordeliaSchmid.

Page: antoyang.github.io/vid2seq.html
Paper: arxiv.org/abs/2302.14115

🧵/5

Most video captioning systems can only describe a single event in short videos. But natural videos may contain numerous events. So we focus on the dense video captioning task, which requires temporally localizing and captioning all events in untrimmed minutes-long videos 🎞️ (see the toy sketch after this tweet).


2/5
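
Not part of the original thread: a minimal sketch of what a dense video captioning output looks like, assuming a simple (start, end, caption) representation per event. The `DenseCaption` class, timestamps, and captions below are made up for illustration and are not Vid2Seq's actual interface.

```python
from dataclasses import dataclass

@dataclass
class DenseCaption:
    """One localized event in an untrimmed video (hypothetical format)."""
    start_s: float   # event start time, in seconds
    end_s: float     # event end time, in seconds
    caption: str     # natural-language description of the event

# Hypothetical predictions for a minutes-long cooking video:
# several events, each temporally localized and captioned.
predictions = [
    DenseCaption(start_s=12.0,  end_s=45.5,  caption="A person chops vegetables on a cutting board."),
    DenseCaption(start_s=50.0,  end_s=120.0, caption="The vegetables are stirred in a pan."),
    DenseCaption(start_s=130.0, end_s=185.0, caption="The dish is plated and served."),
]

for event in predictions:
    print(f"[{event.start_s:6.1f}s - {event.end_s:6.1f}s] {event.caption}")
```

This contrasts with standard video captioning, which would emit a single sentence for the whole clip with no timestamps.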