Check out our #CVPR2022 paper! We improve multimodal zero-shot text-to-video retrieval on YouCook2/MSR-VTT by leveraging a fusion transformer and a combinatorial loss. 1/🧵
We propose a multimodal, modality-agnostic fusion transformer that learns to exchange information between multiple modalities, e.g., video, audio, and text, and builds an embedding that aggregates multimodal information.
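For intuition, here's a minimal PyTorch sketch of such a modality-agnostic fusion transformer, assuming per-modality token features are already extracted. The class name FusionTransformer, the learned modality embeddings, and mean pooling are illustrative assumptions, not the paper's actual code:

```python
import torch
import torch.nn as nn

class FusionTransformer(nn.Module):
    """Sketch: one shared transformer fuses tokens from any modality subset."""

    def __init__(self, dim=512, n_heads=8, n_layers=4,
                 modalities=("video", "audio", "text")):
        super().__init__()
        # One learned token-type embedding per modality, so the shared
        # encoder can tell modalities apart (assumed mechanism).
        self.modality_emb = nn.ParameterDict(
            {m: nn.Parameter(torch.zeros(1, 1, dim)) for m in modalities}
        )
        layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, **inputs):
        # inputs: modality name -> (batch, seq_len, dim) token features.
        # Any subset of modalities and any sequence lengths are accepted.
        tokens = torch.cat(
            [feats + self.modality_emb[name] for name, feats in inputs.items()],
            dim=1,
        )
        fused = self.encoder(tokens)  # cross-modal attention happens here
        return fused.mean(dim=1)      # one joint embedding per sample
```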
We train the system with a combinatorial loss on everything at once: single modalities as well as pairs of modalities.
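A hedged sketch of how such a combinatorial objective could look, reusing the FusionTransformer sketch above: embed every single modality and every modality pair with the same model, then sum a symmetric InfoNCE-style contrastive loss over all pairs of combinations. The helper names and the specific contrastive form are our assumptions:

```python
import itertools
import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.05):
    # a, b: (batch, dim) embeddings of matching clips; other batch
    # entries serve as negatives (assumed contrastive setup).
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2

def combinatorial_loss(model, feats):
    # feats: modality name -> (batch, seq_len, dim), e.g. video/audio/text.
    names = list(feats)
    combos = [(m,) for m in names] + list(itertools.combinations(names, 2))
    embs = {c: F.normalize(model(**{m: feats[m] for m in c}), dim=-1)
            for c in combos}
    # Contrast every distinct pair of modality combinations of the same clip.
    return sum(info_nce(embs[c1], embs[c2])
               for c1, c2 in itertools.combinations(combos, 2))
```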
At test time, the model can process and fuse any combination of input modalities, handles inputs of different lengths, achieves SotA results, and enables attention analysis across modalities.
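Continuing the sketch, test-time usage might look like this (random tensors stand in for real features); the point is that any modality subset and any sequence lengths go through the same model:

```python
model = FusionTransformer()
video = torch.randn(8, 32, 512)  # 32 video tokens per clip
audio = torch.randn(8, 48, 512)  # 48 audio tokens; different lengths are fine
text = torch.randn(8, 16, 512)   # 16 text tokens

z_va = model(video=video, audio=audio)  # fuse video + audio candidates
z_txt = model(text=text)                # text query alone
# Retrieval: cosine similarity between query and candidate embeddings.
scores = (torch.nn.functional.normalize(z_txt, dim=-1)
          @ torch.nn.functional.normalize(z_va, dim=-1).t())
```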
If you want to know more, join us at #CVPR2022 at the Sight and Sound workshop (sightsound.org) or at the Fri 2pm session in person!
Happy to finally share our paper about differentiable Top-K Learning by Sorting that didn't make it into #CVPR2022 but was accepted at #ICML2022! We show that you can improve classification by actually considering the top-1 prediction plus runner-ups… 1/6🧵
Idea: Top-k class accuracy is used in many ML tasks, but training is usually limited to top-1 accuracy (or a single other k). We propose a differentiable top-k classification loss that allows training on any combination of top-k predictions, e.g., top-2 and top-5. 3/6🧵
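As a hedged illustration of the idea (the paper itself works via differentiable sorting; the sigmoid surrogate below is a simpler stand-in of our own), a loss can reward the true class for landing anywhere in a chosen set of top-k positions:

```python
import torch

def topk_smooth_loss(logits, targets, ks=(1, 2), lambdas=(0.5, 0.5), tau=1.0):
    # logits: (batch, n_classes); targets: (batch,) true class indices.
    true_scores = logits.gather(1, targets.unsqueeze(1)).squeeze(1)
    # Scores of the competing (non-target) classes.
    others = logits.clone()
    others.scatter_(1, targets.unsqueeze(1), float("-inf"))
    loss = 0.0
    for k, lam in zip(ks, lambdas):
        # The true class is in the top-k iff it beats the k-th best
        # competitor; relax the hard indicator with a tempered sigmoid.
        kth_best = others.topk(k, dim=1).values[:, -1]
        p_in_topk = torch.sigmoid((true_scores - kth_best) / tau)
        loss = loss - lam * torch.log(p_in_topk.clamp(min=1e-8))
    return loss.mean()

# Example: loss = topk_smooth_loss(model(x), labels, ks=(1, 5), lambdas=(0.75, 0.25))
```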