Hilde Kuehne
Jun 16 5 tweets 5 min read
Check out our #CVPR2022 paper! We improve multimodal zero-shot text-to-video retrieval on YouCook2/MSR-VTT by leveraging a fusion transformer and a combinatorial loss. 1/🧵

#ComputerVision #AI #MachineLearning

@MITIBMLab @goetheuni @MIT_CSAIL @IBMResearch
If you want to go directly to the paper/code, please check out:
paper: arxiv.org/abs/2112.04446
GitHub: github.com/ninatu/everyth…

Great work by @ninashv__, @Brian271828, @arouditchenko, Samuel Thomas, Brian Kingsbury, @RogerioFeris, David Harwath, and James Glass.
We propose a multimodal, modality-agnostic fusion transformer that learns to exchange information between multiple modalities (e.g. video, audio, text) and builds an embedding that aggregates the multimodal information.
We train the system with a combinatorial loss on everything at once, single modalities as well as pairs.
At test time, the model can process and fuse any combination of input modalities and inputs of different lengths, achieves state-of-the-art results, and allows attention analysis across modalities.
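To make "a combinatorial loss on everything at once" more concrete, here is a rough sketch of how such a loss could be assembled for three modalities. It is not the authors' implementation (see the paper and repository for that). The assumptions are ours: a symmetric softmax contrastive term, equal weighting of all terms, a temperature of 0.05, and a hypothetical `placeholder_fuse` that simply averages two embeddings in place of the fusion transformer.

```python
import itertools
import math


def _normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def contrastive_loss(xs, ys, temperature=0.05):
    """Symmetric contrastive loss between two aligned batches of embeddings.

    The temperature value is an assumption chosen for illustration.
    """
    xs = [_normalize(x) for x in xs]
    ys = [_normalize(y) for y in ys]
    logits = [[sum(a * b for a, b in zip(x, y)) / temperature for y in ys]
              for x in xs]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(transposed))


def placeholder_fuse(xs, ys):
    """Hypothetical stand-in for the fusion transformer: element-wise average."""
    return [[0.5 * (a + b) for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def combinatorial_loss(single, fused):
    """Sum contrastive terms over single-modality pairs and single-vs-fused pairs.

    single: dict mapping modality name -> batch of embeddings
    fused:  dict mapping frozenset of modality names -> batch of fused embeddings
    Equal weighting of all terms is an assumption of this sketch.
    """
    names = sorted(single)
    terms = {}
    # Every pair of single modalities, e.g. text vs. video.
    for a, b in itertools.combinations(names, 2):
        terms[(a, b)] = contrastive_loss(single[a], single[b])
    # Every single modality against each fused embedding that excludes it,
    # e.g. text vs. fused(audio, video).
    for combo, emb in fused.items():
        for name in names:
            if name not in combo:
                key = (name, "+".join(sorted(combo)))
                terms[key] = contrastive_loss(single[name], emb)
    return sum(terms.values()), terms
```

With text, video, and audio this yields six terms: three single-vs-single pairs and three single-vs-fused-pair combinations. In a real system the fused embeddings would come from the fusion transformer rather than from averaging.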
If you want to know more, join us at #CVPR2022: at the Sight and Sound workshop (sightsound.org) or at the Fri 2pm session in person!

@andrewhowens
