In our #CVPR2022 Oral, we introduce the atemporal probe (ATP) to analyze *atemporal* (single frame) bias in video-language, with surprising results! (see 🧵)
The promise of videos is the potential to go *beyond* image-centric understanding (people, objects, scenes, etc.) towards event temporality, causality, and dynamics. Ideally, we want video-language benchmarks and models to realize this promise.
2/10
Our paper focuses on a fundamental question in video research: to what extent can "image-centric" understanding address "video" understanding?
Consider the example below: can we answer the question with only a single frame?
3/10
Standard techniques for measuring "image-centric" understanding: sampling a random frame, mean pooling frame features... (sketch below)
But videos can be naturally noisy! (frames w/ camera blur, weird angles, uninformative content)
Maybe these standard methods under-represent the real boundary between image and video understanding?
4/10
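A minimal sketch of these two baselines (illustrative only, assuming precomputed per-frame CLIP features; names like `frame_feats` are placeholders, not from our code):
```python
# Two common "image-centric" baselines over frozen per-frame features.
import torch

def random_frame_baseline(frame_feats: torch.Tensor) -> torch.Tensor:
    """Pick one frame at random and use its embedding for the task head."""
    idx = torch.randint(0, frame_feats.shape[0], (1,))
    return frame_feats[idx].squeeze(0)

def mean_pool_baseline(frame_feats: torch.Tensor) -> torch.Tensor:
    """Average all frame embeddings; noisy frames dilute the signal."""
    return frame_feats.mean(dim=0)
```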
What if we were to select a "good" frame instead? This is the core idea of our atemporal probe (ATP) model: use a *frozen* image-language encoder (CLIP) on a few randomly sampled frames, then select a single "good" encoding (without any temporal information) to pass on to the final task (sketch below).
5/10
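A rough sketch of this selection idea (the MLP scorer and Gumbel-softmax selection here are illustrative assumptions, not the exact design in the paper):
```python
# Score frozen, unordered frame encodings with a lightweight selector
# (no temporal position info) and pass a single selected encoding on
# to the task head.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AtemporalProbe(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        # Small scorer over individual frame encodings (illustrative choice).
        self.scorer = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (num_candidate_frames, dim) from a *frozen* CLIP image
        # encoder, with no ordering/temporal information attached.
        logits = self.scorer(frame_feats).squeeze(-1)           # (num_frames,)
        if self.training:
            # Differentiable (straight-through) selection during training.
            weights = F.gumbel_softmax(logits, tau=1.0, hard=True)
            return weights @ frame_feats                        # one "good" encoding
        return frame_feats[logits.argmax()]                     # hard selection at test time
```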
Surprising: ATP gives a strong "image-centric" bound for video-language tasks, even outperforming some state-of-the-art video models!
Takeaways: (1) many datasets may be well-addressed with a single (well-chosen) frame, (2) video-language models may be held back by having to process noisy frames.
6/10
These takeaways hold even for datasets explicitly designed for temporal/causal video-language understanding!
7/10
We take steps to "close the loop" on both takeaways: First, we show ATP can help identify temporally challenging (truly multi-frame) data, to better measure progress on video design components (temporal modeling, motion, etc.). ATP is promising for future in-the-loop dataset design! (sketch below)
8/10
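One simple way to operationalize this with an ATP-style probe (an illustrative recipe, not the paper's exact protocol):
```python
# Flag examples that the single-frame probe gets wrong as candidates for
# a harder, more "temporal" evaluation subset.
def split_by_atp(examples, atp_predict):
    """atp_predict(example) -> predicted answer, using a single selected frame."""
    single_frame_solvable, temporally_challenging = [], []
    for ex in examples:
        if atp_predict(ex) == ex["answer"]:
            single_frame_solvable.append(ex)
        else:
            temporally_challenging.append(ex)
    return single_frame_solvable, temporally_challenging
```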
Second, we show how ATP can plug into the input of a multi-frame temporal model, improving both accuracy (less noise) and efficiency (fewer frames to process). (sketch below)
9/10
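A sketch of this frame-selection front-end (the top-k strategy is an illustrative assumption; `frame_scores` would come from an ATP-style selector as above):
```python
# Keep only the k highest-scoring frames, restored to temporal order,
# and feed just those to the downstream multi-frame temporal model.
import torch

def select_frames_for_temporal_model(frame_feats: torch.Tensor,
                                     frame_scores: torch.Tensor,
                                     k: int = 4) -> torch.Tensor:
    # frame_feats: (num_frames, dim) frozen per-frame encodings
    # frame_scores: (num_frames,) scores from an ATP-style selector
    topk = torch.topk(frame_scores, k=min(k, frame_feats.shape[0])).indices
    keep = torch.sort(topk).values   # restore temporal order for the video model
    return frame_feats[keep]         # fewer, cleaner frames downstream
```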
Excited to introduce a new generation of privacy-protecting smart cameras in our #ICCV2021 Oral paper by @CarlosH_93, @henarfu and myself 🇺🇸🇨🇴. See 🧵 below for details!
Our cameras introduce optical distortions at acquisition time, capturing images that protect the identity of people in the scene while enabling vision algorithms to perform inference accurately.
We achieve this by back-propagating all the way down to the camera lens. To make the lens shape differentiable, we parametrize it using Zernike polynomials.
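A minimal sketch of a learnable Zernike lens parametrization (only a few standard low-order modes, as an illustration; the full differentiable camera pipeline in the paper is more involved and this is not its exact code):
```python
# Parametrize a lens surface/phase map with Zernike coefficients so the
# lens shape can be optimized end-to-end by backprop.
import torch
import torch.nn as nn

class ZernikeLens(nn.Module):
    def __init__(self, resolution: int = 128):
        super().__init__()
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, resolution),
            torch.linspace(-1, 1, resolution),
            indexing="ij",
        )
        rho, theta = torch.sqrt(xs**2 + ys**2), torch.atan2(ys, xs)
        # A small set of standard (unnormalized) Zernike modes on the unit pupil.
        basis = torch.stack([
            2 * rho**2 - 1,                             # defocus
            rho**2 * torch.cos(2 * theta),              # astigmatism (0 deg)
            rho**2 * torch.sin(2 * theta),              # astigmatism (45 deg)
            (3 * rho**3 - 2 * rho) * torch.cos(theta),  # coma (x)
            (3 * rho**3 - 2 * rho) * torch.sin(theta),  # coma (y)
            6 * rho**4 - 6 * rho**2 + 1,                # spherical aberration
        ])
        self.register_buffer("basis", basis * (rho <= 1).float())  # mask to unit disk
        self.coeffs = nn.Parameter(torch.zeros(basis.shape[0]))    # learnable lens shape

    def forward(self) -> torch.Tensor:
        # Lens map = weighted sum of Zernike modes; gradients from the
        # downstream vision + privacy losses flow into self.coeffs.
        return torch.einsum("k,khw->hw", self.coeffs, self.basis)
```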