Excited to introduce a new generation of privacy-protecting smart cameras in our #ICCV2021 Oral paper by @CarlosH_93, @henarfu and me 🇺🇸🇨🇴. See 🧵 below for details!
Our cameras introduce optical distortions at acquisition time, capturing images that protect the identity of people in the scene while enabling vision algorithms to perform inference accurately.
We achieve this by back-propagating all the way down to the camera lens. To make the lens shape differentiable, we parametrize it using Zernike polynomials.
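For intuition, here is a minimal PyTorch sketch of that idea (my own illustration, not the paper's code): the lens height map is a learnable weighted sum of a fixed Zernike basis, so the surface stays differentiable end-to-end. The `toy_zernike_basis` helper and all names are hypothetical, and a real design would use many more modes plus a wave-optics rendering model between the surface and the image.

```python
import torch
import torch.nn as nn

def toy_zernike_basis(size: int) -> torch.Tensor:
    # Toy basis with a few low-order Zernike modes (piston, x/y tilt,
    # defocus) on the unit disk; a real lens would use many more modes.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, size),
        torch.linspace(-1, 1, size),
        indexing="ij",
    )
    r2 = xs ** 2 + ys ** 2
    mask = (r2 <= 1.0).float()
    modes = torch.stack([
        torch.ones_like(r2),        # Z(0,0)  piston
        2 * xs,                     # Z(1,1)  tilt in x
        2 * ys,                     # Z(1,-1) tilt in y
        (3 ** 0.5) * (2 * r2 - 1),  # Z(2,0)  defocus
    ])
    return modes * mask  # zero outside the circular aperture

class ZernikeLens(nn.Module):
    # The lens height map is a weighted sum of Zernike polynomials with
    # learnable weights, so gradients from a downstream vision loss can
    # flow back into the lens shape.
    def __init__(self, basis: torch.Tensor):
        super().__init__()
        self.register_buffer("basis", basis)                      # (K, H, W), fixed
        self.coeffs = nn.Parameter(torch.zeros(basis.shape[0]))   # learnable

    def forward(self) -> torch.Tensor:
        return torch.einsum("k,khw->hw", self.coeffs, self.basis)

lens = ZernikeLens(toy_zernike_basis(128))
height_map = lens()  # (128, 128) surface; would feed a PSF/rendering model
```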
Our system can be implemented in two ways. First, we can build a physical lens for hardware-level protection, so that images are always distorted, starting at acquisition time. Note that this protection is orthogonal to other techniques such as differential privacy.
Second, when attaching a new lens is not possible, we can implement the distortion in firmware/on-device, so that images are distorted immediately after acquisition, before they leave the device, are stored, or undergo any further processing.
For more details, see our #ICCV oral presentation during Sessions 2A and 2B. Looking forward to seeing you there!
In our #CVPR2022 Oral, we introduce the atemporal probe (ATP) to analyze *atemporal* (single-frame) bias in video-language understanding, with surprising results! (see 🧵)
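As a rough illustration (not the paper's implementation), an atemporal probe can be sketched as a learned selector over frozen per-frame embeddings: each frame is scored independently, with no temporal ordering anywhere, and the task head only ever sees one (softly) selected frame. The class name and the soft selection below are simplifying assumptions; the actual ATP design may differ (e.g., in its scorer and selection mechanism).

```python
import torch
import torch.nn as nn

class AtemporalProbe(nn.Module):
    # Scores each frozen frame embedding independently (no temporal
    # order, no cross-frame features) and returns a softly selected
    # single-frame representation for the downstream task head.
    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, num_frames, dim) from a frozen
        # image-language encoder, e.g. per-frame image features.
        scores = self.scorer(frame_feats).squeeze(-1)  # (B, T)
        weights = torch.softmax(scores, dim=-1)        # soft frame choice
        return torch.einsum("bt,btd->bd", weights, frame_feats)

probe = AtemporalProbe(dim=512)
feats = torch.randn(2, 16, 512)    # 2 videos, 16 frames each
single_frame_repr = probe(feats)   # (2, 512)
```

If such a probe matches a full video model on a benchmark, that benchmark is largely answerable from a single frame, which is exactly the bias ATP is designed to measure.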
The promise of videos is the potential to go *beyond* image-centric understanding (people, objects, scenes, etc.) towards event temporality, causality, and dynamics. Ideally, we want video-language benchmarks and models to realize this promise.
Our paper focuses on a fundamental question in video research: to what extent can "image-centric" understanding address "video" understanding?
Consider the example below: can we answer the question with only a single frame?