It’s been 20 years since I submitted my first paper, with Nhat Nguyen and the late, great Gene Golub, on multi-frame super-resolution (SR). Here’s a thread: a personal story of SR as I’ve experienced it. It won’t be exhaustive or fully historical; apologies to colleagues for any omissions.
Tsai and Huang (1984) were the first to publish the concept of multi-frame super-resolution. Key idea was that a high resolution image is related to its shifted and low-resolution versions in the frequency domain through the shift and aliasing properties of the Fourier transform.
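In 1-D, and in my own notation (a sketch, not Tsai and Huang’s exact formulation): if the k-th frame is a shifted copy x(t − δ_k) sampled with period T, then its spectrum stacks aliased replicas of X(f), each tagged by a shift-dependent phase:

```latex
Y_k(f) \;=\; \frac{1}{T}\sum_{n} e^{-j 2\pi (f - n/T)\,\delta_k}\, X\!\left(f - \frac{n}{T}\right)
```

For a bandlimited X, only a few replicas overlap, so frames with enough distinct shifts δ_k give a solvable linear system for X. This is also exactly why the unknown shifts matter so much.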
This setup assumed no noise, global translation, and a trivial point-sampling process: the sensor’s blurring effect was ignored. But even with this simple model, the difficulty is clear: we have two entangled unknowns, the motion vectors and the high-res image. A somewhat more realistic model accounts for motion, blur, downsampling, and noise.
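The model image from the original tweet isn’t reproduced here, but the standard forward model of this kind (my notation, and an assumption about what the figure showed) is:

```latex
\mathbf{y}_k \;=\; \mathbf{D}\,\mathbf{H}\,\mathbf{F}_k\,\mathbf{x} \;+\; \mathbf{n}_k,
\qquad k = 1,\dots,K
```

where x is the unknown high-res image, F_k warps it according to the (also unknown) motion, H applies the optical/sensor blur, D downsamples, and n_k is noise in the k-th captured frame y_k.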
Despite the early theory, most of the progress has been, and remains, in the pixel domain. Influential early works for me were Irani & Peleg, Ur & Gross, and Elad & Feuer. All these treated only gray images and made many simplifying assumptions on motion and the capture model.
Reconstructing the high-res image is itself TWO hard problems: merging onto a high-resolution grid, plus deconvolution. Meanwhile, alignment is also very hard if the images are aliased and noisy. Here’s our state of the art 15 years ago, using “bilateral” total variation on a real sequence.
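To make the “merge onto a high-resolution grid” step concrete, here is a minimal shift-and-add sketch (my own toy code, not the algorithm from any of the papers above): each low-res pixel is accumulated at its nearest high-res location given known sub-pixel shifts.

```python
import numpy as np

def shift_and_add(frames, shifts, scale):
    """Merge LR frames onto an HR grid by nearest-neighbor accumulation.
    frames: list of (h, w) arrays; shifts: list of (dy, dx) in LR pixels
    (assumed known -- in reality, alignment is the hard part);
    scale: integer upscaling factor."""
    h, w = frames[0].shape
    H, W = h * scale, w * scale
    acc = np.zeros((H, W))   # accumulated intensities
    cnt = np.zeros((H, W))   # how many samples landed on each HR pixel
    for f, (dy, dx) in zip(frames, shifts):
        # Sub-pixel positions of this frame's samples, in LR coordinates
        ys = np.arange(h)[:, None] + dy
        xs = np.arange(w)[None, :] + dx
        # Snap each sample to the nearest HR grid point
        Y = np.clip(np.round(ys * scale).astype(int), 0, H - 1)
        X = np.clip(np.round(xs * scale).astype(int), 0, W - 1)
        np.add.at(acc, (Y, X), f)    # unbuffered accumulation
        np.add.at(cnt, (Y, X), 1.0)
    return acc / np.maximum(cnt, 1)  # unvisited HR pixels stay 0
```

Note the holes: HR pixels no sample landed on stay empty, which is why real mergers use wider interpolation kernels, and why deconvolution of the sensor blur is still a separate problem afterwards.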
In the next couple of years, Sina Farsiu’s amazing PhD thesis brought advancements in robustness, treatment of color/raw images, and video. 10 years too soon to be practical, it laid the foundation for what later became super-res zoom in the Pixel phones.
bme.duke.edu/faculty/sina-f…
Another amazing student Dirk Robinson studied the limits of SR upscaling. Conclusion: it’s HARD & depends in complex ways on SNR, # frames, set of motions, image content, and the imaging system’s PSF. But we did learn what could work: using modest upscaling factors and good light
Probably the biggest breakthrough came when we realized that Bayesian methods were too limiting to solve SR. Hiro Takeda’s thesis brought kernel regression very generally to image processing. This paper won @ieeesps best paper & unified many ideas in enhancement & reconstruction
This approach played a key role in eventually enabling practical SR (SIGGRAPH '19). The idea of locally adaptive weighted averaging is fundamental to flexible interpolation for burst image processing. It is easy to formulate, but was, at the time, hard to implement efficiently.
A useful property of kernel methods is that you can compute them not only within-image but also across images. This led both us and Miki Elad's team independently to the idea that computing the weights is like block matching, hence we had implicit, general, motion estimation!
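A tiny sketch of that idea (a hypothetical minimal version, not Takeda’s or Elad’s actual estimator): the kernel weight for each candidate patch, which may come from another frame, is a Gaussian on the patch distance, so computing weights doubles as block matching.

```python
import numpy as np

def kernel_weights(patch, candidates, h=0.1):
    """Data-adaptive kernel weights between a reference patch and
    candidate patches (possibly drawn from OTHER frames). Similar
    patches get large weights -- implicit, general motion estimation."""
    d2 = np.array([np.mean((patch - c) ** 2) for c in candidates])
    w = np.exp(-d2 / (2 * h ** 2))   # Gaussian kernel on patch distance
    return w / w.sum()               # normalize to a weighted average

def kernel_estimate(patch, candidates, centers):
    """Estimate the reference pixel as the weighted average of the
    candidate patches' center pixels -- locally adaptive averaging."""
    w = kernel_weights(patch, candidates)
    return float(np.dot(w, centers))
```

With cross-frame candidates, a well-matched patch dominates the average, so the estimator “follows” the motion without ever computing an explicit motion vector.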
By ~2012, we could reliably do SR in many settings and without too many model constraints. But it could hardly be called practical. That took much longer. It might’ve happened faster if (a) imaging and graphics people talked more, and (b) GPUs were better. Deep learning would change (b).
I spent much of 2010-12 on other things like denoising. Then in 2012 I got lucky and @marclevoy recruited me to Google-X. We worked on Google Glass image pipeline which evolved to become HDR+ (the first pipeline to do practical burst processing) in the Nexus and Pixel cameras.
I moved to Google Research in 2014 where my team works on many imaging projects including SR. I knew that for doing zoom across a wide range of magnifications, multi-frame SR wouldn’t be enough. With the great Yaniv Romano in '15 we developed RAISR for 1-frame SR (i.e. upscaling)
We revisited multi-frame SR using kernels again starting in 2016. In 2017, I got lucky again and hired the uber-talented @bartwronsk. Coming from the game-dev/graphics world, he had the most effective tricks to really make SR fast, robust, and repeatable bit.ly/33j3aU2
Our work melded everything we knew to date, plus: (1) most mobile imaging systems are aliased (2) natural hand tremor is good! It gives tiny sub-pixel motion after OIS compensation (3) kernel methods are perfect for GPUs. In short, the stars were aligned
bit.ly/34jJwbZ
This October we launched super-res zoom on the 2019 Pixel phones, running on both the main-lens zoom and the tele lens. Pixel rates highest among all smartphones with a 2x optical tele lens, and beats several phones with 3x optical tele lenses. Example album: photos.app.goo.gl/1i575GDLCzNupD…
Science fact: natural hand tremor is random in magnitude, with frequencies in the 8-12 Hz range. The motion consists of a mechanical-reflex component and a second component that causes micro-contractions in the muscles. Curiously, ocular saccades are similar, leading to a form of SR in the visual cortex.
So is SR solved? Surely, there’s more science and engineering to be done. But we’ve definitely crossed the threshold from academic theory/experiment to practical use. This doesn’t happen all that often. And, as I hope is clear from this thread, can take years, even decades.
I've learned a few big lessons:
* Making a thing practical is harder than you think
* You'll hear "it'll never work" 1000 times. It's OK
* Don’t dismiss good ideas because they’re not yet practical
* There’s never a “main” inventor - it’s always a team sport
* It pays to be lucky
Thread by Peyman Milanfar.