Peyman Milanfar
Distinguished Scientist at Google. Computational Imaging, Machine Learning, and Vision. Tweets = personal opinions. May change or disappear over time.
Dec 20, 2024 4 tweets 2 min read
Years ago, when my wife and I were planning to buy a home, my dad stunned me with a quick mental calculation of loan payments.

I asked him how; he said he'd learned the strange formula for compound interest from his father, a merchant in 19th-century Iran.

1/4 Image

The origins of the formula my dad knew are a mystery, but I know it has been used in the bazaars of Iran (and elsewhere) for as long as anyone can remember.

It has an advantage: it's very easy to compute on an abacus. The exact compounding formula is much more complicated.

2/4 Image
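The bazaar formula itself is in the attached image and isn't reproduced here. For comparison, below is a minimal Python sketch of the exact compounding formula for a fixed monthly loan payment. It assumes monthly compounding and a fully amortizing loan. `monthly_payment` is a hypothetical helper name and does not come from the thread.

```python
def monthly_payment(principal, annual_rate, years):
    """Exact fixed payment for a fully amortizing loan, compounded monthly."""
    n = 12 * years               # number of monthly payments
    r = annual_rate / 12         # monthly interest rate
    if r == 0:
        return principal / n     # no interest: just split the principal
    growth = (1 + r) ** n        # compound growth factor over the whole term
    return principal * r * growth / (growth - 1)

# Example: $100,000 over 30 years at 6% per year
print(round(monthly_payment(100_000, 0.06, 30), 2))  # 599.55
```

The `(1 + r) ** n` term, a 360th power in this example, is the part that is hard to evaluate by hand or on an abacus.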
Dec 8, 2024 11 tweets 4 min read
How are Kernel Smoothing in statistics, Data-Adaptive Filters in image processing, and Attention in Machine Learning related?

My goal is not to argue who should get credit for what, but to show a progression of closely related ideas over time and across neighboring fields.

1/n Image

In the beginning there was Kernel Regression: a powerful and flexible way to fit an implicit function point-wise to samples. The classic KR is based on interpolation kernels that are a function of the positions (x) of the samples, not of their values (y).

2/n Image
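Below is a minimal NumPy sketch of classic kernel regression (the Nadaraya–Watson estimator). It assumes a Gaussian kernel with a hypothetical bandwidth `h`. The weights depend only on the sample positions `x` and never on the values `y`. Normalizing Gaussian weights is the same as taking a softmax over negative squared distances, which is one simple way to see the connection to attention.

```python
import numpy as np

def kernel_regression(x_query, x, y, h=0.5):
    """Classic (Nadaraya-Watson) kernel regression with a Gaussian kernel.

    Weights depend only on the sample positions x, not on the values y.
    """
    # logits[q, i] = -(x_q - x_i)^2 / (2 h^2)
    logits = -((x_query[:, None] - x[None, :]) ** 2) / (2 * h ** 2)
    # Normalizing exp(logits) over samples is a softmax, as in attention
    logits -= logits.max(axis=1, keepdims=True)   # for numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)             # each row sums to 1
    return w @ y                                  # weighted average of sample values

x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x) + 0.1 * np.random.default_rng(0).standard_normal(50)
print(kernel_regression(np.array([np.pi / 2]), x, y))
```

A data-adaptive filter would also let the weights depend on the values `y`. The sketch deliberately leaves that out so that it matches the classic case described above.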
Dec 3, 2024 5 tweets 2 min read
“On a log-log plot, my grandmother fits on a straight line.”
-Physicist Fritz Houtermans

There's a lot of truth to this. Log-log plots are often abused and can be very misleading.

1/5 Image

A plot of empirical data can reveal hidden phenomena or scaling. An important and common model is to look for power laws like

p(x) ≃ L(x) xᵃ

where L(x) is slowly varying, so that xᵃ is dominant.

Power laws appear all over physics, biology, math, economics, etc. However...

2/5
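Below is a minimal NumPy sketch of the usual recipe: fit a straight line to log p versus log x and read the exponent a off the slope, assuming L(x) is roughly constant over the fitted range. The second call shows the pitfall. A line fit on log-log axes always returns a slope, even when the data are not a power law. Looking at the residual is one minimal sanity check, but it is not a substitute for a proper statistical test.

```python
import numpy as np

def fit_power_law(x, p):
    """Fit log p = a log x + log C by least squares; return (a, C, rms_residual)."""
    log_x, log_p = np.log(x), np.log(p)
    a, log_C = np.polyfit(log_x, log_p, deg=1)     # slope, intercept
    residual = log_p - (a * log_x + log_C)
    return a, np.exp(log_C), np.sqrt(np.mean(residual ** 2))

x = np.logspace(0, 3, 50)                    # three decades of x
print(fit_power_law(x, 2.0 * x ** 1.5))      # a true power law: a = 1.5, C = 2
print(fit_power_law(x, np.exp(-x / 100)))    # not a power law, yet a "slope" comes out
```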
Nov 10, 2024 6 tweets 3 min read
Integral geometry is a beautiful topic bridging geometry, probability & statistics.

Say you have a curve with any shape, possibly even self-intersecting. How can you measure its length?

This has many applications: the curve could be a strand of DNA or a twisted length of wire.

1/n Image

A curve is a collection of tiny segments. Measure each segment & sum. You can go further: make the segments so small they are essentially points, and count the red points.

A practical way to do this: drop many lines, or a dense grid, onto the shape & count the intersections.

2/n Image
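The counting idea above is the Cauchy–Crofton formula. Below is a minimal NumPy sketch under a few assumptions. The curve is given as a polyline. Each trial drops a grid of parallel lines with spacing `spacing` at a uniformly random angle and offset. For such grids the expected number of crossings is 2L/(π·spacing), which gives L ≈ π·spacing·(mean crossings)/2. `crofton_length` and its parameters are hypothetical names chosen for illustration.

```python
import numpy as np

def crofton_length(points, spacing=0.1, n_trials=2000, seed=0):
    """Estimate the length of a polyline by counting crossings with random line grids.

    Each trial drops parallel lines (gap `spacing`) at a random angle and offset.
    Since E[#crossings] = 2 L / (pi * spacing), L ~ pi * spacing * mean / 2.
    """
    rng = np.random.default_rng(seed)
    counts = np.empty(n_trials)
    for t in range(n_trials):
        theta = rng.uniform(0.0, np.pi)
        normal = np.array([np.cos(theta), np.sin(theta)])   # normal to the grid lines
        offset = rng.uniform(0.0, spacing)
        # Position of each vertex across the grid, in units of line spacing
        s = (points @ normal - offset) / spacing
        # A segment crosses one grid line per integer lying between its endpoints
        counts[t] = np.abs(np.diff(np.floor(s))).sum()
    return np.pi * spacing * counts.mean() / 2

t = np.linspace(0, 2 * np.pi, 2001)
circle = np.column_stack([np.cos(t), np.sin(t)])   # unit circle, length 2*pi
print(crofton_length(circle))
```

Crossings are counted segment by segment, so self-intersecting curves need no special handling.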
Sep 24, 2024 4 tweets 2 min read
Smoothing splines fit a function to data as the solution of a regularized least-squares optimization problem.

But it’s also possible to do it in one shot with an unusually shaped kernel (see figure).

Is it possible to solve other optimization problems this way? Surprisingly, yes.

1/n Image

This is just one instance of how one can “kernelize” an optimization problem: that is, approximate the solution of an optimization problem in just one step by constructing a kernel and applying it once to the input.

Under some conditions you can do it much more generally.

2/n Image
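Below is a minimal discrete sketch of the one-shot idea. It uses a Whittaker-style smoother, min_z ‖y − z‖² + λ‖Dz‖², as a stand-in for the cubic smoothing spline, where D is a second-difference operator. The solution is z = (I + λDᵀD)⁻¹ y = W y, so the regularized least-squares problem collapses into a single application of the matrix W. Each row of W is an "equivalent kernel". The discretization and the value of λ are assumptions for illustration, not the construction in the figure.

```python
import numpy as np

def smoother_matrix(n, lam):
    """Matrix W such that W @ y solves min_z ||y - z||^2 + lam * ||D z||^2.

    D is the (n-2) x n second-difference operator, a discrete stand-in
    for the roughness penalty of a cubic smoothing spline.
    """
    D = np.diff(np.eye(n), n=2, axis=0)                # rows like [1, -2, 1, 0, ...]
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)    # the "one-shot" kernel

n, lam = 101, 100.0
W = smoother_matrix(n, lam)
kernel = W[n // 2]     # equivalent kernel at the center sample
```

Plotting `kernel` should show a peaked shape with small negative side lobes, which is one reason such equivalent kernels look unusual. D annihilates constants, so every row of W sums to 1.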
Sep 18, 2024 4 tweets 2 min read
Mean-shift iteratively moves points towards regions of higher density. It does so by placing a kernel at each data point, calculating the mean of the data points within that window, and shifting points towards this mean until convergence. Look familiar?

1/n
(Animation @gabrielpeyre)

The first term on the right-hand side of the ODE has the form of a pseudo-linear denoiser f(x) = W(x) x: a weighted average of the points, where the weights depend on the data. The overall mean-shift process is a lot like a residual flow:

d/dt x(t) = f(x(t)) - x(t)

2/n Image
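Below is a minimal NumPy sketch of mean-shift written as Euler steps of the ODE above. It assumes a Gaussian kernel with a hypothetical bandwidth `h`, and it computes the weights against a fixed data set. That is the standard variant; in "blurring" mean-shift the data points move as well. With step size `dt = 1`, each step reduces to the classic update x ← f(x).

```python
import numpy as np

def mean_shift(points, data, h=1.0, dt=1.0, n_steps=100):
    """Euler steps of dx/dt = f(x) - x, with f(x) = W(x) x a Gaussian-kernel
    weighted average of the data. dt = 1 gives the classic update x <- f(x)."""
    x = np.array(points, dtype=float)
    for _ in range(n_steps):
        # Squared distances between current points and data: shape (m, n)
        d2 = ((x[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
        w = np.exp(-d2 / (2 * h ** 2))
        w /= w.sum(axis=1, keepdims=True)   # rows of W(x) sum to 1
        f = w @ data                         # pseudo-linear denoiser f(x)
        x += dt * (f - x)                    # residual-flow step
    return x

data = np.array([[-5.2], [-5.0], [-4.8], [4.8], [5.0], [5.2]])
print(mean_shift(data, data))   # each point drifts to the mode of its cluster
```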
Sep 5, 2024 8 tweets 4 min read
Random matrices are very important in modern statistics and machine learning, not to mention physics.

A model about which much less is known: matrices sampled uniformly from the set of doubly stochastic matrices.

A thread -

1/n
First, what are doubly stochastic matrices?
Non-negative matrices whose rows and columns each sum to 1.

The set of doubly stochastic matrices is also known as the Birkhoff polytope: an (n−1)²-dimensional convex polytope in ℝⁿˣⁿ whose extreme points are the permutation matrices.

2/n Image
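Below is a minimal NumPy sketch of the Birkhoff picture. Any convex combination of permutation matrices is doubly stochastic. The Dirichlet weights and the number of permutations `k` are arbitrary assumptions. This construction does NOT sample uniformly from the polytope; it only illustrates the definition and the extreme points.

```python
import numpy as np

def random_doubly_stochastic(n, k=10, seed=0):
    """A point in the Birkhoff polytope, built as a convex combination of
    k random permutation matrices (the polytope's extreme points).

    Note: this is NOT a uniform sample from the polytope.
    """
    rng = np.random.default_rng(seed)
    weights = rng.dirichlet(np.ones(k))     # non-negative, sum to 1
    M = np.zeros((n, n))
    for w in weights:
        P = np.eye(n)[rng.permutation(n)]   # random permutation matrix
        M += w * P
    return M

M = random_doubly_stochastic(6)
print(M.sum(axis=0), M.sum(axis=1))   # all ones, up to rounding
```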
Sep 1, 2024 10 tweets 3 min read
The perpetually undervalued least-squares:

minₓ‖y−Ax‖²

can teach a lot about some complex ideas in modern machine learning including overfitting & double-descent.

Let's assume A is n-by-p, so we have n data points and p parameters.

1/10 Image

If n ≥ p (“under-fitting” or “over-determined” case) and A has full column rank, the solution is

x̃ = (AᵀA)⁻¹ Aᵀ y

But if n < p (“over-fitting” or “under-determined” case), there are infinitely many solutions that give *zero* training error. We pick the minimum-norm solution (smallest ‖x‖²), which, for A with full row rank, is:

x̃ = Aᵀ(AAᵀ)⁻¹ y

2/10
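Below is a minimal NumPy sketch of the two closed-form solutions above. It assumes A has full column rank when n ≥ p and full row rank when n < p, and it uses `np.linalg.solve` instead of forming explicit inverses. Both cases agree with the Moore–Penrose pseudoinverse, `np.linalg.pinv(A) @ y`, which makes a convenient cross-check.

```python
import numpy as np

def least_squares(A, y):
    """Ordinary LS if n >= p, minimum-norm solution if n < p.
    Assumes full column rank (n >= p) or full row rank (n < p)."""
    n, p = A.shape
    if n >= p:
        return np.linalg.solve(A.T @ A, A.T @ y)    # (A^T A)^{-1} A^T y
    return A.T @ np.linalg.solve(A @ A.T, y)        # A^T (A A^T)^{-1} y

rng = np.random.default_rng(1)
A_wide = rng.standard_normal((5, 20))     # n = 5 data points, p = 20 parameters
y_wide = rng.standard_normal(5)
x_wide = least_squares(A_wide, y_wide)
print(np.abs(A_wide @ x_wide - y_wide).max())   # ~0: zero training error
```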
Aug 18, 2024 7 tweets 3 min read
Two basic concepts are often conflated:

Sample Standard Deviation (SD) vs Standard Error (SE)

Say you want to estimate m = 𝔼(x) from N independent samples xᵢ. A typical choice is the average, or "sample" mean, m̂.

But how stable is this? That's what Standard Error tells you:

1/6 Image

Since m̂ is itself a random variable, we need to quantify the uncertainty around it too: this is what the Standard Error does.

The Standard Error is *not* the same as the spread of the samples - that's the Standard Deviation (SD) - but the two are closely related:

2/6 Image
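Below is a minimal NumPy sketch assuming i.i.d. samples. The sample SD measures the spread of the data, while the SE of the mean is SD/√N. The simulation repeats a hypothetical experiment many times to show that the spread of m̂ itself matches σ/√N. The values of σ, N, and the number of repeats are arbitrary choices.

```python
import numpy as np

def sd_and_se(samples):
    """Sample standard deviation (spread of the data) and standard error
    of the sample mean (uncertainty of m-hat): SE = SD / sqrt(N)."""
    x = np.asarray(samples, dtype=float)
    sd = x.std(ddof=1)              # N - 1 in the variance denominator
    se = sd / np.sqrt(len(x))
    return sd, se

rng = np.random.default_rng(0)
sigma, N = 2.0, 25
means = rng.normal(0.0, sigma, size=(10_000, N)).mean(axis=1)   # 10,000 experiments
print(np.std(means), sigma / np.sqrt(N))   # empirical spread of m-hat vs sigma / sqrt(N)
```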
Aug 15, 2024 13 tweets 9 min read
Did you ever take a photo & wish you'd zoomed in more or framed better? When this happens, we just crop.

Now there's a better way: Zoom Enhance, a new feature my team just shipped on Pixel. Available in Google Photos under Tools, it enhances both zoomed & un-zoomed images.

1/n Image

Zoom Enhance is our first im-to-im diffusion model designed & optimized to run fully on-device. It allows you to crop or frame the shot you wanted, and enhance it after capture. The input can be from any device, Pixel or not, old or new. Below are some examples & use cases.

2/n

Image
Image
Image
Aug 10, 2024 7 tweets 3 min read
Image-to-image models have been called 'filters' since the early days of comp vision/imaging. But what does it mean to filter an image?

If we choose some set of weights and apply them to the input image, what loss/objective function does this process optimize (if any)?

1/7 Image

Such filters can often be written as matrix-vector operations. Think of z, y, and the corresponding weights as vectors and you have a tidy expression relating (all) output pixels to (all) input pixels. If the filter is local (has a small footprint), most of the weights will be zero.

2/7 Image
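Below is a minimal NumPy sketch of the matrix view z = W y. It assumes a 1-D signal, a fixed (non-adaptive) filter with an odd number of taps, and zero padding at the borders. `filter_matrix` is a hypothetical helper. Each row of W holds the weights that produce one output pixel, so a 3-tap filter gives a banded matrix that is mostly zeros.

```python
import numpy as np

def filter_matrix(weights, n):
    """n x n matrix W so that z = W @ y applies a local filter to a 1-D signal y.
    `weights` has odd length; taps falling outside the signal are dropped."""
    k = len(weights) // 2
    W = np.zeros((n, n))
    for i in range(n):                    # output pixel i
        for j, w in enumerate(weights):   # tap j reads input pixel i + j - k
            col = i + j - k
            if 0 <= col < n:
                W[i, col] = w
    return W

y = np.arange(10, dtype=float) ** 2
W = filter_matrix([0.25, 0.5, 0.25], len(y))
z = W @ y                       # all output pixels from all input pixels at once
print(np.count_nonzero(W))      # 28 of 100 entries: most weights are zero
```

For a data-adaptive filter, the same form holds with W depending on y, i.e. z = W(y) y.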
Jul 21, 2024 4 tweets 2 min read
Images aren’t arbitrary collections of pixels; they have complicated structure, even small ones. That’s why it’s hard to generate images well. Let me give you an idea:

High-contrast 3×3 grayscale patches of natural images, represented as points in ℝ⁹ (after normalizing brightness and contrast), lie approximately on a 2-D manifold: the Klein bottle!

1/3 Image

Images can be thought of as vectors in a high-dimensional space. It has long been hypothesized that images live on low-dimensional manifolds (hence manifold learning). It’s a reasonable assumption: images of the world are not arbitrary. The low-dimensional structure arises from physical constraints & laws.

2/3 Image
Apr 3, 2024 5 tweets 2 min read
We often assume bigger generative models are better. But when practical image generation is limited by a compute budget, is this still true? The answer is no.

By looking at latent diffusion models across different scales, our paper sheds light on the tradeoff between quality and model size.

1/5 Image

We trained a range of text-to-image LDMs & observed a notable trend: when constrained by a compute budget, smaller models frequently outperform their larger siblings in image quality. For example, samples from a 223M-parameter model can be better than those from a model 4x larger.

2/5 Image
Apr 2, 2024 19 tweets 8 min read
It’s been >20 years since I published my first work on multi-frame super-res (SR) w/ Nhat Nguyen and the late great Gene Golub. Here’s my personal story of SR as I’ve experienced it, from theory, to practical algorithms, to deployment in product. In a way it’s been my life’s work. Image

Tsai and Huang (1984) were the first to publish the concept of multi-frame super-resolution. The key idea was that a high-resolution image is related to its shifted, low-resolution versions in the frequency domain through the shift and aliasing properties of the Fourier transform. Image
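Below is a minimal NumPy check of the two Fourier facts mentioned above, for a 1-D discrete signal. Circular shifts become phase ramps, and decimation by M sums M shifted copies of the spectrum (aliasing). It shows only the frequency-domain relationship the idea rests on, not Tsai and Huang's reconstruction algorithm. The signal length, shift, and decimation factor are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, s = 64, 2, 3                 # signal length, decimation factor, shift
x = rng.standard_normal(N)
X = np.fft.fft(x)
k = np.arange(N)

# Shift property: x[n - s] (circular) has spectrum X[k] * exp(-2*pi*i*k*s/N)
X_shifted = np.fft.fft(np.roll(x, s))
assert np.allclose(X_shifted, X * np.exp(-2j * np.pi * k * s / N))

# Aliasing property: keeping every M-th sample folds M copies of the spectrum
Y = np.fft.fft(x[::M])             # length N // M
K = np.arange(N // M)
aliased = sum(X[K + m * N // M] for m in range(M)) / M
assert np.allclose(Y, aliased)
```

Several shifted low-resolution frames give several different aliased mixtures of the same X. That is what makes it possible, in principle, to unmix them.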
Apr 1, 2024 4 tweets 2 min read
Motion blur is often misunderstood, because people think of it in terms of a single imperfect image captured at some instant in time.

But motion blur is in fact an inherently temporal phenomenon. It is a temporal convolution of pixels (at the same location) across time.

1/4 Image

Integration across time (e.g., while the shutter is open) gives motion blur whose strength depends on the speed of objects.

A mix of object speed, shutter speed and frame rate together can cause aliasing in time (spokes moving backwards) & blur in space (wheel surface), all in the same image.

2/4
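Below is a minimal NumPy sketch of blur as temporal integration. It assumes a 1-D scene translating at a constant integer velocity, and it approximates the open shutter by averaging `n_sub` discrete sub-frames (hypothetical parameters). Each output pixel averages whatever passed over that pixel during the exposure.

```python
import numpy as np

def motion_blur(frame, velocity, n_sub=5):
    """Average n_sub sub-frames of a 1-D scene moving `velocity` pixels per
    sub-frame: blur as integration over time at each pixel location."""
    acc = np.zeros_like(frame, dtype=float)
    for t in range(n_sub):
        acc += np.roll(frame, velocity * t)   # scene at time t (wraps at the edges)
    return acc / n_sub

frame = np.zeros(12)
frame[2] = 1.0                      # a single bright point
print(motion_blur(frame, 1, 5))     # smeared into a streak of 5 pixels
```

A faster object or a longer exposure (larger `velocity * n_sub`) gives a longer streak.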
Mar 27, 2024 4 tweets 2 min read
This is not a scene from Inception. There's no sorcery: it's a real photo taken with a very long focal-length lens. When the focal length is long, the field of view becomes very narrow and the resulting image appears flatter.

1/4 Image

Here's another example:

The Empire State Building and the Statue of Liberty are about 4.5 miles apart, and the building is 5x taller.

2/4 Image
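Below is a back-of-envelope sketch using the numbers in the tweet (4.5 miles apart, building 5x taller). It assumes a simple pinhole model with small angles, a camera roughly in line with both landmarks and on the statue's side, and it ignores Earth's curvature. Apparent height scales as true height divided by distance. As the camera moves away, the ratio of apparent heights therefore approaches the true ratio of 5, and a long lens is what lets you fill the frame from that far away. `apparent_height_ratio` is a hypothetical helper.

```python
def apparent_height_ratio(d_statue, gap=4.5, height_ratio=5.0):
    """Apparent (angular) height of the building relative to the statue, for a
    camera d_statue miles from the statue with the building `gap` miles farther."""
    # Small-angle pinhole model: apparent size ~ true height / distance
    return height_ratio * d_statue / (d_statue + gap)

for d in (1.125, 5.0, 50.0):
    print(d, apparent_height_ratio(d))   # equal apparent heights at 1.125 miles
```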
Mar 24, 2024 5 tweets 3 min read
What is resolution in an image? It is not the number of pixels. Here’s the classical Rayleigh criterion taught in basic physics:

1/5 Image

This concept is important in imaging because it guides how densely we should pack pixels together to avoid or allow aliasing. (Yes, sometimes aliasing is useful!)

2/5
Image
Image
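Below is a minimal sketch of the Rayleigh criterion for a circular aperture of diameter D: θ ≈ 1.22 λ/D. In the focal plane this corresponds to a separation of about 1.22 λ N, where N is the f-number. The wavelength, aperture, and f-number in the example are arbitrary assumptions. Comparing the focal-plane separation with the pixel pitch is one rough way to judge whether pixels are packed densely enough to avoid (or deliberately allow) aliasing.

```python
def rayleigh_angle(wavelength, aperture):
    """Smallest resolvable angular separation (radians) for a circular
    aperture of diameter `aperture` (same length units as wavelength)."""
    return 1.22 * wavelength / aperture

def rayleigh_spot(wavelength, f_number):
    """Corresponding separation in the focal plane: 1.22 * wavelength * N."""
    return 1.22 * wavelength * f_number

# Green light (550 nm) through an f/2 lens
print(rayleigh_spot(550e-9, 2.0))   # about 1.34 micrometers
```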
Mar 12, 2024 6 tweets 3 min read
One of the lesser known ways to compare estimators is "admissibility".

An estimator θ* = g(y) of θ from data y is called *in*admissible if it is uniformly dominated by another estimator h(y), say in the MSE sense: the risk of h is no larger than that of g for every value of θ, and strictly smaller for at least one.

1/6 Image

Being admissible doesn't mean the estimator is good, but it's a very useful idea for weeding out the bad ones.

A great example is Stein's:
The maximum likelihood estimate of a Gaussian mean is inadmissible (under squared-error loss) in d ≥ 3. The nonlinear "shrinkage" that pulls y towards the origin beats it.

2/6 Image
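Below is a minimal NumPy simulation of Stein's example. It assumes y ~ N(θ, σ²I) in d = 10 dimensions with a single observation per trial. It compares the MLE (y itself) against the James–Stein shrinkage estimator (1 − (d − 2)σ²/‖y‖²) y. The true mean θ and the trial count are arbitrary choices.

```python
import numpy as np

def james_stein(y, sigma=1.0):
    """James-Stein shrinkage of y ~ N(theta, sigma^2 I) towards the origin (d >= 3)."""
    d = y.shape[-1]
    norm2 = (y ** 2).sum(axis=-1, keepdims=True)
    return (1 - (d - 2) * sigma ** 2 / norm2) * y

rng = np.random.default_rng(0)
d, trials = 10, 20_000
theta = np.full(d, 0.5)                          # true mean (arbitrary)
y = theta + rng.standard_normal((trials, d))     # one noisy observation per trial
mse_mle = ((y - theta) ** 2).sum(axis=1).mean()  # the MLE is y itself
mse_js = ((james_stein(y) - theta) ** 2).sum(axis=1).mean()
print(mse_mle, mse_js)                           # shrinkage gives the lower MSE
```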
Mar 8, 2024 6 tweets 2 min read
The familiar differential expression for the Laplacian doesn’t reveal its true nature: it is really a center-surround operator. This is easy to see in 1D:

1/6 Image

The same is true in ℝᵈ:

The Laplacian measures how different the function’s value is at the center of a ball as compared to its local average over the ball.

2/6 Image
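Below is a minimal sketch of the center-surround reading using finite differences. The step size `h` is an arbitrary choice. In 1D, f''(x) ≈ (2/h²)·(mean of the two neighbors − f(x)). In 2D, the 5-point stencil is (4/h²)·(mean of the four neighbors − f(x)). The continuous version of this statement is that, for a ball of small radius r in ℝᵈ, the average of f over the ball minus f(x) is approximately r²Δf(x)/(2(d + 2)).

```python
def laplacian_1d(f, x, h=1e-3):
    """Second derivative as center-surround: (2/h^2) * (mean of neighbors - center)."""
    surround = 0.5 * (f(x - h) + f(x + h))
    return 2.0 / h ** 2 * (surround - f(x))

def laplacian_2d(f, x, y, h=1e-3):
    """5-point stencil in 2-D: (4/h^2) * (mean of 4 neighbors - center)."""
    surround = 0.25 * (f(x - h, y) + f(x + h, y) + f(x, y - h) + f(x, y + h))
    return 4.0 / h ** 2 * (surround - f(x, y))

print(laplacian_1d(lambda t: t ** 3, 1.0))                  # close to 6
print(laplacian_2d(lambda u, v: u ** 2 + v ** 2, 0.3, -0.7))  # close to 4
```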
Mar 1, 2024 9 tweets 3 min read
The perpetually undervalued least-squares:

minₓ‖y−Ax‖²

can teach a lot about some complex ideas in modern machine learning including overfitting & double-descent.

Let's assume A is n-by-p, so we have n data points and p parameters.

1/9 Image

If n ≥ p (“under-fitting” or “over-determined” case) and A has full column rank, the solution is

x̃ = (AᵀA)⁻¹ Aᵀ y

But if n < p (“over-fitting” or “under-determined” case), there are infinitely many solutions that give *zero* training error.

We pick the minimum-norm solution (smallest ‖x‖²), which, for A with full row rank, is:

x̃ = Aᵀ(AAᵀ)⁻¹ y

2/9
Feb 25, 2024 6 tweets 2 min read
What do polar coordinates, polar matrix factorization, & the Helmholtz decomposition of a vector field have in common? They’re all implied by Brenier’s Theorem, a cornerstone of Optimal Transport theory. It’s a fundamental decomposition result & deserves to be better known.

1/5 Image

Brenier's Thm:
A non-degenerate vector field
u: Ω ⊂ ℝⁿ → ℝⁿ has a unique decomposition

u = ∇ϕ∘s

where ϕ is a convex potential on Ω, and s is measure-preserving (e.g. density → density).

Here s is a multi-dimensional “rearrangement” (a sort in 1D)

2/5
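In 1D the "rearrangement" s is just a sort. Below is a minimal discrete sketch that assumes a function sampled on a uniform grid, so the measure is uniform and any permutation of indices preserves it. Under that assumption u factors as u = u_star[s]. Here u_star is the nondecreasing rearrangement of u; a nondecreasing map in 1D is the derivative of a convex function. s is a permutation. This illustrates only the discrete 1D case, not the full theorem, and `polar_factorization_1d` is a hypothetical name.

```python
import numpy as np

def polar_factorization_1d(u):
    """Discrete 1-D polar factorization u = u_star[s].

    u_star : nondecreasing rearrangement of u (monotone, i.e. the
             derivative of a convex potential in 1-D)
    s      : permutation of indices (measure-preserving for the uniform measure)
    """
    order = np.argsort(u, kind="stable")
    u_star = u[order]
    s = np.empty_like(order)
    s[order] = np.arange(len(u))    # s[i] = rank of u[i]
    return u_star, s

u = np.array([3.0, -1.0, 2.0, 2.0, 0.5])
u_star, s = polar_factorization_1d(u)
print(u_star, s, u_star[s])         # u_star[s] reproduces u
```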