Peyman Milanfar
Aug 16, 2021 · 7 tweets · 3 min read
What do polar coordinates, polar matrix factorization, & the Helmholtz decomposition of a vector field have in common?

They’re all implied by Brenier’s Theorem: a cornerstone of Optimal Transport theory. It’s a fundamental decomposition result & really deserves to be better known.

1/7
Brenier's Thm ('91): A non-degenerate vector field
u: Ω ⊂ ℝⁿ → ℝⁿ has a unique decomposition

u = ∇ϕ∘s

where ϕ is a convex potential on Ω, and s is measure-preserving (think density → density).

Here s is a multi-dimensional “rearrangement” (a sort in 1D)

2/7
In optimal transport, Brenier's thm implies existence, uniqueness & monotonicity of the OT map w.r.t. the L₂ cost between two given densities p(x) & q(y). Let

u: ℝⁿ → ℝⁿ & c(x,y) = ‖x − y‖².

Then the optimal map taking p to q is u = ∇ϕ, where ϕ is convex.

3/7
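Brenier's statement is easiest to see in 1D, where a monotone map (a sorted rearrangement) is exactly the gradient of a convex ϕ. A minimal numerical sketch of the L₂-optimal map between two Gaussians (my illustration, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 5000)   # samples from p
y = rng.normal(3.0, 2.0, 5000)   # samples from q

# In 1D the L2-optimal (Brenier) map is the monotone rearrangement
# T = Fq^{-1} o Fp: push t through the CDF of p, then through the
# quantile function of q. Monotone maps are gradients of convex phi.
xs = np.sort(x)

def brenier_map(t):
    u = np.searchsorted(xs, t) / len(xs)         # empirical Fp(t)
    return np.quantile(y, np.clip(u, 0.0, 1.0))  # empirical Fq^{-1}(u)

mapped = brenier_map(x)
print(mapped.mean(), mapped.std())  # close to q's mean 3 and std 2
```

The same quantile-matching idea is what the multi-dimensional "rearrangement" s generalizes.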

Brenier proved a weaker result in '87 in a manuscript in French, but later, in '91, published the definitive version in Communications on Pure and Applied Mathematics.

* It's a wonderful paper, well worth reading on its own merit & to learn the special cases *

citeseerx.ist.psu.edu/viewdoc/downlo…

4/7
Like any great idea, it was (sort of) scooped. But luckily the others were in faraway fields & proved less general results.

One was in weather forecasting, the other in statistics:
Given x and y w/ densities p(x), q(y), find a function
y = f(x) that maximizes 𝔼(xy)

Soln: f = ∇ϕ for some convex ϕ

5/7
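In 1D this correlation-maximizing map is just sorting, by the rearrangement inequality. A quick empirical check (my illustration, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(2000)
y = rng.exponential(1.0, 2000)

# Rearrangement inequality: among all pairings of the two samples,
# sorted-with-sorted maximizes the empirical E[xy]. The induced map
# x -> y is then monotone, i.e. f = grad(phi) for a convex phi.
random_pairing = np.mean(x * rng.permutation(y))
monotone_pairing = np.mean(np.sort(x) * np.sort(y))
print(random_pairing, monotone_pairing)
```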
What if the data live on a manifold?

Well, for this case there's a very cool generalization of Brenier's result by McCann, where the magical exponential map makes an appearance.

(Amazing that this paper is still just a preprint 20 yrs on!)

mis.mpg.de/preprints/1999…

6/7
The importance of Brenier's Thm has only grown recently in Machine Learning and Statistics.

Not only is it a cornerstone of optimal transport generally, but it is also being deployed in recent works addressing "potential flows."

7/7 Fin

More from @docmilanfar

Jan 20
Years ago, when my wife and I were planning to buy a home, my dad stunned me with a quick mental calculation of loan payments.

I asked him how; he said he'd learned the strange formula for compound interest from his father, who was a merchant born in 19th-century Iran.

1/4
The origins of the formula my dad knew are a mystery, but I know it has been used in the bazaars of Iran (and elsewhere) for as long as anyone can remember.

It has an advantage: it's very easy to compute on an abacus. The exact compounding formula is much more complicated.

2/4
I figured out how the two formulae relate: the historical formula is the Taylor series of the exact formula around r = 0.

But the crazy thing is that the old Persian formula goes back 100s (maybe 1000s) of years before Taylor, having been passed down for generations.

3/4
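The actual formulas are in the thread's images. To illustrate the Taylor-series relationship, here is the standard amortized-payment formula and its first-order expansion in r (my reconstruction with assumed values, not necessarily the bazaar formula):

```python
P, n = 100_000.0, 120   # principal and number of payments (assumed values)
r = 0.002               # per-period interest rate (assumed value)

# Exact amortized payment, and its first-order Taylor expansion around r = 0:
#   P*r / (1 - (1+r)**(-n))  ~=  (P/n) * (1 + (n+1)*r/2)
exact = P * r / (1 - (1 + r) ** (-n))
approx = (P / n) * (1 + (n + 1) * r / 2)
print(exact, approx)  # the two agree closely for small r
```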
Oct 4, 2025
How Kernel Regression is related to the Attention Mechanism - a summary in 10 slides.

[The summary itself was shared as slide images.]
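The slides are images, but the core correspondence is short: Nadaraya-Watson kernel regression with a Gaussian kernel is softmax attention, with queries = test points, keys = training inputs, and values = training outputs. A sketch of that identity (my illustration, not the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
xk = rng.uniform(0, 2 * np.pi, 200)                # keys: training inputs
yv = np.sin(xk) + 0.1 * rng.standard_normal(200)   # values: training outputs
xq = np.linspace(0.5, 2 * np.pi - 0.5, 50)         # queries: test points

h = 0.3  # kernel bandwidth, playing the role of attention temperature
logits = -((xq[:, None] - xk[None, :]) ** 2) / (2 * h**2)
w = np.exp(logits - logits.max(axis=1, keepdims=True))
w /= w.sum(axis=1, keepdims=True)   # softmax over keys = kernel weights
yhat = w @ yv                       # attention output = kernel regression
print(np.max(np.abs(yhat - np.sin(xq))))  # small fit error
```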
Aug 22, 2025
Yesterday at the @madebygoogle event we launched "Pro Res Zoom" on the Pixel 10 Pro series. I wanted to share a little more detail, some examples, and use cases. The feature enables a combined optical + digital zoom up to 100x magnification. It builds on our 5x optical tele camera.

1/n
Shooting at magnifications well above 30x requires that the 5x optical capture be adapted and optimized for such conditions, yielding a high-quality crop that's fed to our upscaler. The upscaler is a large enough model to understand some semantic context to try & minimize distortions.

2/n
Given the distances one might expect to shoot at such high magnification, it's difficult to get every single detail in the scene right. But we always aim to minimize unnatural distortions and stay true to the scene to the greatest extent possible.

3/n
Jul 14, 2025
Receiver Operating Characteristic (ROC) curves got their name in WWII from radar, which was invented to detect enemy aircraft and ships.

I find them much more intuitive than precision/recall. A ROC curve shows the true positive rate vs. the false positive rate, parametrized by a detection threshold.

1/n
ROC curves show the performance tradeoffs in a binary hypothesis test like this:

H₁: signal present
H₀: signal absent

From a data vector x, we could write the ROC directly in terms of x. But typically some test statistic T(x) is computed and compared to a threshold γ.

2/n
ROC curves derived from general likelihoods are always monotonically increasing.

This is easy to see from the definitions of Pf and Pd: the slope of the ROC curve is non-negative.

Pro-tip: If you see a ROC curve in a paper or talk that isn't, ask why.

3/n
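A numeric check of the monotonicity claim, for a simple shift-detection problem with T(x) = x (my illustration, assuming Gaussian noise):

```python
import numpy as np

# H0: T ~ N(0,1) (signal absent), H1: T ~ N(1,1) (signal present).
# Sweep the threshold gamma; Pf = P(T > g | H0), Pd = P(T > g | H1).
rng = np.random.default_rng(0)
t0 = rng.normal(0.0, 1.0, 100_000)
t1 = rng.normal(1.0, 1.0, 100_000)

gammas = np.linspace(-4.0, 5.0, 200)
Pf = np.array([(t0 > g).mean() for g in gammas])
Pd = np.array([(t1 > g).mean() for g in gammas])

# Raising gamma lowers both rates together, so the ROC curve traced
# out by (Pf, Pd) is monotonically increasing, and sits above the
# chance diagonal Pd = Pf.
assert np.all(np.diff(Pf) <= 0) and np.all(np.diff(Pd) <= 0)
print(float(Pd[np.argmin(np.abs(Pf - 0.1))]))  # Pd at Pf ~ 0.1
```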
Apr 30, 2025
The choice of nonlinear activation functions in neural networks can be tricky and important.

That's because iterating (i.e. repeatedly composing) even simple nonlinear functions can lead to unstable or even chaotic behavior, even with something as simple as a quadratic.

1/n
Some activations are more well-behaved than others. Take ReLU, for example:

r(x) = max{0, x}

Its iterates are completely benign, r⁽ⁿ⁾(x) = r(x), so we don't have to worry.

Most other activations, like soft-plus, are less benign, but still change gently with composition.

2/n
Soft-plus:

s(x) = log(eˣ + 1)

has a special property: its n-times self-composition is really simple

s⁽ⁿ⁾(x) = log(eˣ + n)

With each iteration, s⁽ⁿ⁾(x) changes gently for all x.

This form is rare -- most activations don't have nice closed-form iterates like this.

3/n
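The soft-plus identity is easy to verify numerically by composing the function n times:

```python
import numpy as np

def softplus(x):
    return np.log(np.exp(x) + 1.0)

# Check s^(n)(x) = log(exp(x) + n) by repeated composition.
x = np.linspace(-5.0, 5.0, 101)
y = x.copy()
for n in range(1, 11):
    y = softplus(y)
    assert np.allclose(y, np.log(np.exp(x) + n))
print("s^(n)(x) = log(exp(x) + n) verified for n = 1..10")
```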
Mar 18, 2025
Tweedie's formula is super important in diffusion models & is also one of the cornerstones of empirical Bayes methods.

Given how easy it is to derive, it's surprising how recently it was discovered ('50s). It was published a while later, by Robbins, after Tweedie wrote to him about it.

1/n
The MMSE denoiser is the conditional mean, f̂(y) = 𝔼(x|y). In this case, we can write the expression for this conditional mean explicitly.

2/n
Note that the normalizing term in the denominator is the marginal density of y.

3/n
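The derivation itself is in the images, but Tweedie's formula, E[x|y] = y + σ² d/dy log p(y), can be sanity-checked against the closed-form posterior mean of a Gaussian prior (my illustration):

```python
import numpy as np

# Prior x ~ N(mu0, tau2), noise y|x ~ N(x, sigma2). The marginal is
# y ~ N(mu0, tau2 + sigma2), whose log-density has a linear score.
mu0, tau2, sigma2 = 1.0, 4.0, 0.5
y = np.linspace(-5.0, 7.0, 200)

score = -(y - mu0) / (tau2 + sigma2)           # d/dy log p(y)
tweedie = y + sigma2 * score                   # Tweedie's formula
posterior_mean = (tau2 * y + sigma2 * mu0) / (tau2 + sigma2)  # exact E[x|y]

assert np.allclose(tweedie, posterior_mean)
print("Tweedie's formula matches the exact conditional mean")
```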
