Peyman Milanfar
Mar 29, 2020 · 6 tweets · 3 min read
(1/5) One of the most surprising and little-known results in classical statistics concerns the relationship between the mean, median, and standard deviation: if the distribution has finite variance, then the distance between the median and the mean is at most one standard deviation, |μ - m| ≤ σ.
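A quick numerical sanity check of the bound, as a minimal sketch: any finite data set can be treated as a distribution in its own right (equal mass on each point), so |mean - median| / std computed with the population standard deviation must never exceed 1. The sample distributions and the helper name `mean_median_gap_ratio` are our own illustrative choices.

```python
import random
import statistics


def mean_median_gap_ratio(data):
    """Return |mean - median| / std for a finite data set.

    The data set is treated as a distribution (equal mass per point),
    so the population standard deviation (pstdev) is the right sigma.
    """
    mu = statistics.fmean(data)
    m = statistics.median(data)  # minimizes sum(|x - c|) over c
    sigma = statistics.pstdev(data)
    return abs(mu - m) / sigma


random.seed(0)
samples = {
    "exponential": [random.expovariate(1.0) for _ in range(10_000)],
    "lognormal": [random.lognormvariate(0.0, 1.0) for _ in range(10_000)],
    # A lopsided two-point distribution that comes close to the bound.
    "two-point": [0.0] * 51 + [1.0] * 49,
}

for name, data in samples.items():
    ratio = mean_median_gap_ratio(data)
    print(f"{name:12s} |mean - median| / std = {ratio:.3f}")
    assert ratio <= 1.0
```

The two-point case shows the constant 1 cannot be improved in general: shifting almost half the mass far from the median drags the mean nearly a full standard deviation away.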
(2/5) We assigned this as a HW exercise in a class I taught as a grad student at MIT, circa 1991.

Coincidentally, it was written up around the same time by C. Mallows in "Another comment on O'Cinneide," The American Statistician 45(3).

The proof is easy, using Jensen's inequality twice.
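Written out, the argument is one chain of inequalities. This is our reconstruction of the standard proof (the original was posted as an image), with μ = 𝔼X, m a median of X, and σ² = Var X:

```latex
$$
|\mu - m| = |\mathbb{E}(X - m)| \le \mathbb{E}|X - m| \le \mathbb{E}|X - \mu| \le \sqrt{\mathbb{E}(X - \mu)^2} = \sigma
$$
```

The first step is Jensen's inequality for the convex function |·|, the second holds because a median minimizes c ↦ 𝔼|X - c|, and the third is Jensen's inequality for the concave square root.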
(3/5) If the distribution is unimodal, the bound is even tighter: |μ - m| ≤ √(3/5)·σ ≈ 0.775σ.
epubs.siam.org/doi/10.1137/S0…
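The sharper unimodal constant √(3/5) can be spot-checked numerically. This sketch uses the Weibull family (our choice, not the thread's), which is unimodal for every shape k > 0 and has closed-form mean, median, and variance; it scans k on a grid and reports the worst ratio.

```python
import math


def weibull_gap_ratio(k, lam=1.0):
    """|mean - median| / std for a Weibull(shape=k, scale=lam) distribution."""
    mean = lam * math.gamma(1.0 + 1.0 / k)
    median = lam * math.log(2.0) ** (1.0 / k)
    var = lam**2 * (math.gamma(1.0 + 2.0 / k) - math.gamma(1.0 + 1.0 / k) ** 2)
    return abs(mean - median) / math.sqrt(var)


UNIMODAL_BOUND = math.sqrt(3.0 / 5.0)  # about 0.775

# Shape parameters from 0.2 to 20 on a logarithmic grid.
shapes = [0.2 * 100 ** (i / 199) for i in range(200)]
worst = max(weibull_gap_ratio(k) for k in shapes)
print(f"largest ratio over the grid: {worst:.3f} (bound {UNIMODAL_BOUND:.3f})")
assert worst <= UNIMODAL_BOUND
```

For k = 1 (the exponential distribution) the ratio is exactly 1 - ln 2 ≈ 0.307, comfortably inside the bound.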
(4/5) What about in higher dimensions?

Yes, with the median defined appropriately, that works too. The median here is the "spatial median": the (unique) point m minimizing E(‖x - m‖ - ‖x‖); for a finite sample, this is the point minimizing the sum of distances to the sample points.

The result appears in this book:
amazon.com/Random-Vectors…
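A minimal sketch of the sample spatial median, using Weiszfeld's iteration (our choice of algorithm; the thread does not specify one). It also checks ‖mean - median‖ ≤ √(trace of the covariance), the bound you get by repeating the 1-D chain of inequalities with Euclidean norms; we have not checked whether the cited book states this form or a sharper one.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def total_distance(points, c):
    """Sum of distances from c to the points (the sample objective)."""
    return sum(dist(p, c) for p in points)


def spatial_median(points, iters=200, eps=1e-12):
    """Approximate the spatial (geometric) median with Weiszfeld's iteration.

    Each step is a weighted average of the points with weights
    1 / distance to the current estimate. Distances are floored at eps
    so a step that lands on a data point does not divide by zero.
    """
    n, dim = len(points), len(points[0])
    m = [sum(p[j] for p in points) / n for j in range(dim)]  # start at the mean
    for _ in range(iters):
        weights = [1.0 / max(dist(p, m), eps) for p in points]
        total = sum(weights)
        m = [sum(w * p[j] for w, p in zip(weights, points)) / total for j in range(dim)]
    return m


random.seed(1)
# Skewed 2-D data: independent exponential coordinates.
points = [(random.expovariate(1.0), random.expovariate(0.5)) for _ in range(500)]

n = len(points)
mean = [sum(p[j] for p in points) / n for j in range(2)]
median = spatial_median(points)

gap = dist(mean, median)
# Square root of the trace of the (population) sample covariance.
spread = math.sqrt(sum(dist(p, mean) ** 2 for p in points) / n)
print(f"||mean - median|| = {gap:.3f}, sqrt(trace cov) = {spread:.3f}")
assert total_distance(points, median) <= total_distance(points, mean)
assert gap <= spread
```

Starting from the mean matters for the check: Weiszfeld's iteration never increases the objective, so the approximate median is at least as good as the mean, which is all the inequality chain needs.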

(5/5) Results like this are not just curiosities; they are quite useful in practice, since they let one quantity be bounded (and so roughly estimated) from the other two in a distribution-free manner. This is important in meta-analyses of studies in the biomedical sciences, among other areas.

(Open Access) ncbi.nlm.nih.gov/pmc/articles/P…
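As a small illustration of the "one quantity from the other two" point: given a reported median and SD, the bound pins the mean to an interval. The function name and the study numbers below are hypothetical, and this is only the crude distribution-free interval, not a meta-analysis estimator.

```python
import math


def mean_range_from_median_sd(median, sd, unimodal=False):
    """Distribution-free interval for the mean, given the median and SD.

    Uses |mean - median| <= c * sd, with c = 1 for any distribution with
    finite variance and c = sqrt(3/5) if the distribution is assumed unimodal.
    """
    if sd < 0:
        raise ValueError("sd must be non-negative")
    c = math.sqrt(3.0 / 5.0) if unimodal else 1.0
    return (median - c * sd, median + c * sd)


# Hypothetical study summary (made-up numbers): median 12.0, SD 4.0.
print(mean_range_from_median_sd(12.0, 4.0))  # (8.0, 16.0)
print(mean_range_from_median_sd(12.0, 4.0, unimodal=True))
```

The same inequality read the other way bounds the median from the mean and SD.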

• • •


More from @docmilanfar

Mar 18
Tweedie's formula is super important in diffusion models & is also one of the cornerstones of empirical Bayes methods.

Given how easy it is to derive, it's surprising how recently it was discovered (in the '50s). It was published a while later, when Tweedie wrote Stein about it.

1/n
The MMSE denoiser is known to be the conditional mean f̂(y) = 𝔼(x|y). In this case, we can write this conditional mean explicitly.

2/n
Note that the normalizing term in the denominator is the marginal density of y.

3/n
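A minimal sketch of Tweedie's formula in action, assuming additive Gaussian noise y = x + n with n ~ N(0, σ²), the setting in which the formula is usually stated: 𝔼(x|y) = y + σ² d/dy log p(y). The prior (x = ±1 with probability 1/2 each) is our own toy choice, picked because the conditional mean can also be computed directly and equals tanh(y/σ²).

```python
import math


def gauss(y, mean, sigma):
    """Gaussian density N(y; mean, sigma^2)."""
    return math.exp(-0.5 * ((y - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def marginal(y, sigma):
    """p(y) for the toy prior x = +1 or -1 (prob 1/2 each) plus N(0, sigma^2) noise."""
    return 0.5 * gauss(y, 1.0, sigma) + 0.5 * gauss(y, -1.0, sigma)


def posterior_mean(y, sigma):
    """E[x | y] computed directly from Bayes' rule."""
    w_plus = gauss(y, 1.0, sigma)
    w_minus = gauss(y, -1.0, sigma)
    return (w_plus - w_minus) / (w_plus + w_minus)


def tweedie_estimate(y, sigma, h=1e-5):
    """E[x | y] = y + sigma^2 * d/dy log p(y), derivative by central difference."""
    score = (math.log(marginal(y + h, sigma)) - math.log(marginal(y - h, sigma))) / (2 * h)
    return y + sigma**2 * score


sigma = 0.7
for y in [-2.0, -0.5, 0.0, 0.3, 1.5]:
    direct = posterior_mean(y, sigma)
    via_tweedie = tweedie_estimate(y, sigma)
    print(f"y={y:+.1f}  direct={direct:+.6f}  tweedie={via_tweedie:+.6f}")
    assert abs(direct - via_tweedie) < 1e-6
```

The point of the formula is that the right-hand side needs only the marginal p(y) (through its score), never the prior itself, which is what makes it usable in empirical Bayes and in diffusion models.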
Feb 16
Images aren’t arbitrary collections of pixels; they have complicated structure, even small ones. That’s why it’s hard to generate images well. Let me give you an idea:

3×3 gray image patches (the high-contrast ones, after normalization), represented as points in ℝ⁹, lie approximately on a 2-D manifold: the Klein bottle!

1/4
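To see where the Klein bottle comes from, here is a sketch based on our reading of the polynomial model used in Carlsson and collaborators' work: high-contrast patches resemble samples of c(ax + by)² + d(ax + by), with an edge direction (a, b) = (cos θ, sin θ) and a mix (c, d) = (cos φ, sin φ) of "line" and "edge" profiles. The pairs (θ, φ) and (θ + π, -φ) give the same polynomial, and that gluing turns the parameter torus into a Klein bottle. The 3×3 grid, the normalization, and the function names are our assumptions.

```python
import math


def klein_patch(theta, phi):
    """3x3 patch sampled from c*(a*x + b*y)^2 + d*(a*x + b*y), as a point in R^9.

    The patch is mean-centered and scaled to unit Euclidean norm.
    """
    a, b = math.cos(theta), math.sin(theta)
    c, d = math.cos(phi), math.sin(phi)
    grid = [-1.0, 0.0, 1.0]
    vals = []
    for yy in grid:
        for xx in grid:
            t = a * xx + b * yy
            vals.append(c * t * t + d * t)
    mean = sum(vals) / 9.0
    centered = [v - mean for v in vals]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]


def max_abs_diff(u, v):
    return max(abs(p - q) for p, q in zip(u, v))


theta, phi = 0.3, 0.4
p = klein_patch(theta, phi)
# (theta + pi, -phi) describes the same polynomial, hence the same patch.
same = klein_patch(theta + math.pi, -phi)
# (theta + pi, phi) flips the sign of the linear "edge" part instead.
flipped = klein_patch(theta + math.pi, phi)
print(max_abs_diff(p, same), max_abs_diff(p, flipped))
```

Because the identification shifts θ by π while reflecting φ, the quotient of the torus is non-orientable, which is exactly what distinguishes a Klein bottle from a torus.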
Images can be thought of as vectors in a high-dimensional space. It has long been hypothesized that images live on low-dimensional manifolds (hence manifold learning). It’s a reasonable assumption: images of the world are not arbitrary. The low-dimensional structure arises from physical constraints and laws.

2/4
But this doesn’t mean the “low-dimensional” manifold has a simple or intuitive structure, even for tiny images. This classic paper by Gunnar Carlsson gives a lovely overview of the structure of data in general (and images in particular). Well worth reading.

3/4
Feb 8
Michael Jordan gave a short, excellent, and provocative talk recently in Paris. Here are a few key ideas:

- It's all just machine learning (ML) - the AI moniker is hype

- The late Dave Rumelhart should've received a Nobel prize for his early ideas on making backprop work

1/n
The "Silicon Valley Fever Dream" is that data will create knowledge, which will lead to superintelligence, and a bunch of people will get very rich...

2/n
...yet the true value of technologies like LLMs is that we're getting the benefit of interacting with the collective knowledge of many, many individuals. It's not that we will produce one single uber-intelligent being.

3/n
Jan 26
How are Kernel Smoothing in statistics, Data-Adaptive Filters in image processing, and Attention in Machine Learning related?

I wrote a thread about this late last year. I'll repeat it here and include a link to the slides at the end of the thread.

1/n
In the beginning there was Kernel Regression: a powerful and flexible way to fit an implicit function point-wise to samples. The classic KR is based on interpolation kernels that depend on the positions (x) of the samples, not on their values (y).

2/n
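A minimal sketch of classic kernel regression in its zeroth-order (Nadaraya–Watson) form with a Gaussian kernel; higher-order local polynomial versions also exist, and the kernel, bandwidth, and test function here are our own choices. Note that the weights are computed from the positions xs alone.

```python
import math


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the normalization cancels in the ratio below."""
    return math.exp(-0.5 * u * u)


def kernel_regression(x0, xs, ys, h):
    """Nadaraya-Watson estimate of f(x0) from samples (xs, ys) with bandwidth h.

    The weights depend only on the sample positions xs (through
    (x0 - xi) / h), never on the sample values ys.
    """
    weights = [gaussian_kernel((x0 - xi) / h) for xi in xs]
    return sum(w * yi for w, yi in zip(weights, ys)) / sum(weights)


# Noise-free samples of sin(x) on a regular grid, for illustration.
xs = [i / 10 for i in range(63)]
ys = [math.sin(x) for x in xs]
print(kernel_regression(1.0, xs, ys, h=0.2), math.sin(1.0))
```

The estimate is a locally weighted average, so it is slightly biased where the function curves; shrinking h reduces that bias at the cost of more sensitivity to noise.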
Instead of a fixed smoothing parameter h, we can adjust it dynamically based on the local density of samples near the point of interest. This accounts for variations in the spatial distribution of the samples, but still doesn't take their values into account.

3/n
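One simple way to make h depend on local sample density, sketched under our own assumptions: set h at the query point to the distance to its k-th nearest sample, so h shrinks where samples are dense and grows where they are sparse. The thread does not say which adaptation rule it uses; `knn_bandwidth` and the sample layout are hypothetical.

```python
import math


def knn_bandwidth(x0, xs, k):
    """Distance from x0 to its k-th nearest sample (k >= 1)."""
    return sorted(abs(x0 - xi) for xi in xs)[k - 1]


def adaptive_kernel_regression(x0, xs, ys, k=5, h_min=1e-9):
    """Gaussian-kernel regression whose bandwidth follows local sample density.

    The weights still depend only on positions, never on the values ys.
    h_min guards against h = 0 when x0 coincides with k samples.
    """
    h = max(knn_bandwidth(x0, xs, k), h_min)
    weights = [math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in xs]
    return sum(w * yi for w, yi in zip(weights, ys)) / sum(weights)


# Irregular sampling: a dense cluster near 0 and a few sparse samples further out.
xs = [i / 100 for i in range(11)] + [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [math.cos(x) for x in xs]
for x0 in (0.05, 3.0):
    print(x0, knn_bandwidth(x0, xs, 3), adaptive_kernel_regression(x0, xs, ys, k=3))
```

Because at least k samples lie within distance h of x0, the denominator can never vanish, which is one practical advantage of the nearest-neighbor rule.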
Dec 20, 2024
Years ago, when my wife and I were planning to buy a home, my dad stunned me with a quick mental calculation of loan payments.

I asked him how. He said he'd learned the strange formula for compound interest from his father, who was a merchant in 19th-century Iran.

1/4
The origins of the formula my dad knew are a mystery, but I know it has been used in the bazaars of Iran (and elsewhere) for as long as anyone can remember.

It has an advantage: it's very easy to compute on an abacus. The exact compounding formula is much more complicated.

2/4
I figured out how the two formulae relate: the historical formula is the Taylor series of the exact formula around r=0.

But the crazy thing is that the old Persian formula predates Taylor by hundreds (maybe thousands) of years, having been passed down for generations.

3/4
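Both formulas were posted as images, so this sketch rests on an assumption: that the "exact" formula is the standard level payment L·r / (1 - (1 + r)^(-n)) for a loan L repaid over n periods at rate r per period, and that the historical rule keeps its Taylor series around r = 0 only up to the linear term, (L/n)·(1 + (n + 1)r/2). The function names are ours.

```python
def exact_payment(principal, r, n):
    """Standard level payment for a loan repaid in n periods at rate r per period."""
    if r == 0:
        return principal / n
    return principal * r / (1.0 - (1.0 + r) ** (-n))


def first_order_payment(principal, r, n):
    """Taylor expansion of exact_payment around r = 0, truncated after the linear term."""
    return principal / n * (1.0 + (n + 1) * r / 2.0)


# Hypothetical loan: 12,000 repaid monthly over 12 months at 1% per month.
print(exact_payment(12_000, 0.01, 12), first_order_payment(12_000, 0.01, 12))
# Over 360 months at 0.5% per month the truncation error is much larger.
print(exact_payment(12_000, 0.005, 360), first_order_payment(12_000, 0.005, 360))
```

The next term in the expansion is (L/n)·(n + 1)(n - 1)r²/12, so the linear rule is accurate when n·r is small (short loans, modest rates) and underestimates the payment for long, high-rate loans. For n = 1 the two formulas agree exactly.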
Dec 8, 2024
How are Kernel Smoothing in statistics, Data-Adaptive Filters in image processing, and Attention in Machine Learning related?

My goal is not to argue who should get credit for what, but to show a progression of closely related ideas over time and across neighboring fields.

1/n
