Comparing distributions: Kernels estimate good representations, l1 distances give good tests

A simple summary of our #NeurIPS2019 work
gael-varoquaux.info/science/compar…

Given two sets of observations, how do we know whether they are drawn from the same distribution? Short answer in the thread.
For instance, do McDonald’s and KFC use different logic to position restaurants? Difficult question! We have access to data points, but not the underlying generative mechanism, governed by marketing strategies.
To capture the information in the spatial proximity of data points, kernel mean embeddings are useful. They are intuitively related to kernel density estimates.
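For intuition, here is a minimal sketch (not the paper's code) of a kernel mean embedding of a sample, evaluated at a set of locations; with a Gaussian kernel it is, up to a constant, a kernel density estimate. The helper names and parameters (gaussian_kernel, mean_embedding, bandwidth) are illustrative choices, not taken from the paper:

import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    # Gaussian kernel matrix between point sets a (n, d) and b (m, d)
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

def mean_embedding(sample, grid, bandwidth=1.0):
    # Kernel mean embedding mu(t) = 1/n * sum_i k(x_i, t), evaluated at `grid`.
    # Up to a normalizing constant, this is a kernel density estimate.
    return gaussian_kernel(grid, sample, bandwidth).mean(axis=1)

# Toy stand-ins for two sets of restaurant locations in 2D
rng = np.random.default_rng(0)
x = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
y = rng.normal(loc=[0.3, 0.0], scale=1.0, size=(200, 2))

# Evaluate both embeddings at a common set of locations
grid = rng.uniform(-3, 3, size=(500, 2))
mu_x = mean_embedding(x, grid)
mu_y = mean_embedding(y, grid)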
Metrics between distributions built from differences of their kernel mean embeddings can capture when probability measures "weakly converge": they consider distributions close when they give similar probabilities to events that are close in measurement space, not only to the exact same events (restaurants next door).
An example where weak convergence matters is neighboring Diracs: comparing probabilities only at identical points gives an infinite distance between the distributions whenever they are not exactly equal, no matter how close the Diracs are.

Kernels capture weak convergence by representing measurement neighborhoods.
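A small sketch of the Dirac example, assuming a Gaussian kernel; it is my own illustration, not the authors' code. Comparing the two point masses atom by atom sees no overlap, while the distance between their kernel mean embeddings shrinks as the Diracs move closer:

import numpy as np

def mean_embedding(sample, grid, bandwidth=0.2):
    # Gaussian kernel mean embedding on the real line:
    # mu(t) = 1/n * sum_i exp(-(x_i - t)^2 / (2 * bandwidth^2))
    sq = (grid[:, None] - sample[None, :]) ** 2
    return np.exp(-sq / (2 * bandwidth ** 2)).mean(axis=1)

# Two "Diracs": all the mass of each sample sits at a single, nearby point
p = np.array([0.00])
q = np.array([0.05])

# Comparing probabilities only at identical points: the supports do not
# overlap at all, so the two distributions look maximally different.
shared_atoms = np.intersect1d(p, q).size   # 0 -> no event has mass under both

# Comparing kernel mean embeddings over a grid of neighborhoods: the distance
# shrinks as the Diracs move closer, which reflects weak convergence.
grid = np.linspace(-1, 1, 201)
l1_dist = np.abs(mean_embedding(p, grid) - mean_embedding(q, grid)).mean()
print(shared_atoms, l1_dist)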
We characterize a family of metrics between distributions defined via Lp distances between their kernel representatives.

With common kernels, the difference between representatives is dense. As a result, the l1 norm best captures their differences.
Intuitively, dense representations lie along diagonals while sparse ones are aligned with the coordinate axes; the l1 norm tells them apart best.
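A toy numeric illustration of this intuition (my own, not from the paper): for two difference vectors with the same l2 norm, one dense and one sparse, the l1 norm of the dense vector is up to sqrt(J) times larger, so l1 is much more sensitive to discrepancies that are spread across many coordinates:

import numpy as np

J = 1000  # number of coordinates (think: locations at which embeddings are compared)

# Two difference vectors with the same l2 norm: one dense, one sparse
dense = np.full(J, 1.0 / np.sqrt(J))   # discrepancy spread over every coordinate
sparse = np.zeros(J)
sparse[0] = 1.0                        # same l2 mass, concentrated on one axis

for name, v in [("dense", dense), ("sparse", sparse)]:
    print(name, "l2:", np.linalg.norm(v, 2), "l1:", np.linalg.norm(v, 1))
# Both l2 norms equal 1.0, but the l1 norm of the dense vector is
# sqrt(J) ~ 31.6 times larger: l1 separates dense differences far better.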

We show that l1 differences of distribution representations lead to good two-sample tests, with good power and a closed-form null distribution.
For fast tests, instead of computing full sums over the measurement domain, the metric can be sampled at a few locations Tj, random or optimized.
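A minimal sketch of this idea, again my own rather than the authors' implementation: evaluate both mean embeddings at a handful of locations Tj, take the l1 norm of the difference as the test statistic, and calibrate it. Here I use a crude permutation calibration as a stand-in; the paper instead derives a closed-form null. Helper names, the kernel bandwidth, and the number of locations are illustrative assumptions:

import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    # Gaussian kernel matrix between point sets a (n, d) and b (m, d)
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2 * bandwidth ** 2))

def l1_statistic(x, y, locations, bandwidth=1.0):
    # l1 distance between the two mean embeddings, sampled at a few locations Tj
    mu_x = gaussian_kernel(locations, x, bandwidth).mean(axis=1)
    mu_y = gaussian_kernel(locations, y, bandwidth).mean(axis=1)
    return np.abs(mu_x - mu_y).sum()

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(300, 2))
y = rng.normal(0.2, 1.0, size=(300, 2))

# A handful of test locations Tj, here drawn at random from the pooled sample
pooled = np.vstack([x, y])
T = pooled[rng.choice(len(pooled), size=5, replace=False)]

stat = l1_statistic(x, y, T)

# Crude permutation calibration, standing in for the paper's closed-form null
null = []
for _ in range(200):
    perm = rng.permutation(len(pooled))
    null.append(l1_statistic(pooled[perm[:300]], pooled[perm[300:]], T))
p_value = (np.sum(np.array(null) >= stat) + 1) / (len(null) + 1)
print("statistic:", stat, "p-value:", p_value)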
More in the full post, which itself links to the paper:
gael-varoquaux.info/science/compar…

This framework builds upon solid mathematical foundations (RKHS), fast testing procedures, and a line of results that originated from MMD, the maximum mean discrepancy.
For those at #NeurIPS2019, @ArthurGretton, @wittawatj and D Sutherland give a tutorial on these concepts on Monday: nips.cc/Conferences/20…

@ScetbonM presents this specific work on Thursday, with a spotlight at 4:55pm and afterwards at the poster reception:
nips.cc/Conferences/20…