Stuff I wish I had known sooner: "Pinsker's inequality is cancelled," a thread. 🧵

If you want to relate total variation (TV) and Kullback–Leibler divergence (KL), then everyone, from textbooks to Google, will point you to Pinsker's inequality: TV(p,q) ≤ √(½·KL(p||q))

1/
It's warranted: a great inequality, and tight in the regime KL→0. The issue is that KL is unbounded, while 0 ≤ TV ≤ 1 always, so the bound becomes vacuous as soon as KL ≥ 2. Oops.

2/
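To see this concretely, here's a quick numerical check (my own toy example, two Bernoulli distributions, not from the thread):

```python
import math

def kl_bernoulli(p, q):
    """KL divergence (in nats) between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def tv_bernoulli(p, q):
    """Total variation distance between Bernoulli(p) and Bernoulli(q)."""
    return abs(p - q)

p, q = 0.5, 0.999
kl = kl_bernoulli(p, q)      # ~2.76 nats, comfortably above 2
pinsker = math.sqrt(kl / 2)  # ~1.17: "TV <= 1.17" says nothing, TV <= 1 anyway
tv = tv_bernoulli(p, q)      # = 0.499
```

Pinsker's "bound" here exceeds 1, while the actual TV is 0.499.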

*I'm using KL with the natural log, by the way, so no pesky log 2 constants.
This is annoying, because in many situations you would want to bound a TV close to 1. For instance, in hypothesis testing: "how many samples to distinguish between an ε-biased and a fair coin, with probability 1−δ?"

Well, you'd like to start bounding 1-δ ≤ TV ≤ ?.

3/
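A side computation of mine (not from the thread) for that coin example: KL is additive over independent tosses, and per toss it scales like ε².

```python
import math

def kl_bernoulli(p, q):
    """KL divergence (in nats) between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

# One toss of an eps-biased coin vs. one toss of a fair coin: for small eps
# the KL behaves like 2*eps^2, and KL adds up over i.i.d. tosses, so
# n tosses give total KL ~ 2*n*eps^2.
eps = 0.01
kl_one_toss = kl_bernoulli(0.5 + eps, 0.5)
```

Plugging 2nε² into a TV-vs-KL bound is how the sample-complexity question above gets answered, and that's exactly where a bound valid for large KL matters.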
Here's the kicker: nobody ever told me,* but there is a simple bound that works in the regime KL≫1: TV(p,q) ≤ 1 − ½·exp(−KL(p||q)). A bound that is always non-vacuous, and goes to 1 as KL→∞.

*I mean, a paper did by referencing Tsybakov's "Introduction to Nonparametric Estimation": arxiv.org/abs/2009.06540
4/
Cool. So now, here's what we get: we can take the minimum of what Pinsker's and this new inequality give, to get the bound in red, which is objectively pretty nice and useful everywhere.

Are we done? No. Maybe we don't have to consider these two bounds separately?

5/
So, let's look at Tsybakov's proof. It's short, but it uses Jensen's, so *clearly* we can't trust it to be tight. Nope.

Tsybakov credits the inequality to Bretagnolle and Huber (1979), so let's look at their paper and see whether they prove something a bit tighter.

6/
Here, for the faint of heart: fly, you fools! It's from the pre-#LaTeX times, when people were people and wrote their formulae BY HAND.

Also, in French.

Anyways, all the action is in Lemma 2.1, and more specifically its proof.

7/
So, here's the new bound we have, if I can parse this freaking proof correctly (pardon my... French?): TV(p,q) ≤ √(1 − exp(−KL(p||q))).

It's always at least as good as Tsybakov's bound, since that one applied the extra inequality √(1−exp(−x)) ≤ 1−½·exp(−x) on top of it. And it behaves much better as KL→0.

8/
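A quick numerical sanity check of mine, for both claims: the extra inequality Tsybakov's form uses, and the behaviour as KL→0.

```python
import math

# Check the inequality sqrt(1 - exp(-x)) <= 1 - exp(-x)/2 used by Tsybakov's
# form, on a grid of x, and compare the two sides as x -> 0: the left-hand
# side vanishes, the right-hand side is stuck near 1/2.
for i in range(1, 1001):
    x = i / 50.0
    assert math.sqrt(1 - math.exp(-x)) <= 1 - 0.5 * math.exp(-x) + 1e-12

lhs_at_small_x = math.sqrt(1 - math.exp(-0.01))  # ~0.0998, goes to 0 with x
rhs_at_small_x = 1 - 0.5 * math.exp(-0.01)       # ~0.5050, stuck near 1/2
```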
So here's the current state. The green curve is the new inequality we have, which, OK, is not the tightest everywhere (sometimes Pinsker's is still better)...

But is it? One can easily check that our new bound (which, again, is non-vacuous for *all* values of KL) is...

9/
... always within a √2 factor of what Pinsker's inequality would give: since 1−exp(−x) ≤ x, we have √(1−exp(−KL)) ≤ √KL = √2·√(KL/2). Big deal...
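A small numerical check of that √2 claim, over a grid of KL values (my verification):

```python
import math

# Ratio of the Bretagnolle-Huber bound sqrt(1 - exp(-x)) to Pinsker's
# sqrt(x/2), over a grid of KL values x. Since 1 - exp(-x) <= x, the ratio
# never exceeds sqrt(2) (and approaches it only as x -> 0).
max_ratio = 0.0
for i in range(1, 2001):
    x = i / 100.0  # KL from 0.01 to 20 nats
    bh = math.sqrt(1 - math.exp(-x))
    pinsker = math.sqrt(x / 2)
    max_ratio = max(max_ratio, bh / pinsker)
```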

So, summary: forget Pinsker's, we have better (unless you really care about constants, but then, who are you?).

And again, *why did I not know this before*?

10/
To conclude, 2 points:
(1) As @mraginsky pointed out, bounds tighter than Pinsker's are known and discussed in some places. E.g., Yihong Wu's notes (again), Section 5.2.2, has some... but I find them much harder to parse, and it never actually clicked.
stat.yale.edu/~yw562/teachin…
11/
(2) Thomas Steinke (@shortstein) mentioned that Pinsker's can be derived from "the best inequality" (combined with Hoeffding's lemma). Can this new, amazing inequality be derived from it as well?

12/
Finally, let me write the inequality (and its weaker form given in [Tsybakov09]) one more time: TV(p,q) ≤ √(1 − exp(−KL(p||q))), and the weaker TV(p,q) ≤ 1 − ½·exp(−KL(p||q)). It is always better than Pinsker's, up to a √2 factor at worst, and *significantly* better in the regimes KL ≥ 2 and KL→∞ where Pinsker's is vacuous.

As I said, Pinsker is cancelled.
/end
Incidentally, this GIF was suggested by @thejonullman, a man of refined taste and sophistication if there ever was one.

Thread by Clément Canonne (@ccanonne_).