📊 Answers and comments about yesterday's online quiz on learning.

Not a quiz on online learning. Though there was an (online learning) component on that online (learning quiz), which maybe led to some learning online, so... I am confused now.



1/16
Let's start with PAC learning. All I'm saying here applies to classification, i.e., learning functions over some space 𝒳 taking boolean values: f: 𝒳 → {0,1}.

Q1 asked what quantity, if any, captured PAC-learnability. 71.3% got it right...



2/16
... it's the VC dimension, a combinatorial notion named after Vapnik and Chervonenkis: en.wikipedia.org/wiki/Vapnik–Ch…

That's, arguably, *the* cornerstone of statistical learning/computational learning theory. There are many great books+lecture notes about this, e.g., this one...

3/16
... and it's a deeply influential notion. Even today, understanding the exact complexity of PAC learning under various constraints is a very active area, by the way. Check #COLT2020 for some papers about it even this year—including the best paper! colt2020.org/virtual/papers…

4/16
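For concreteness, here is one standard way to state both notions, in my own rough phrasing (see the books and lecture notes mentioned above for the careful versions):

```latex
% PAC learnability (realizable case), stated informally -- my paraphrase.
% A class C of functions f: X -> {0,1} is PAC-learnable if there is a learner A
% and a sample bound m(eps, delta) such that, for every target f in C and every
% distribution D over X, given m(eps, delta) i.i.d. labeled examples (x, f(x))
% with x ~ D, the learner outputs a hypothesis h satisfying
\[
  \Pr\bigl[\operatorname{err}_D(h) \le \varepsilon\bigr] \ge 1 - \delta,
  \qquad \text{where } \operatorname{err}_D(h) = \Pr_{x \sim D}\bigl[h(x) \ne f(x)\bigr].
\]
% VC dimension: the size of the largest set of points "shattered" by C,
% i.e., on which C realizes every possible labeling.
\[
  \mathrm{VC}(\mathcal{C}) = \max\Bigl\{\, d :
    \exists\, x_1,\dots,x_d \in \mathcal{X},\;
    \bigl|\{(f(x_1),\dots,f(x_d)) : f \in \mathcal{C}\}\bigr| = 2^d \,\Bigr\}.
\]
% "Fundamental theorem of statistical learning": C is PAC-learnable iff VC(C) is
% finite, with sample complexity polynomial in VC(C), 1/eps, and log(1/delta).
```

(E.g., thresholds on ℝ have VC dimension 1, and halfplanes in ℝ² have VC dimension 3.)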
Which brings us to Q2: PAC-learning is fine, but PAC-learning efficiently is better! Can we?

I am sorry to say that as far as I know, it's the 19.7% going with unicorns who got it right... 🦄 We have, even now, very little understanding of those...



5/16
... computational issues (please, correct me if I'm wrong!). E.g., we know that efficient *proper* PAC learning (where the output hypothesis needs to be in the same class 𝓒, so you can't for instance approximate a linear function with a degree-17 polynomial) is...

6/16
... impossible unless P=NP. The general "improper" learning algorithms we have are not efficient themselves either, though efficient algorithms can exist for some 𝓒 on a case-by-case basis. So, basically: 🤷

7/16
So let's change the learning model and go fully online. No underlying statistical model, just an adversarial setting where someone wants to make you fail*: Online Learning, Mistake Bound.

* So, basically, a learning model equivalent to my middle school Latin teacher.

8/16
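To make the model concrete, here is a toy sketch of the prediction-with-feedback protocol (mine, not from the quiz), using the simple *halving* algorithm on a finite class: on every mistake, at least half of the still-consistent hypotheses get eliminated, so it makes at most log₂|𝓒| mistakes. The canonical algorithm mentioned in the next tweet does better, achieving the Littlestone dimension instead. The threshold class and the stream below are made-up toy data.

```python
# Toy mistake-bound protocol with the halving algorithm (illustration only).
from collections import Counter

def halving_learner(concept_class, adversarial_stream):
    """concept_class: list of hypotheses (callables x -> 0/1), assumed to contain the target.
    adversarial_stream: iterable of (x, true_label) pairs chosen by an adversary."""
    version_space = list(concept_class)  # hypotheses still consistent with all feedback so far
    mistakes = 0
    for x, true_label in adversarial_stream:
        # Predict by majority vote over the current version space.
        prediction = Counter(h(x) for h in version_space).most_common(1)[0][0]
        if prediction != true_label:
            mistakes += 1  # a mistake removes at least half of the version space below
        # Keep only hypotheses consistent with the revealed label.
        version_space = [h for h in version_space if h(x) == true_label]
    return mistakes

# Made-up example: thresholds over {0,...,9}; the adversary reveals labels of a hidden threshold.
thresholds = [lambda x, t=t: int(x >= t) for t in range(10)]
target = thresholds[7]
stream = [(x, target(x)) for x in [9, 0, 5, 7, 6, 8]]
print(halving_learner(thresholds, stream))  # at most log2(10) ≈ 3.3 mistakes
```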
The answer to Q3 is, I think, quite surprising (though apparently not to 43% of you): in this model, the Littlestone dimension fully characterizes what is learnable. Better, it comes with a canonical algorithm that always does the job!

What is...



9/16
... the Littlestone dimension, you ask? It's kind of tricky, but here it is—stealing the def from lecture notes by Akshay Krishnamurthy:
people.cs.umass.edu/~akshay/course…

Anyways. It's combinatorial, depends only on the class 𝓒 to learn, and it captures exactly the right thing!

10/16
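Roughly, and in my own words (go read the linked notes for the precise version):

```latex
% Littlestone dimension, paraphrased.
% A "mistake tree" of depth d is a complete binary tree whose internal nodes are
% labeled by points of X; branching left/right at a node labeled x means
% "the label of x is 0/1". The tree is shattered by C if every root-to-leaf path
% (x_1, b_1), ..., (x_d, b_d) is realized by some f in C, i.e., f(x_i) = b_i for all i.
\[
  \mathrm{Ldim}(\mathcal{C})
  = \max\bigl\{\, d : \text{some depth-}d\text{ mistake tree is shattered by } \mathcal{C} \,\bigr\}.
\]
% Always VC(C) <= Ldim(C), and Ldim(C) <= log2|C| for finite C. Littlestone's
% "Standard Optimal Algorithm" makes at most Ldim(C) mistakes, and no deterministic
% learner can do better against an adversary.
```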
Last question now. What if we want to PAC-learn classifiers from a class 𝓒, but in a differentially #private (DP) manner?
By this I mean (approximate) (ε,δ)-DP: for small ε and δ, the output reveals very little about any single data point from the training set.

11/16
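For reference, the standard definition, in my wording:

```latex
% (eps, delta)-differential privacy for a learning algorithm A.
% For every pair of training sets S, S' differing in a single example, and
% every (measurable) set T of possible outputs,
\[
  \Pr\bigl[A(S) \in T\bigr] \;\le\; e^{\varepsilon}\,\Pr\bigl[A(S') \in T\bigr] + \delta .
\]
% "Pure" DP is the case delta = 0; "approximate" DP allows a small delta > 0.
```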
Here comes the magic (25.9% got it right): it's PAC learning, sure, but somehow the right quantity is... the Littlestone dimension. Not the VC dimension. Not the number of possible functions |𝓒|.

Why? How? I wish I knew...



12/16
... but I know someone who does! SomeoneS, actually. This has been recently proven in an amazing paper by Bun (@markmbun), Livni, and Moran, the culmination of a line of research hinting at this bizarre connection between privacy and online learning. (Oh, hi, @SethVNeel.)

13/16
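The statement, in the qualitative form I remember it (quantitative bounds are in the paper; the ⇐ direction is the new contribution, the ⇒ direction came from earlier work in that same line of research):

```latex
% Characterization of (approximate-)DP PAC learning, qualitatively.
\[
  \mathcal{C} \text{ is PAC-learnable by an } (\varepsilon,\delta)\text{-DP algorithm}
  \quad\Longleftrightarrow\quad
  \mathrm{Ldim}(\mathcal{C}) < \infty .
\]
```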
I encourage you to watch the talk Mark gave about this on @TCS_plus:
🎥 sites.google.com/site/plustcs/p…
or watch yesterday's #COLT2020 keynote by Salil Vadhan, which touches upon this: colt2020.org/virtual/speake… (the 🎥 recording will be online at some point, I believe)

14/16
To conclude: many, many things were not mentioned here. Feel free to add refs, discuss, comment below! ↯

Also, I want to mention that for a different notion of privacy, *local* privacy, a recent work of Edmonds, Nikolov (@thesasho), and Ullman provides some analogue of...

15/16
... the above characterization, at least in the noninteractive setting:

What about other notions: pan-privacy, shuffle privacy? What about interactivity?

So many questions!

16/end
Note: @Aaroth and @thesasho rightly pointed out to me that, as far as query complexity goes,* Statistical Query (SQ) learning and (interactive) locally private (LDP) PAC-learning are equivalent, up to polynomial factors.

*The same is not known for *sample* complexity...
... quite a subtle point, to me at least. I believe this equivalence SQ↭LDP was shown in the original KLNRS'08 paper, "What Can We Learn Privately?": arxiv.org/abs/0803.0924

Also, for "pure" (ε,0)-DP PAC learning (no δ), a ≠ characterization was obtained in jmlr.org/papers/v20/18-…