New paper on arXiv: "Efficient Mean Estimation with Pure Differential Privacy via a Sum-of-Squares Exponential Mechanism," with @Samuel_BKH and @mahbodm_.
Finally resolves an open problem on my mind since April 2019 (~2.5 years ago).
We give the 1st algorithm for mean estimation which is simultaneously:
-(ε,0)-differentially private
-O(d) sample complexity
-poly time
The fact we didn't have such an algorithm before indicates something was missing in our understanding of multivariate private estimation. 2/n
This algorithm is an instance of a broader framework which employs Sum-of-Squares for private estimation. This is the first application of SoS for DP that I am aware of. We apply this framework for two sub-problems, I'm sure there are more applications lurking. 3/n
Besides the fact that this solves a problem near and dear to my heart, I am also excited because this is another triumph of a recent direction related to efficient multivariate statistics. It has succeeded in achieving robustness and heavy tails, and now privacy. 4/n
Problem is very simple: estimate the mean of a distribution with covariance Σ <= 1, under pure DP. Sounds easy! But the standard techniques are all deficient in terms of sample complexity, the privacy guarantee, or running time. All other techniques have the same drawbacks! 5/n
To arrive at our method, let's look at how you'd (naively) run the exponential mechanism. 1. Define a set of candidate means. Use a cover of the space for this purpose (size ~2^d). 2. Score function for a candidate: "Does the candidate 'look correct' in every 1D projection"? 6/n
Both parts are inefficient. To deal with the first, recall that the exp mech is a sampling-based procedure. If the score function is concave, then we can employ tools for efficient log-concave sampling (e.g., Bassily-Smith-Thakurta arxiv.org/abs/1405.7085). 7/n
See also recent improvements by Mangoubi-@NisheethVishnoi (arxiv.org/abs/2111.04089).
We further need the score function to be Lipschitz, and to switch to a continuous analogue of the exponential mechanism. 8/n
To address the second issue: we turn to recent advances in high-dimensional robust estimation. These excel at "finding interesting directions" efficiently. Our method resembles a class of estimators introduced by @lugosi_gabor-Mendelson (arxiv.org/abs/1702.00482), which was... 9/n
made efficient by @Samuel_BKH (arxiv.org/abs/1809.07425). Our algo can be viewed as a private adaptation of Cherapanamjeri-Flammarion-Bartlett (arxiv.org/abs/1902.01998). From some starting pt: 1. Use SDP to find an "interesting" direction. 2. Choose stepsize. 3. Step, repeat. 10/n
Our method, from some starting pt: 1. *Privately* find an interesting direction 2. *Privately* choose step size 3. Step, repeat.
Step 1 is hiding a lot. It invokes the exponential mechanism, which itself runs the log-concave sampler where the function value is an SDP! 11/n
Taking a step back, the general recipe is something like the following. If you can define a score function with "nice properties," then you can efficiently run the exp mech to produce a high utility point privately. 12/n
To be a bit more precise, we also have a meta-theorem for SoS score functions. This is kind of a mouthful. There is nothing inherently special about SoS score functions though (to my knowledge), but they seem to be convenient for establishing the properties we desire. 13/n
But back to the specific problem of private mean estimation. Here's our main theorem in all its glory. Wow, I almost forgot to mention: we get robustness and sub-Gaussian tails for free! It's an all-in-one estimator. Recall the problem was open even without these desiderata. 14/n
Anyway, if any of this interests you, please check out the paper (arxiv.org/abs/2111.12981). This is a very exciting time for private statistics!
(Thread) How can I say no to this? My course will be made public! Stay tuned.
I was pulling your leg here. Here's my (hot? cold?) take: academics have a *moral obligation* to make as much of their material public as possible. Not just papers, but code, talks, and lectures. 1/n
95% of the work for most content is preparing it. So why not present it so the whole world can appreciate it? arXiv, Github, and video sites.
All academics should learn how to record videos in high quality, say using @OBSProject (not just Zoom). It's imperative for the future 2/n