Aaron Roth
CS professor at Penn. Amazon Scholar at AWS. Author of The Ethical Algorithm (w/ Michael Kearns). I study machine learning, privacy, game theory, and fairness.
Apr 9 6 tweets 2 min read
We'd like LLM outputs to be accompanied by confidence scores, indicating the confidence we should have in them. But what semantics should a confidence score have? A minimal condition is calibration: e.g. when we express 70% confidence, we should be correct 70% of the time. But...

LLM prompts can be very different. A model might be much more likely to hallucinate when asked for citations to the functional analysis literature vs. when asked for state capitals. Calibrated models can be systematically over-confident for one and under-confident for the other.
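To make the marginal condition concrete, here is a minimal sketch (my own code, not from the thread) of how one might audit stated confidences for calibration by binning:

```python
import numpy as np

def calibration_report(confidences, correct, n_bins=10):
    """Bin predictions by stated confidence; in each bin, compare the
    average stated confidence to the empirical accuracy."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    for i in range(n_bins):
        lo, hi = edges[i], edges[i + 1]
        # Last bin is closed on the right so confidence 1.0 is counted.
        in_bin = (confidences >= lo) & (
            (confidences < hi) if i < n_bins - 1 else (confidences <= hi))
        if not in_bin.any():
            continue
        print(f"stated {lo:.1f}-{hi:.1f}: mean confidence "
              f"{confidences[in_bin].mean():.2f}, accuracy "
              f"{correct[in_bin].mean():.2f} (n={int(in_bin.sum())})")
```

A calibrated model has accuracy roughly equal to mean confidence in every bin; the point of the thread is that this can hold marginally while the model remains systematically over- or under-confident on meaningful subsets of prompts.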
May 10, 2023 9 tweets 2 min read
To what extent can calibrated predictions be viewed as "real probabilities" --- somehow a measure of truth, rather than estimation, even when there is no underlying probabilistic process? I'll explain a simple but striking early result of Philip Dawid that isn't so well known. 🧵

Calibration asks that probability estimates be self-consistent: averaged over all of the days on which I claim a 20% chance of rain, it should rain 20% of the time. Similarly for 30%, 40%, etc. On its own, calibration is quite weak.
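In symbols (my notation): writing p_t for the day-t forecast and y_t in {0,1} for whether it rained, calibration asks that for every forecast value v used infinitely often,

```latex
\frac{1}{|\{t \le T : p_t = v\}|} \sum_{t \le T :\, p_t = v} y_t \;\xrightarrow[\;T \to \infty\;]{}\; v .
```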
Oct 3, 2022 15 tweets 6 min read
Our new paper gives very simple algorithms that promise "multivalid" conformal prediction sets for exchangeable data. This means they are valid not just marginally, but also conditionally on (intersecting!) group membership, and in a threshold-calibrated manner. I'll explain! 🧵

Instead of making point predictions, we can quantify uncertainty by producing "prediction sets" --- sets of labels that contain the true label with (say) 90% probability. The problem is, in a k-label prediction problem, there are 2^k possible prediction sets. The curse of dimensionality!
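For background, here is a minimal sketch of ordinary split conformal prediction, which gets only the marginal guarantee (the multivalid algorithms in the paper are different; the names and details here are mine):

```python
import numpy as np

def split_conformal_sets(cal_scores, test_scores, alpha=0.1):
    """cal_scores: nonconformity score of the true label on each held-out
    calibration example, shape (n,). test_scores: scores of every candidate
    label for each test example, shape (m, k). Returns a boolean (m, k)
    mask saying which labels each prediction set includes."""
    n = len(cal_scores)
    # Finite-sample-corrected quantile of the calibration scores.
    rank = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    threshold = np.sort(cal_scores)[rank - 1]
    # Include every label scoring at most the threshold; under
    # exchangeability the true label is covered w.p. >= 1 - alpha.
    return test_scores <= threshold
```

Note that the output is determined by a single threshold, which is one reason only k of the 2^k possible sets ever appear; multivalidity asks for coverage that holds simultaneously across overlapping groups and conditionally on the threshold itself.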
Jun 3, 2022 14 tweets 6 min read
Machine Learning is really good at making point predictions --- but it sometimes makes mistakes. How should we think about which predictions we should trust? In other words, what is the right way to think about the uncertainty of particular predictions? A thread about new work 🧵

First, some links. Here is our paper: arxiv.org/abs/2206.01067 Here is me giving a talk about it: simonsfoundation.org/event/robust-a… It's joint work with Bastani, Gupta, @crispy_jung, Noarov, and Ramalingam. Our code will shortly be available on GitHub, in the repository linked in the paper.
Dec 18, 2020 32 tweets 7 min read
Ok, I don't have 20 favorite papers for 2020, but I did learn a couple of things this year. Here is a slow-moving thread of the ideas I learned about this year. Many of these ideas are old, but they were new to me! 🧵

1) Calibration via the minmax theorem. This is an old idea of Sergiu Hart's, that he originally communicated verbally to Foster and Vohra (it appears with credit in their classic 1998 paper). Sergiu wrote it up this year in this short note: ma.huji.ac.il/hart/papers/ca…
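In rough outline (my notation; Hart's note has the precise statement): treat forecasting as a T-round zero-sum game between a forecaster choosing predictions p_t and an adversary choosing outcomes y_t in {0,1}, with the forecaster paying the calibration error

```latex
K_T(p, y) \;=\; \sum_{v} \frac{n_v}{T}\,\Bigl|\, v \;-\; \frac{1}{n_v} \sum_{t :\, p_t = v} y_t \,\Bigr|,
\qquad n_v = |\{t : p_t = v\}| .
```

Against any fixed mixed strategy of the adversary, forecasting the conditional probabilities p_t = E[y_t | y_1, ..., y_{t-1}] drives the expected error to zero (up to discretization of the forecasts), so the value of the game is small; the minmax theorem then hands the forecaster a single randomized strategy that is nearly calibrated against every adversary at once.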
Dec 8, 2019 6 tweets 2 min read
A nice new fairness paper by Blum and Stangl: arxiv.org/pdf/1912.01094… They show that if there are two populations with the same base rate, but the data is biased either by undersampling positive examples from population B, or by corrupting positive labels in population B... 1/3

Then ERM subject to the constraint of equalizing true positive rates across groups recovers the optimal classifier on the original (unbiased) data distribution. Other fairness constraints (like also equalizing false positive rates, or asking for demographic parity) don't. 2/3
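A sketch of the two bias models as I read them (function and parameter names are mine, hypothetical, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def undersample_positives(X, y, group, b="B", beta=0.5):
    """Bias model 1: each positive example from population `b` is
    independently dropped from the training data w.p. `beta`."""
    drop = (group == b) & (y == 1) & (rng.random(len(y)) < beta)
    return X[~drop], y[~drop], group[~drop]

def corrupt_positive_labels(X, y, group, b="B", nu=0.5):
    """Bias model 2: each positive label in population `b` is
    independently flipped to negative w.p. `nu`."""
    y = y.copy()
    flip = (group == b) & (y == 1) & (rng.random(len(y)) < nu)
    y[flip] = 0
    return X, y, group
```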
Nov 13, 2019 12 tweets 4 min read
Heading to Facebook today for a fireside chat with @SolomonMg about The Ethical Algorithm. In preparing I was looking into re-identification attacks against production systems that purport to protect privacy, and read about Aircloak's system Diffix. A short thread. 1/

Diffix provides an interactive system by which users can query data, and returns answers that are perturbed with small amounts of noise. But despite the name, it doesn't promise differential privacy. They are proud of this! On their website they write that: 2/
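For contrast with "small amounts of noise", here is what a worst-case guarantee looks like: a minimal sketch of the textbook Laplace mechanism for a counting query (standard differential privacy, not Diffix's mechanism):

```python
import numpy as np

def dp_count(records, predicate, eps, rng=None):
    """eps-differentially private count. A counting query has
    sensitivity 1 (adding or removing one person's record changes
    it by at most 1), so Laplace noise with scale 1/eps suffices."""
    rng = rng or np.random.default_rng()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / eps)
```

The point of the guarantee is that it holds against arbitrary attackers combining arbitrarily many auxiliary sources, with the privacy loss of repeated queries composing gracefully, rather than resting on ad hoc defenses against specific attacks.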
May 8, 2019 11 tweets 3 min read
We wrote a paper proposing a new relaxation of differential privacy that has lots of nice properties: arxiv.org/abs/1905.02383 It's 85 pages long, so here is the TL;DR. Suppose S is a dataset with your data, and S' is the dataset with your data removed. 1/

Differential privacy can be viewed as promising that any hypothesis test aiming to distinguish whether I used S vs. S' in my computation that has false positive rate alpha must have a true positive rate of at most e^eps * alpha + delta. Cool: it's easy to interpret this guarantee! 2/
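The derivation is one line from the definition (my notation): (eps, delta)-DP says Pr[M(S) in E] <= e^eps * Pr[M(S') in E] + delta for every event E. Take E to be the test's rejection region, i.e. the outcomes on which it declares "S was used":

```latex
\mathrm{TPR} \;=\; \Pr[M(S) \in E] \;\le\; e^{\varepsilon} \Pr[M(S') \in E] + \delta \;=\; e^{\varepsilon} \alpha + \delta .
```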
Jan 22, 2019 11 tweets 4 min read
Michael Kearns and I wrote a book! It's called "The Ethical Algorithm: The Science of Socially Aware Algorithm Design", and it's going to be published by Oxford University Press in the fall. Let me tell you about it! (Thread)

First, it's not a textbook: it's a "trade book" -- a popular science book. Its intended readership isn't just computer science PhDs, but the educated public broadly. But there should be plenty in it to interest experts, because we cover quite a bit of ground.