Latest Twitter Threads by @Aaroth on Thread Reader App

Sep 19 • 11 tweets • 4 min read

Aligning an AI with human preferences might be hard. But there is more than one AI out there, and users can choose which to use. Can we get the benefits of a fully aligned AI without solving the alignment problem? In a new paper we study a setting in which the answer is yes.

Imagine predictive medicine LLMs being used by a doctor to help treat patients. One is made by Merck. The doctor wants to cure his patients as quickly as possible. The Merck LLM also tries to do this, but has a preference for Merck drugs, resulting in substantial misalignment.

Apr 9, 2024 • 6 tweets • 2 min read

We'd like LLM outputs to be accompanied by confidence scores, indicating the confidence we should have in them. But what semantics should a confidence score have? A minimal condition is calibration: e.g. when we express 70% confidence, we should be correct 70% of the time. But...

LLM prompts can be very different. A model might be much more likely to hallucinate when asked for citations to the functional analysis literature vs. when asked for state capitals. Calibrated models can be systematically over-confident for one and under-confident for the other.
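
A small sketch of that last point, with made-up numbers. The two prompt categories come from the tweet; the counts and the `calibration_gap` helper are hypothetical. A model that states 70% confidence everywhere can be right 90% of the time on state capitals and 50% of the time on citations, yet still look calibrated on average.

```python
def calibration_gap(records):
    """Average stated confidence minus empirical accuracy.

    records: list of (stated_confidence, was_correct) pairs.
    Positive gap = over-confident, negative gap = under-confident.
    """
    n = len(records)
    avg_confidence = sum(conf for conf, _ in records) / n
    accuracy = sum(1 for _, correct in records if correct) / n
    return avg_confidence - accuracy

# Hypothetical counts, chosen only for illustration.
capitals = [(0.7, True)] * 90 + [(0.7, False)] * 10
citations = [(0.7, True)] * 50 + [(0.7, False)] * 50

overall_gap = calibration_gap(capitals + citations)   # roughly zero
capitals_gap = calibration_gap(capitals)              # negative: under-confident
citations_gap = calibration_gap(citations)            # positive: over-confident

print(f"overall:   {overall_gap:.2f}")
print(f"capitals:  {capitals_gap:.2f}")
print(f"citations: {citations_gap:.2f}")
```

The overall gap vanishes only because the two per-category errors cancel, which is why calibration on its own says little about any one kind of prompt.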

May 10, 2023 • 9 tweets • 2 min read

To what extent can calibrated predictions be viewed as "real probabilities" --- somehow a measure of truth, rather than estimation, even when there is no underlying probabilistic process? I'll explain a simple but striking early result of Philip Dawid that isn't so well known. 🧵

Calibration asks that probability estimates be self-consistent: averaged over all of the days I claim it should rain 20% of the time, it should rain 20% of the time. Similarly for 30%, 40%, etc. On its own calibration is quite weak.

Oct 3, 2022 • 15 tweets • 6 min read

Our new paper gives very simple algorithms that promise "multivalid" conformal prediction sets for exchangeable data. This means they are valid not just marginally, but also conditionally on (intersecting!) group membership, and in a threshold calibrated manner. I'll explain! 🧵

Instead of making point predictions, we can quantify uncertainty by producing "prediction sets" --- sets of labels that contain the true label with (say) 90% probability. The problem is, in a k label prediction problem, there are 2^k prediction sets. The curse of dimensionality!
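
One common way to avoid searching over all 2^k sets is to describe the set by a single threshold on a per-label score. The sketch below is plain split conformal prediction, which gives only marginal coverage. It is not the multivalid algorithm from the paper, and the function names and toy data are hypothetical.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Pick a score threshold from held-out calibration data.

    The nonconformity score of an example is 1 - (probability the model
    assigned to its true label). We take the ceil((n+1)(1-alpha))-th
    smallest score as the threshold.
    """
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return 1.0  # too few calibration points: include every label
    return scores[rank - 1]

def prediction_set(probs, threshold):
    """All labels whose score 1 - p is within the threshold."""
    return {label for label, p in enumerate(probs) if 1.0 - p <= threshold}

# Toy calibration data for a 3-label problem (made up for illustration).
cal_probs = [
    [0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3], [0.1, 0.1, 0.8], [0.5, 0.4, 0.1], [0.25, 0.25, 0.5],
    [0.9, 0.05, 0.05], [0.3, 0.6, 0.1],
]
cal_labels = [0, 1, 2, 1, 1, 2, 0, 2, 0, 0]

threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
new_set = prediction_set([0.5, 0.3, 0.2], threshold)

# Fraction of calibration examples whose true label lands in their set.
covered = sum(
    1 for p, y in zip(cal_probs, cal_labels) if y in prediction_set(p, threshold)
)
coverage = covered / len(cal_labels)
print(new_set, coverage)
```

With the threshold chosen this way, the set is determined by one number rather than by a choice among 2^k subsets.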

Jun 3, 2022 • 14 tweets • 6 min read

Machine Learning is really good at making point predictions --- but it sometimes makes mistakes. How should we think about which predictions we should trust? In other words, what is the right way to think about the uncertainty of particular predictions? A thread about new work 🧵

First, some links. Here is our paper: arxiv.org/abs/2206.01067 Here is me giving a talk about it: simonsfoundation.org/event/robust-a… It’s joint work with Bastani, Gupta, @crispy_jung, Noarov, and Ramalingam. Our code will shortly be available on github, in the repository linked in the paper.

Dec 18, 2020 • 32 tweets • 7 min read

Ok, I don't have 2020 favorite papers for 2020, but I did learn a couple of things this year. Here is a slow moving thread of the ideas I learned about this year. Many of these ideas are old, but they were new to me!🧵

1) Calibration via the minmax theorem. This is an old idea of Sergiu Hart's, that he originally communicated verbally to Foster and Vohra (it appears with credit in their classic 1998 paper). Sergiu wrote it up this year in this short note: ma.huji.ac.il/hart/papers/ca…

Dec 8, 2019 • 6 tweets • 2 min read

A nice new fairness paper by Blum and Stangl: arxiv.org/pdf/1912.01094… They show that if there are two populations with the same base rate, but then data is biased either by undersampling positive examples from population B, or by corrupting positive labels in population B... 1/3

Then ERM subject to the constraint of equalizing true positive rates across groups recovers the optimal classifier on the original (unbiased) data distribution. Other fairness constraints (like also equalizing false positive rates, or asking for demographic parity) don't. 2/3

Nov 13, 2019 • 12 tweets • 4 min read

Heading to Facebook today for a fireside chat with @SolomonMg about The Ethical Algorithm. In preparing I was looking into re-identification attacks against production systems that purport to protect privacy, and read about Aircloak's system Diffix. A short thread. 1/

Diffix provides an interactive system by which users can query data, and returns answers that are perturbed with small amounts of noise. But despite the name, it doesn't promise differential privacy. They are proud of this! On their website they write that: 2/

May 8, 2019 • 11 tweets • 3 min read

We wrote a paper proposing a new relaxation of differential privacy that has lots of nice properties: arxiv.org/abs/1905.02383 It's 85 pages long, so here is the TL;DR. Suppose S is a dataset with your data, and S' is the dataset with your data removed. 1/

Differential privacy can be viewed as promising that any hypothesis test aiming to distinguish whether I used S vs. S' in my computation that has false positive rate alpha must have a true positive rate of at most e^eps*alpha+delta. Cool - it's easy to interpret this guarantee! 2/
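
As a rough numeric illustration of the e^eps*alpha+delta bound (not taken from the paper), the sketch below checks it on randomized response for a single bit, where the neighboring inputs differ by flipping the bit rather than removing a record. The function names are hypothetical.

```python
import math

def max_true_positive_rate(alpha, eps, delta=0.0):
    """Upper bound on the true positive rate of any test whose
    false positive rate is alpha, capped at 1."""
    return min(1.0, math.exp(eps) * alpha + delta)

def randomized_response_rates(eps):
    """Randomized response: report the true bit with probability
    e^eps / (1 + e^eps), otherwise report the flipped bit.

    Returns (false positive rate, true positive rate) of the test
    "guess the bit was 1 iff the report is 1".
    """
    p_truth = math.exp(eps) / (1.0 + math.exp(eps))
    false_positive_rate = 1.0 - p_truth  # true bit is 0, report says 1
    true_positive_rate = p_truth         # true bit is 1, report says 1
    return false_positive_rate, true_positive_rate

eps = 1.0
fpr, tpr = randomized_response_rates(eps)
bound = max_true_positive_rate(fpr, eps)
print(f"FPR={fpr:.3f}  TPR={tpr:.3f}  bound={bound:.3f}")
```

For this mechanism and this test the bound holds with equality (with delta = 0), so the guarantee cannot be tightened in general.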

Jan 22, 2019 • 11 tweets • 4 min read

Michael Kearns and I wrote a book! It's called "The Ethical Algorithm: The Science of Socially Aware Algorithm Design", and it's going to be published by Oxford University Press in the fall. Let me tell you about it! (Thread)

First, it's not a textbook: it's a "trade book" -- a popular science book. Its intended readership isn't just computer science PhDs, but the educated public broadly. But there should be plenty in it to interest experts, because we cover quite a bit of ground.
