Sören Mindermann
Postdoc with Yoshua Bengio, Mila
Mar 11 10 tweets 4 min read
Updated our 2022 paper on the alignment problem (ICLR). More direct evidence now supports our hypotheses: alignment faking, manipulative and obfuscated reward hacking, situational awareness, internal goals and power-seeking. Key updates below. 1/🧵
arxiv.org/abs/2209.00626

Just yesterday* OpenAI confirmed one of our hypotheses: obfuscated reward hacking. GPT knowingly hacks its reward function ‘in the wild’; OpenAI penalizes it; so GPT keeps doing it in secret.


*not even in the paper yet
Jun 16, 2022 11 tweets 5 min read
Tired of waiting 💤 while your model trains? Try skipping points that are already learned, not learnable or not worth learning! Robustly reduces required training steps 🏎 by >10x to reach the same accuracy on big web-scraped data.

📜ICML 2022 paper: arxiv.org/abs/2206.07137

Training on big web-scraped data can take ages 💤 But lots of compute and time is wasted on redundant and noisy points that are already learned, not learnable, or not even worth learning.
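To make the idea of skipping points concrete, here is a minimal sketch, not the paper's implementation. It assumes each candidate point has two losses available: one under the model being trained and one under a separate reference model (an assumption of this sketch, for example a small model trained on held-out data). Points are ranked by the gap between the two. The function name `select_batch` and all numbers are hypothetical.

```python
from typing import List, Sequence


def select_batch(
    current_losses: Sequence[float],
    reference_losses: Sequence[float],
    k: int,
) -> List[int]:
    """Return the indices of the k candidates with the largest loss gap."""
    if len(current_losses) != len(reference_losses):
        raise ValueError("loss sequences must have the same length")
    # Gap = loss under the model being trained minus loss under the
    # (assumed) reference model.
    #  - already learned: current loss is small, so the gap is small
    #  - noisy / not learnable: both losses are large, so the gap is small
    #  - learnable but not yet learned: current loss is large, reference
    #    loss is small, so the gap is large
    gaps = [cur - ref for cur, ref in zip(current_losses, reference_losses)]
    ranked = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    return ranked[:k]


# Hypothetical per-point losses for four candidate points.
current = [0.05, 2.30, 2.10, 1.20]
reference = [0.04, 2.25, 0.30, 0.90]
selected = select_batch(current, reference, k=2)
print(selected)
```

In this toy example point 0 is already learned and point 1 looks noisy under both models, so only points 2 and 3 would be used for the gradient step.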
Mar 27, 2021 17 tweets 9 min read
“Understanding the effectiveness of government interventions in Europe’s second wave of COVID-19” -- a result of a new collaboration between @Oxford, @Imperial, @FHIOxford, @OATML_Oxford, @uni_copenhagen, @LSHTM, @Cambridge_Uni, @TheCrick... 1/

➡️Paper: medrxiv.org/content/10.110…

As Europe enters a third wave of COVID, policy-makers balance controlling infections with the sweeping socioeconomic costs of interventions. To do so, we must know how effective individual interventions were at controlling COVID. 2/
Jul 27, 2020 13 tweets 4 min read
Excited to share a major update on how effective 8 interventions have been against COVID-19 transmission in 41 countries.

Paper: medrxiv.org/content/10.110…

Note caveats on masks in the paper; they're context-dependent.

Work done with great colleagues from 13 research groups, supervised by @yaringal, @yeewhye, Leonid Chindelevitch. Currently in submission.