Updated our 2022 paper on the alignment problem (ICLR). More direct evidence now supports our hypotheses: alignment faking, manipulative and obfuscated reward hacking, situational awareness, internal goals and power-seeking. Key updates below. 1/🧵 arxiv.org/abs/2209.00626
Just yesterday, OpenAI confirmed one of our hypotheses: obfuscated reward hacking. GPT knowingly hacks its reward function 'in the wild'; OpenAI penalizes it; so GPT keeps doing it in secret.
Tired of waiting 💤 while your model trains? Try skipping points that are already learned, not learnable, or not worth learning! Robustly reduces the training steps 🏎 needed to reach the same accuracy on big web-scraped data by >10x!
📜ICML 2022 paper: arxiv.org/abs/2206.07137
Training on big web-scraped data can take ages 💤 But lots of compute and time are wasted on redundant and noisy points that are already learned, not learnable, or not even worth learning.
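The selection rule above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `select_points` and the plain-array batching are assumptions. The idea is to score each candidate point by its reducible loss (current training loss minus the loss of a small model trained on holdout data) and keep only the top-k. Already-learned points (low training loss) and unlearnable/noisy points (high holdout loss too) both score low and get skipped.

```python
import numpy as np

def select_points(train_losses, holdout_losses, k):
    """Pick the k points with the highest reducible loss.

    reducible loss = training loss - irreducible (holdout) loss.
    - already learned: low training loss  -> low score, skipped
    - not learnable:   high holdout loss  -> low score, skipped
    Returns indices of the selected points, highest score first.
    """
    reducible = np.asarray(train_losses) - np.asarray(holdout_losses)
    return np.argsort(reducible)[-k:][::-1]

# Toy batch of 3 points: the first is already learned, the second is
# noisy (high loss even on the holdout model), the third is learnable.
idx = select_points([0.1, 2.0, 1.5], [0.1, 1.9, 0.2], k=2)
print(idx)  # the learnable point ranks first
```

In practice this scoring runs per mini-batch, so the gradient step is only taken on the selected subset.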
➡️Paper: medrxiv.org/content/10.110…
As Europe enters a third wave of COVID, policy-makers balance controlling infections with the sweeping socioeconomic costs of interventions. To do so, we must know how effective individual interventions were at controlling COVID. 2/
Jul 27, 2020
Excited to share a major update on how effective 8 interventions have been against COVID-19 transmission in 41 countries.