Noam Brown
Researching reasoning @OpenAI | Co-created Libratus/Pluribus, the first superhuman no-limit poker AIs | Co-created CICERO | PhD from @SCSatCMU
Jul 6, 2023 6 tweets 2 min read
I’m thrilled to share that I've joined @OpenAI! 🚀 For years I’ve researched AI self-play and reasoning in games like poker and Diplomacy. I’ll now investigate how to make these methods truly general. If successful, we may one day see LLMs that are 1,000x better than GPT-4 🌌 1/

In 2016, AlphaGo beat Lee Sedol in a milestone for AI. But key to that was the AI's ability to "ponder" for ~1 minute before each move. How much did that improve it? For AlphaGo Zero, it's the equivalent of scaling pretraining by ~100,000x (~5200 Elo with search, ~3000 without). 2/
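As a rough illustration of what that ~2200-point gap means, here is the standard Elo expected-score formula applied to the numbers in the tweet. This is my own sketch, not anything from the thread; the ratings are the approximate ones quoted above.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A vs. player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Approximate AlphaGo Zero ratings quoted in the tweet.
with_search, without_search = 5200.0, 3000.0
p = elo_expected_score(with_search, without_search)
print(f"Expected score of search agent vs. no-search agent: {p:.6f}")
```

Under the Elo model, a 2200-point gap corresponds to the stronger agent scoring essentially 100% of the time, which is why the authors equate search with a ~100,000x pretraining multiplier rather than a marginal improvement.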
Nov 22, 2022 14 tweets 9 min read
3 years ago my teammates and I set out toward a goal that seemed like science fiction: to build an AI that could strategically outnegotiate humans *in natural language* in Diplomacy. Today, I’m excited to share our Science paper showing we’ve succeeded! 🧵

2/ Diplomacy is a 7-player game best described as a mix of Risk, poker, and Survivor. It was JFK’s favorite game. @demishassabis is a former champion in it. And it’s been a decades-old, seemingly impossible grand challenge for AI. Why?
Oct 12, 2022 9 tweets 5 min read
After building on years of work from MILA, DeepMind, ourselves, and others, our AIs are now expert-human-level in no-press Diplomacy and Hanabi! Unlike Go and Dota, Diplomacy/Hanabi involve *cooperation*, which breaks naive RL. arxiv.org/abs/2210.05492 arxiv.org/abs/2210.05125 🧵👇

In two-player zero-sum games like Go/Poker/Dota, principled self-play RL converges to a perfect strategy. A scalable algorithm with enough capacity/compute is all you need. But self-play RL alone may not play well with humans in *cooperative* games, even with *infinite* compute.
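A minimal toy example (my own sketch, not from the papers) of why self-play can fail with human partners: in a pure coordination game, self-play locks onto *either* convention depending on random initialization, so an agent trained in isolation may settle on a convention its human partner does not share.

```python
import random

# Pure coordination game: payoff 1 if both players pick the same
# action, else 0. Which action "wins" is arbitrary.
ACTIONS = ["A", "B"]

def self_play_convention(seed: int, steps: int = 100) -> str:
    """Fictitious-play-style self-play: best-respond to the empirical
    history of play, which snowballs into whichever convention the
    random initialization slightly favored."""
    rng = random.Random(seed)
    counts = {a: rng.random() for a in ACTIONS}  # noisy initial beliefs
    for _ in range(steps):
        move = max(ACTIONS, key=counts.get)  # best response to history
        counts[move] += 1                    # reinforce the convention
    return max(ACTIONS, key=counts.get)

# Different seeds converge to different, equally "optimal" conventions.
conventions = {self_play_convention(s) for s in range(50)}
print(conventions)
```

Every converged policy is a perfect equilibrium of self-play, yet an agent that settled on "A" scores zero against a human partner who expects "B" — which is why, unlike in two-player zero-sum games, self-play alone is not enough for cooperative games like Hanabi and Diplomacy.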