Noam Brown
Researching reasoning @OpenAI | Co-created Libratus/Pluribus superhuman poker AIs, CICERO Diplomacy AI, and OpenAI o1 / 🍓
Sep 12, 2024
Today, I’m excited to share with you all the fruit of our effort at @OpenAI to create AI models capable of truly general reasoning: OpenAI's new o1 model series! (aka 🍓) Let me explain 🧵 1/

Our o1-preview and o1-mini models are available immediately. We’re also sharing evals for our (still unfinalized) o1 model to show the world that this isn’t a one-off improvement – it’s a new scaling paradigm and we’re just getting started. 2/9
Jul 6, 2023
I’m thrilled to share that I've joined @OpenAI! 🚀 For years I’ve researched AI self-play and reasoning in games like poker and Diplomacy. I’ll now investigate how to make these methods truly general. If successful, we may one day see LLMs that are 1,000x better than GPT-4 🌌 1/

In 2016, AlphaGo beat Lee Sedol in a milestone for AI. But key to that was the AI's ability to "ponder" for ~1 minute before each move. How much did that improve it? For AlphaGo Zero, it's the equivalent of scaling pretraining by ~100,000x (~5200 Elo with search, ~3000 without) 2/
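To make the arithmetic behind that "~100,000x" concrete, here is a minimal back-of-the-envelope sketch. The Elo-to-win-probability formula is the standard one; the ~130 Elo-per-compute-doubling rate is an assumption introduced here for illustration (it is the rate implied by the thread's own numbers, not a figure stated in it):

```python
import math

# Elo ratings quoted in the thread for AlphaGo Zero:
elo_with_search = 5200     # with ~1 minute of search ("pondering") per move
elo_without_search = 3000  # raw policy network, no search
delta = elo_with_search - elo_without_search  # 2200 Elo

# Standard Elo model: probability the stronger player wins a game.
p_win = 1 / (1 + 10 ** (-delta / 400))
print(f"P(search beats no-search) ~= {p_win:.6f}")  # ~0.999997

# ASSUMPTION (not from the thread): each doubling of pretraining compute
# buys roughly 130 Elo. Under that rate, the compute multiplier that
# matches a 2200-Elo gap is:
elo_per_doubling = 130
multiplier = 2 ** (delta / elo_per_doubling)
print(f"Equivalent pretraining scale-up ~= {multiplier:,.0f}x")  # ~10^5
```

Under that assumed rate, 2^(2200/130) ≈ 1.2 × 10^5, which is how a 2200-Elo search bonus cashes out to the thread's "~100,000x pretraining" equivalence.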
Nov 22, 2022
3 years ago my teammates and I set out toward a goal that seemed like science fiction: to build an AI that could strategically outnegotiate humans *in natural language* in Diplomacy. Today, I’m excited to share our Science paper showing we’ve succeeded! 🧵

2/ Diplomacy is a 7-player game best described as a mix of Risk, poker, and Survivor. It was JFK’s favorite game. @demishassabis is a former champion in it. And it’s been a decades-old, seemingly impossible grand challenge for AI. Why?
Oct 12, 2022
After building on years of work from MILA, DeepMind, ourselves, and others, our AIs are now expert-human-level in no-press Diplomacy and Hanabi! Unlike Go and Dota, Diplomacy/Hanabi involve *cooperation*, which breaks naive RL. arxiv.org/abs/2210.05492 arxiv.org/abs/2210.05125 🧵👇

In two-player zero-sum games like Go/Poker/Dota, principled self-play RL converges to a perfect strategy. A scalable algorithm with enough capacity/compute is all you need. But self-play RL alone may not play well with humans in *cooperative* games, even with *infinite* compute.
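A minimal sketch of why self-play breaks down in cooperative games, using a toy two-action coordination game (illustrative only; this is not the algorithm from either paper). Two self-play runs each converge to *a* perfect convention, but which one depends entirely on initialization, so two independently trained agents (or an agent and a human) can be mutually incompatible:

```python
import numpy as np

# Toy cooperative game: both players pick a convention in {A, B}
# and score 1 only if they match. Two equally good equilibria exist.
PAYOFF = np.eye(2)

def train_self_play(init_logits, steps=50, lr=0.5):
    """Naive self-play policy gradient on a single shared policy.
    It reliably reaches *some* equilibrium, but which convention it
    picks is decided by the arbitrary initialization."""
    logits = np.array(init_logits, dtype=float)
    for _ in range(steps):
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        # Marginal value of each action when the partner is a copy of you:
        logits += lr * (PAYOFF @ probs)
    return int(np.argmax(logits))  # 0 = convention A, 1 = convention B

agent_a = train_self_play([0.1, 0.0])  # this run settles on convention A
agent_b = train_self_play([0.0, 0.1])  # this run settles on convention B

print("self-play scores:", PAYOFF[agent_a, agent_a], PAYOFF[agent_b, agent_b])  # 1.0 1.0
print("cross-play score:", PAYOFF[agent_a, agent_b])                            # 0.0
```

Each agent is optimal with its own training partner, yet the pair scores 0 together. In zero-sum games this can't happen (any equilibrium strategy is safe against anyone); in cooperative games like Hanabi and no-press Diplomacy, playing well with *humans* requires converging to human-compatible conventions, which naive self-play does not guarantee at any compute budget.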