More from @polynoamial

Noam Brown

@polynoamial

Aug 11

https://twitter.com/SherylHsu02/status/1954966109851119921

In my opinion, the most important takeaway from this result is that our @OpenAI International Math Olympiad (IMO) gold model is also our best competitive coding model. 🧵

After the IMO, we ran full evals on the IMO gold model and found that aside from just competitive math, it was also our best model in many other areas, including coding. So folks decided to take the same exact IMO gold model, without any changes, and use it in the system for IOI.

The IOI scaffold involved sampling from a few different models and then using another model and a heuristic to select solutions for submission. This system achieved a gold medal, placing 6th among humans. The IMO gold model indeed did best out of all the models we sampled from.
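The sample-then-select structure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the thread does not say how sampling, scoring, or the heuristic actually worked, so `Candidate`, `sample_candidates`, `select_for_submission`, and every callable passed in are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Candidate:
    model: str     # name of the (hypothetical) model that produced the sample
    solution: str  # candidate program text


def sample_candidates(
    samplers: Dict[str, Callable[[str, int], str]],
    problem: str,
    n_per_model: int,
) -> List[Candidate]:
    """Draw n_per_model samples from each model for one problem."""
    candidates = []
    for name, sampler in samplers.items():
        for i in range(n_per_model):
            candidates.append(Candidate(name, sampler(problem, i)))
    return candidates


def select_for_submission(
    candidates: List[Candidate],
    scorer: Callable[[Candidate], float],
    heuristic: Callable[[Candidate], bool],
    budget: int,
) -> List[Candidate]:
    """Filter with a heuristic, rank with a scoring model, keep the top few.

    Assumption: the heuristic acts as a pass/fail filter and the selector
    model returns a numeric score. The real system may combine them differently.
    """
    kept = [c for c in candidates if heuristic(c)]
    ranked = sorted(kept, key=scorer, reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    # Toy stand-ins for real models; they just fabricate labelled strings.
    toy_samplers = {
        "model_a": lambda problem, i: f"a-solution-{i}",
        "model_b": lambda problem, i: f"b-solution-{i}",
    }
    pool = sample_candidates(toy_samplers, "example problem", n_per_model=3)
    chosen = select_for_submission(
        pool,
        scorer=lambda c: len(c.solution),   # placeholder score
        heuristic=lambda c: True,           # placeholder filter
        budget=2,
    )
    print([c.solution for c in chosen])
```

The `budget` parameter stands in for a limit on how many solutions can be submitted per problem; the actual limit and selection criteria are not given in the thread.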

Noam Brown

@polynoamial

Jul 21

Congrats to the GDM team on their IMO result! I think their parallel success highlights how fast AI progress is. Their approach was a bit different than ours, but I think that shows there are many research directions for further progress. Some thoughts on our model and results 🧵

~2 months ago, the IMO emailed us about participating in a formal (Lean) version of the IMO. We’ve been focused on general reasoning in natural language without the constraints of Lean, so we declined. We were never approached about a natural language math option.

Over the past several months, we made a lot of progress on general reasoning. This involved collecting, curating, and training on high-quality math data, which will also go into future models. In our IMO eval we did not use RAG or any tools.

Noam Brown

@polynoamial

Jul 19

https://twitter.com/alexwei_/status/1946477742855532918

Today, we at @OpenAI achieved a milestone that many considered years away: gold medal-level performance on the 2025 IMO with a general reasoning LLM—under the same time limits as humans, without tools. As remarkable as that sounds, it’s even more significant than the headline 🧵

Typically for these AI results, like in Go/Dota/Poker/Diplomacy, researchers spend years making an AI that masters one narrow domain and does little else. But this isn’t an IMO-specific model. It’s a reasoning LLM that incorporates new experimental general-purpose techniques.

So what’s different? We developed new techniques that make LLMs a lot better at hard-to-verify tasks. IMO problems were the perfect challenge for this: proofs are pages long and take experts hours to grade. Compare that to AIME, where answers are simply an integer from 0 to 999.

Noam Brown

@polynoamial

Sep 12, 2024

Today, I’m excited to share with you all the fruit of our effort at @OpenAI to create AI models capable of truly general reasoning: OpenAI's new o1 model series! (aka 🍓) Let me explain 🧵 1/

Our o1-preview and o1-mini models are available immediately. We’re also sharing evals for our (still unfinalized) o1 model to show the world that this isn’t a one-off improvement – it’s a new scaling paradigm and we’re just getting started. 2/9

o1 is trained with RL to “think” before responding via a private chain of thought. The longer it thinks, the better it does on reasoning tasks. This opens up a new dimension for scaling. We’re no longer bottlenecked by pretraining. We can now scale inference compute too.

Noam Brown

@polynoamial

Jul 6, 2023

I’m thrilled to share that I've joined @OpenAI! 🚀 For years I’ve researched AI self-play and reasoning in games like Poker and Diplomacy. I’ll now investigate how to make these methods truly general. If successful, we may one day see LLMs that are 1,000x better than GPT-4 🌌 1/

In 2016, AlphaGo beat Lee Sedol in a milestone for AI. But key to that was the AI's ability to "ponder" for ~1 minute before each move. How much did that improve it? For AlphaGo Zero, it's the equivalent of scaling pretraining by ~100,000x (~5200 Elo with search, ~3000 without) 2/
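To give a sense of what a gap of roughly 2200 Elo means, here is a small calculation using the standard logistic Elo expected-score formula. It assumes the quoted ratings sit on the usual 400-point scale, which the tweet does not state; the function name is illustrative.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the standard
    logistic Elo model (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


with_search = 5200     # approximate rating quoted for search at test time
without_search = 3000  # approximate rating quoted for the raw network

p = elo_expected_score(with_search, without_search)
print(f"Expected score with search vs. without: {p:.7f}")
print(f"Expected score for the no-search side: {1 - p:.2e}")
```

Under that assumption the no-search side would be expected to score on the order of a few points per million games, which is the size of gap being compared to a ~100,000x increase in pretraining.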

Also in 2016, I observed a similar phenomenon in poker. That insight led to our Libratus poker AI that beat top humans for the first time. @andy_l_jones investigated the train-time/test-time compute tradeoff in detail in Hex and found a similar pattern: 3/

https://twitter.com/ibab_ml/status/1669579636563656705

Noam Brown

@polynoamial

Nov 22, 2022

https://twitter.com/MetaAI/status/1595075884502855680

3 years ago my teammates and I set out toward a goal that seemed like science fiction: to build an AI that could strategically outnegotiate humans *in natural language* in Diplomacy. Today, I’m excited to share our Science paper showing we’ve succeeded! 🧵

2/ Diplomacy is a 7-player game best described as a mix of Risk, poker, and Survivor. It was JFK’s favorite game. @demishassabis is a former champion in it. And it’s been a decades-old, seemingly impossible grand challenge for AI. Why?

3/ Diplomacy is about building trust in an environment that encourages players to not trust anyone. All players act simultaneously after non-binding, private negotiations. To succeed, you must account for the risk that players might lie, and that players might doubt your honesty.
