Shital Shah
Mostly research and code. If the universe is an optimizer, what is its loss function? All opinions are my own.
Feb 10 · 8 tweets · 2 min read
So, AIME might not be a good test for frontier models after all.

For 15 problems in AIME 2025 Part 1, I fired off deep research to find near duplicates.

It turns out… 1/n🧵

* 8 out of 15 problems already existed on the Internet as near-duplicates.

* 5 problems are simple applications of lesser-known theorems/formulas.

* 2 problems needed creative composition of multiple theorems/formulas.
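A minimal sketch of how such a near-duplicate check could be automated, assuming TF-IDF cosine similarity as the matching signal (not the deep-research workflow actually used here); the function name and the 0.8 threshold are illustrative:

```python
# Rough way to flag near-duplicate problems with TF-IDF cosine similarity.
# The helper name and threshold are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def flag_near_duplicates(new_problems, known_problems, threshold=0.8):
    """Return (new_idx, known_idx, score) for pairs above the similarity threshold."""
    vectorizer = TfidfVectorizer(ngram_range=(1, 3), stop_words="english")
    matrix = vectorizer.fit_transform(new_problems + known_problems)  # shared vocabulary
    new_vecs = matrix[: len(new_problems)]
    known_vecs = matrix[len(new_problems):]
    sims = cosine_similarity(new_vecs, known_vecs)
    return [
        (i, j, float(sims[i, j]))
        for i in range(sims.shape[0])
        for j in range(sims.shape[1])
        if sims[i, j] >= threshold
    ]
```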
Oct 21, 2024 · 8 tweets · 2 min read
DeepMind's chess paper has sharply divided the AI community:

Some are pointing to it as evidence that LLMs can do reasoning and planning, while others say it's just a lookup table / memorization.

In reality, I think the paper uncovers something else if you look into the details! 🧵

First, the authors are trying to distill the Stockfish engine into a model. One would think the student wouldn't do better than the teacher, but the teacher's Elo is 2713 while the student gets 2895.

So the student is doing better than the teacher!

But this is still not the most interesting part!!
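For context, the setup is roughly: a strong engine labels (board, move) pairs with win probabilities, and the student learns to predict those values as a classification over bins. A minimal sketch of that kind of distillation objective; the bin count, feature dimension, and head architecture are my own illustrative assumptions, not the paper's actual pipeline:

```python
# Sketch of engine distillation as classification: the student predicts a binned
# win probability for an encoded (board, move) pair that the engine has evaluated.
# NUM_BINS, the feature dim, and the head are assumed, not taken from the paper.
import torch
import torch.nn as nn

NUM_BINS = 128  # win probability discretized into buckets (assumed value)

class ActionValueHead(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, NUM_BINS))

    def forward(self, features):      # features: encoded (board, move) pair
        return self.net(features)     # logits over win-probability bins

def distillation_loss(logits, engine_win_prob):
    """Cross-entropy against the engine's win probability, bucketed into NUM_BINS bins."""
    target_bin = (engine_win_prob.clamp(0, 1) * (NUM_BINS - 1)).round().long()
    return nn.functional.cross_entropy(logits, target_bin)
```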
Oct 13, 2024 · 9 tweets · 2 min read
There is something beautiful and amazing happening right now. A new optimizer that is almost twice as efficient as AdamW is being born right in front of our eyes!

Many people are probably rolling their eyes just reading this, but this time it feels different! 🧵

We have a mini cottage industry with a tradition of putting out papers every year claiming to beat Adam that just never seem to pan out. There always seems to be some hidden catch, if things are reproducible at all. So, rightfully, many have become numb to these announcements.
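"Twice as efficient" operationally means reaching the same loss in roughly half the steps or tokens. A toy harness (my own illustration, not from this thread) for measuring steps-to-target-loss so a candidate optimizer can be compared against AdamW:

```python
# Count how many steps an optimizer needs to reach a target loss on a toy
# regression task. Model, data, and target loss are illustrative placeholders.
import torch
import torch.nn as nn

def steps_to_target(optimizer_factory, target_loss=0.01, max_steps=5_000, seed=0):
    torch.manual_seed(seed)
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = optimizer_factory(model.parameters())
    x = torch.randn(1024, 32)
    y = x @ torch.randn(32, 1) / 32**0.5        # learnable linear target
    for step in range(1, max_steps + 1):
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        if loss.item() <= target_loss:
            return step
    return max_steps                             # target not reached within budget

baseline_steps = steps_to_target(lambda params: torch.optim.AdamW(params, lr=1e-3))
# Plug the candidate optimizer's factory in here and compare the step counts.
```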
Aug 6, 2023 · 9 tweets · 3 min read
Just learned something very cool about LR schedules. This one is so huge it surprises me that it's not in its own paper but rather tucked away.

Problem: Most training runs use cosine/linear decay, but this requires specifying the number of steps in advance. This is quite troublesome. 🧵

For example, LLAMA2 trains for 500k steps, but when you look at the training curves, it is obvious that you could have kept going, except that now you can't because it's too late. Repeating the entire run with a new, larger step count is too expensive.
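The pain point, concretely: a cosine schedule must know the total step budget when it is constructed. A minimal PyTorch illustration; the model, optimizer, and step count are placeholders:

```python
# CosineAnnealingLR needs the total step budget (T_max) fixed at construction time.
# Model, optimizer, and step count here are toy placeholders.
import torch

model = torch.nn.Linear(10, 1)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

TOTAL_STEPS = 500_000  # must be decided before training starts
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=TOTAL_STEPS)

# By step 500k the LR has decayed to ~0, so even if the loss is still improving,
# simply continuing the run no longer matches the schedule -- hence the appeal of
# schedules that don't need the step count up front.
```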
Mar 19, 2023 · 7 tweets · 2 min read
While folks in academia are freaking out about not having as much compute as OpenAI and are depressed about their research agendas, it might be easy to miss some critical trends. My bet is that both "L"s in LLM will be gone in about 2 years. 🧵

Before GPT-4 came out, many people were still quite smug about scaling. One of the biggest things in the GPT-4 paper is the verification that one can precisely predict performance on actual benchmarks like HumanEval from scale. This is huge. Scaling is real and here to stay, but…
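The kind of prediction the GPT-4 report describes amounts to fitting a power law on small-scale runs and extrapolating to the target run. A bare-bones sketch with made-up placeholder numbers, just to show the shape of such a fit (not the paper's data or method):

```python
# Fit loss ~ a * compute^(-b) on a few small runs (placeholder points) and
# extrapolate to a much larger compute budget. Simplified sketch only.
import numpy as np

compute = np.array([1e18, 1e19, 1e20, 1e21])   # placeholder training-compute values
loss = np.array([3.1, 2.6, 2.2, 1.9])          # placeholder final losses

# Linear fit in log-log space: log(loss) = intercept + slope * log(compute)
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
predicted_loss = np.exp(intercept + slope * np.log(1e23))   # extrapolate to the big run
print(f"predicted loss at 1e23 FLOPs: {predicted_loss:.2f}")
```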
Dec 6, 2022 · 25 tweets · 6 min read
There are several things that ChatGPT fails at spectacularly (for example, 3-digit multiplication, ASCII art). These failures give a lot of insight into its inner workings, but what is far more astonishing are the things it succeeds at unbelievably well. Here are some of my favorites: 🧵
Dec 2, 2022 · 25 tweets · 6 min read
ChatGPT was dropped on us just a bit over 24 hours ago. It's like you wake up to the news of the first nuclear explosion: you don't know yet what to think about it, but you know the world will never be the same again. Here are some interesting snapshots of this "explosion" 🧵: