Shital Shah
Mostly research and code. If the universe is an optimizer, what is its loss function? All opinions are my own.
Feb 10 · 8 tweets · 2 min read
So, AIME might not be a good test for frontier models after all.

For 15 problems in AIME 2025 Part 1, I fired off deep research to find near duplicates.

It turns out… 1/n🧵

* 8 out of 15 problems already existed on the Internet as near-duplicates.

* 5 problems are simple applications of lesser-known theorems/formulas.

* 2 problems needed creative composition of multiple theorems/formulas.
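A minimal sketch of how such a near-duplicate check could be automated, assuming TF-IDF cosine similarity as the matching signal (not the deep-research workflow actually used here); the function name and the 0.8 threshold are illustrative:

```python
# Rough way to flag near-duplicate problems with TF-IDF cosine similarity.
# The helper name and threshold are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def flag_near_duplicates(new_problems, known_problems, threshold=0.8):
    """Return (new_idx, known_idx, score) for pairs above the similarity threshold."""
    vectorizer = TfidfVectorizer(ngram_range=(1, 3), stop_words="english")
    matrix = vectorizer.fit_transform(new_problems + known_problems)  # shared vocabulary
    new_vecs = matrix[: len(new_problems)]
    known_vecs = matrix[len(new_problems):]
    sims = cosine_similarity(new_vecs, known_vecs)
    return [
        (i, j, float(sims[i, j]))
        for i in range(sims.shape[0])
        for j in range(sims.shape[1])
        if sims[i, j] >= threshold
    ]
```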
Oct 21, 2024 · 8 tweets · 2 min read
DeepMind's chess paper has sharply divided the AI community:

Some are pointing to it as evidence that LLMs can do reasoning and planning, while others say it's just a lookup table / memorization.

In reality, I think the paper uncovers something else if you look into the details! 🧵

First, the authors are trying to distill the Stockfish engine into a model. One would think the student wouldn't do better than the teacher, but the teacher's Elo is 2713 while the student gets 2895.

So the student is doing better than the teacher!

But this is still not the most interesting part!!
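For context, the setup is roughly: a strong engine labels (board, move) pairs with win probabilities, and the student learns to predict those values as a classification over bins. A minimal sketch of that kind of distillation objective; the bin count, feature dimension, and head architecture are my own illustrative assumptions, not the paper's actual pipeline:

```python
# Sketch of engine distillation as classification: the student predicts a binned
# win probability for an encoded (board, move) pair that the engine has evaluated.
# NUM_BINS, the feature dim, and the head are assumed, not taken from the paper.
import torch
import torch.nn as nn

NUM_BINS = 128  # win probability discretized into buckets (assumed value)

class ActionValueHead(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, NUM_BINS))

    def forward(self, features):      # features: encoded (board, move) pair
        return self.net(features)     # logits over win-probability bins

def distillation_loss(logits, engine_win_prob):
    """Cross-entropy against the engine's win probability, bucketed into NUM_BINS bins."""
    target_bin = (engine_win_prob.clamp(0, 1) * (NUM_BINS - 1)).round().long()
    return nn.functional.cross_entropy(logits, target_bin)
```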
Oct 13, 2024 · 9 tweets · 2 min read
There is something beautiful and amazing happening right now. A new optimizer that is almost twice as efficient as AdamW is being born right in front of our eyes!

Many people are probably rolling their eyes just reading this, but this time it feels different! 🧵

We have a mini cottage industry with a tradition of putting out papers every year claiming to beat Adam that just never seem to pan out. There always seems to be some hidden catch, if things are reproducible at all. So, rightfully, many have become numb to these announcements.
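"Twice as efficient" operationally means reaching the same loss in roughly half the steps or tokens. A toy harness (my own illustration, not from this thread) for measuring steps-to-target-loss so a candidate optimizer can be compared against AdamW:

```python
# Count how many steps an optimizer needs to reach a target loss on a toy
# regression task. Model, data, and target loss are illustrative placeholders.
import torch
import torch.nn as nn

def steps_to_target(optimizer_factory, target_loss=0.01, max_steps=5_000, seed=0):
    torch.manual_seed(seed)
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = optimizer_factory(model.parameters())
    x = torch.randn(1024, 32)
    y = x @ torch.randn(32, 1) / 32**0.5        # learnable linear target
    for step in range(1, max_steps + 1):
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        if loss.item() <= target_loss:
            return step
    return max_steps                             # target not reached within budget

baseline_steps = steps_to_target(lambda params: torch.optim.AdamW(params, lr=1e-3))
# Plug the candidate optimizer's factory in here and compare the step counts.
```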
Aug 6, 2023 · 9 tweets · 3 min read
Just learned something very cool about LR schedules. This one is so huge it surprises me that it's not in its own paper but rather tucked away.

Problem: Most training runs use cosine/linear decay, but this requires specifying the number of steps in advance. This is quite troublesome. 🧵

For example, LLAMA2 trains for 500k steps, but when you look at the training curves, it is obvious that you could have kept going, except that now you can't because it's too late. Repeating the entire run with a new, larger step count is too expensive.
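The pain point, concretely: a cosine schedule must know the total step budget when it is constructed. A minimal PyTorch illustration; the model, optimizer, and step count are placeholders:

```python
# CosineAnnealingLR needs the total step budget (T_max) fixed at construction time.
# Model, optimizer, and step count here are toy placeholders.
import torch

model = torch.nn.Linear(10, 1)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

TOTAL_STEPS = 500_000  # must be decided before training starts
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=TOTAL_STEPS)

# By step 500k the LR has decayed to ~0, so even if the loss is still improving,
# simply continuing the run no longer matches the schedule -- hence the appeal of
# schedules that don't need the step count up front.
```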
Mar 19, 2023 · 7 tweets · 2 min read
While folks in academia are freaking out about not having as much compute as OpenAI and are depressed about their research agendas, it might be easy to miss some critical trends. My bet is that both "L"s in LLM will be gone in about 2 years. 🧵

Before GPT-4 came out, many people were still quite smug about scaling. One of the biggest things in the GPT-4 paper is the verification that one can precisely predict performance on actual benchmarks like HumanEval from scale. This is huge. Scaling is real and here to stay, but…
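The kind of prediction the GPT-4 report describes amounts to fitting a power law on small-scale runs and extrapolating to the target run. A bare-bones sketch with made-up placeholder numbers, just to show the shape of such a fit (not the paper's data or method):

```python
# Fit loss ~ a * compute^(-b) on a few small runs (placeholder points) and
# extrapolate to a much larger compute budget. Simplified sketch only.
import numpy as np

compute = np.array([1e18, 1e19, 1e20, 1e21])   # placeholder training-compute values
loss = np.array([3.1, 2.6, 2.2, 1.9])          # placeholder final losses

# Linear fit in log-log space: log(loss) = intercept + slope * log(compute)
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
predicted_loss = np.exp(intercept + slope * np.log(1e23))   # extrapolate to the big run
print(f"predicted loss at 1e23 FLOPs: {predicted_loss:.2f}")
```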
Dec 6, 2022 · 25 tweets · 6 min read
There are several things that ChatGPT fails at spectacularly (for example, 3-digit multiplication, ASCII art). These failures give a lot of insight into its inner workings, but what is far more astonishing are the things it succeeds at unbelievably well. Here are some of my favorites: 🧵
Dec 2, 2022 · 25 tweets · 6 min read
ChatGPT was dropped on us just a bit over 24 hours ago. It's like you wake up to the news of the first nuclear explosion: you don't know yet what to think about it, but you know the world will never be the same again. Here are some interesting snapshots of this "explosion" 🧵: