Casper Hansen
NLP Scientist | AutoAWQ Creator | Open-Source Contributor
Jan 20 8 tweets 2 min read
The DeepSeek R1 training procedure confused me at first. My brain refused to accept this powerful model could be incredibly straightforward.

Let me break down this elegant beast for you 🧵

This multi-stage training loop is unusually effective:
Base → RL → Finetune → RL → Finetune → RL

Does scaling stages = better performance? Let’s break down each phase. 🔍
Mar 27, 2024 8 tweets 5 min read
I did some research on LLMs as agents today. Here is a guide to the state of the art in LLMs as agents!

It's all about environments where LLMs can observe, plan, act, and iterate on solutions.
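
To make the observe, plan, act, iterate loop concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `GuessEnv`, `Agent`, and `run_episode` are hypothetical names, the environment is a toy number-guessing game, and the `plan` step is a plain binary search standing in for where a real agent would call an LLM.

```python
from dataclasses import dataclass, field


@dataclass
class GuessEnv:
    """Toy environment (hypothetical): find a hidden integer."""
    target: int

    def step(self, guess: int) -> str:
        # Acting on the environment returns feedback the agent can observe.
        if guess < self.target:
            return "higher"
        if guess > self.target:
            return "lower"
        return "correct"


@dataclass
class Agent:
    """Stand-in for an LLM policy; a real agent would prompt a model in plan()."""
    low: int
    high: int
    history: list = field(default_factory=list)

    def observe(self, guess: int, feedback: str) -> None:
        # Record the feedback and narrow the search range accordingly.
        self.history.append((guess, feedback))
        if feedback == "higher":
            self.low = guess + 1
        elif feedback == "lower":
            self.high = guess - 1

    def plan(self) -> int:
        # Choose the next action from everything observed so far.
        return (self.low + self.high) // 2


def run_episode(env: GuessEnv, agent: Agent, max_steps: int = 10):
    """Run plan -> act -> observe until solved or out of steps."""
    for step in range(1, max_steps + 1):
        guess = agent.plan()            # plan
        feedback = env.step(guess)      # act
        agent.observe(guess, feedback)  # observe, then iterate
        if feedback == "correct":
            return step
    return None


if __name__ == "__main__":
    agent = Agent(low=0, high=100)
    steps = run_episode(GuessEnv(target=37), agent)
    print(f"solved in {steps} steps: {agent.history}")
```

The point of the sketch is only the shape of the loop: the agent never sees the answer directly, it improves by acting and reading what the environment sends back.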

🧵1/8

2/5 There are two main benchmarks that are useful. Both are seemingly hard datasets.

Especially SWE-Bench. @cognition_labs was able to showcase 13.86% accuracy with Devin.

SWE-Bench: GitHub issues or pull requests. arxiv.org/abs/2310.06770
MINT: arxiv.org/abs/2309.10691