Thread Reader
Casper Hansen
@casper_hansen_
NLP Scientist | AutoAWQ Creator | Open-Source Contributor
Jan 20 • 8 tweets • 2 min read
The DeepSeek R1 training procedure confused me at first. My brain refused to accept this powerful model could be incredibly straightforward.
Let me break down this elegant beast for you 🧵 This multi-stage training loop is unusually effective:
Base → RL → Finetune → RL → Finetune → RL
Does scaling stages = better performance? Let’s break down each phase. 🔍
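As a rough mental model of that stage ordering (not DeepSeek's actual training code; the stage functions, names, and arguments below are placeholders), the alternation can be sketched like this:

```python
# Toy sketch of the Base → RL → Finetune → RL → Finetune → RL loop.
# Stage functions are placeholders, not DeepSeek's actual implementation.

def rl_stage(model, prompts):
    """Placeholder for an RL pass (R1 reportedly uses GRPO with rule-based rewards)."""
    print(f"RL pass over {len(prompts)} prompts")
    return model

def finetune_stage(model, examples):
    """Placeholder for a supervised finetuning pass on curated data."""
    print(f"SFT pass over {len(examples)} examples")
    return model

def train_r1_style(base_model, reasoning_prompts, sft_examples, preference_prompts):
    model = base_model
    model = rl_stage(model, reasoning_prompts)    # RL directly on the base model
    model = finetune_stage(model, sft_examples)   # first finetune stage
    model = rl_stage(model, reasoning_prompts)    # second RL stage
    model = finetune_stage(model, sft_examples)   # second finetune stage
    model = rl_stage(model, preference_prompts)   # final RL stage
    return model

# Dummy inputs, just to show the stage ordering:
train_r1_style("base-model", ["p1", "p2"], ["e1"], ["pref1"])
```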
Mar 27, 2024 • 8 tweets • 5 min read
I did some research on LLMs as agents today. Here is a guide to the state of the art of LLMs as agents!
It's all about environments where LLMs can observe, plan, act, and iterate on solutions.
🧵1/8
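That observe → plan → act → iterate loop can be sketched in a few lines. This is a minimal, hypothetical example; the `DummyEnv` class and `llm_call` interface are assumptions for illustration, not any specific framework's API:

```python
# Minimal agent loop: observe, plan, act, iterate.

class DummyEnv:
    """Toy environment used only to make the sketch runnable."""
    def reset(self):
        return "task: reverse the string 'abc'"
    def step(self, action):
        done = "cba" in action          # task solved if the answer appears
        return f"you did: {action}", done

def agent_loop(env, llm_call, max_steps=5):
    observation = env.reset()                                         # observe
    for _ in range(max_steps):
        plan = llm_call(f"Observation: {observation}\nNext action?")  # plan
        observation, done = env.step(plan)                            # act
        if done:                                                      # iterate until solved
            break
    return observation

# Usage with a stubbed "LLM" that always answers 'cba':
print(agent_loop(DummyEnv(), lambda prompt: "cba"))
```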
2/8
There are two main benchmarks that are useful. Both are seemingly hard datasets, especially SWE-Bench: @cognition_labs was able to showcase 13.86% accuracy with Devin.
SWE-Bench (resolving real GitHub issues and pull requests): arxiv.org/abs/2310.06770
MINT (multi-turn interaction with tools and language feedback): arxiv.org/abs/2309.10691
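To get a feel for what a SWE-Bench task instance looks like, here is a small sketch. It assumes the dataset is published on the Hugging Face Hub as "princeton-nlp/SWE-bench" with "repo" and "problem_statement" fields; adjust the id, split, or field names if those assumptions don't hold:

```python
# Peek at a SWE-Bench task instance.
# Dataset id, split, and field names below are assumptions, not guaranteed.
from datasets import load_dataset

swe_bench = load_dataset("princeton-nlp/SWE-bench", split="test")
example = swe_bench[0]
print(example["repo"])               # source repository the issue comes from
print(example["problem_statement"])  # the GitHub issue text the agent must resolve
```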