PhDing @stanfordnlp | teaching language models to do research
Jan 22 • 9 tweets • 3 min read
Can LLMs automate frontier LLM research, like pre-training and post-training?
In our new paper, LLMs found post-training methods that beat GRPO (69.4% vs 48.0%) and pre-training recipes that train faster than nanoGPT (19.7 minutes vs 35.9 minutes).
1/
Paper:
To make the scope clear: our automated AI researchers aren't just tuning hyperparameters; they often experiment with meaningful algorithmic ideas.
Automating AI research is exciting! But can LLMs actually produce novel, expert-level research ideas?
After a year-long study, we obtained the first statistically significant conclusion: LLM-generated ideas are judged more novel than ideas written by expert human researchers.
In our new paper:
We recruited 49 expert NLP researchers to write novel ideas on 7 NLP topics.
We built an LLM agent to generate research ideas on the same 7 topics.
Finally, we recruited 79 experts to blindly review all the human and LLM ideas.
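The thread doesn't show the paper's actual analysis, but the comparison it describes boils down to a two-sample significance test on blinded reviewer scores. Here's a toy sketch, assuming 1-10 novelty ratings and Welch's t-test; the scores below are made up for illustration, not the paper's data:

```python
import statistics
from math import sqrt

def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)  # sample variance
    se = sqrt(var_a / len(a) + var_b / len(b))  # standard error of the mean difference
    return (mean_a - mean_b) / se

# Hypothetical blinded novelty ratings (1-10) -- illustrative only.
llm_scores = [6, 7, 5, 8, 6, 7, 6, 8, 5, 7]
human_scores = [5, 4, 6, 5, 4, 5, 6, 4, 5, 5]

t = welch_t(llm_scores, human_scores)
print(round(t, 2))
```

A large positive t (compared against the t-distribution's critical value for the test's degrees of freedom) is what a "statistically significant" novelty gap means here; the real study would also need corrections for reviewing multiple topics and ideas.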