CLS
PhDing @stanfordnlp | teaching language models to do research
Jan 22 9 tweets 3 min read
Can LLMs automate frontier LLM research, like pre-training and post-training?

In our new paper, LLMs discovered post-training methods that beat GRPO (69.4% vs. 48.0%) and pre-training recipes that run faster than nanoGPT (19.7 minutes vs. 35.9 minutes).

1/ Paper:

To be clear about the scope: our automated AI researchers aren’t just tuning hyper-parameters; they often experiment with meaningful algorithmic ideas.

2/ arxiv.org/abs/2601.14525
Sep 9, 2024 14 tweets 6 min read
Automating AI research is exciting! But can LLMs actually produce novel, expert-level research ideas?

After a year-long study, we obtained the first statistically significant conclusion: LLM-generated ideas are more novel than ideas written by expert human researchers.

In our new paper:

We recruited 49 expert NLP researchers to write novel ideas on 7 NLP topics.

We built an LLM agent to generate research ideas on the same 7 topics.

Then we recruited 79 experts to blindly review all the human and LLM ideas.

2/ arxiv.org/abs/2409.04109
May 28, 2024 11 tweets 6 min read
One powerful (and scary) application of LLMs is using them to persuade humans, for good or bad.

I read through recent empirical human studies on LLM persuasion, and here’s a thread summarizing my favorite ones:

(google doc version: docs.google.com/document/d/1il…)

1/ 🧵

I’ll summarize each paper along a few key dimensions:

- Persuasion topics
- Interaction format
- Measurement of persuasiveness
- Main findings

Let’s start with a few studies on measuring the persuasiveness of LLMs.

2/