Natasha Jaques
Assistant Professor leading the Social RL Lab https://t.co/ykwfJG84Bj @uwcse and Staff Research Scientist at @GoogleAI.
Mar 20 6 tweets 4 min read
The paper I’ve been most obsessed with lately is finally out: nbcnews.com/tech/tech-news…! Check out this beautiful plot: it shows how much LLMs distort human writing when making edits, compared to how humans would revise the same content.

We take a dataset of human-written essays from 2021, before the release of ChatGPT. We compare how people revise draft v1 -> v2 given expert feedback with how an LLM revises the same v1 given the same feedback. This enables a counterfactual comparison: how much does the LLM alter the essay compared to what the human originally intended to write? We find LLMs consistently induce massive distortions, even changing the meaning and the conclusions argued for.

This is a problem, because LLM-generated text is already infiltrating many of our cultural and scientific institutions. For example, we look at the 21% of ICLR 2026 reviews that were found to be LLM-generated, and find that they focus on different scientific criteria than human reviews: e.g., LLMs increase focus on scalability by +111% vs. humans.
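The counterfactual setup above can be sketched in a few lines. This is a hypothetical simplification: the `distortion` helper and the use of `difflib` similarity are our illustration, standing in for whatever distance measure the paper actually uses.

```python
import difflib

def distortion(human_revision, llm_revision):
    # Given the same draft v1 and the same feedback, compare the human's v2
    # against the LLM's v2. 1 - similarity: higher means the LLM strayed
    # further from what the human was going to write.
    sim = difflib.SequenceMatcher(None, human_revision, llm_revision).ratio()
    return 1.0 - sim

# Identical revisions -> zero distortion; disjoint text -> high distortion.
print(distortion("same revision", "same revision"))
print(distortion("the data support hypothesis A", "results remain inconclusive"))
```

The key point is that both revisions start from the same v1 and the same feedback, so any gap is attributable to the LLM rather than to the editing task itself.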
Dec 6, 2020 7 tweets 4 min read
How can we move deep RL beyond games, without having to hand-build a simulator that covers real-world complexity? We adversarially generate a curriculum of challenging yet feasible environments by maximizing regret between a pair of agents, with PAIRED... This is joint work with @MichaelD1729, @EugeneVinitsky, @alexandrebayen, Stuart Russell, Andrew Critch, and @svlevine, and will be presented as an oral at NeurIPS on Monday, December 7th at 6:30pm PT, with a poster session from 9-11pm PT neurips.cc/virtual/2020/p…
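The regret objective can be illustrated with a toy sketch. This is a hypothetical simplification, not the paper's implementation: the binary `toy_return` function, the skill values, and the candidate difficulties are all invented for illustration.

```python
def toy_return(skill, difficulty):
    # Hypothetical binary return: an agent solves environments no harder
    # than its own skill level.
    return 1.0 if skill >= difficulty else 0.0

def regret(difficulty, protagonist_skill, antagonist_skill):
    # Regret = antagonist's return minus protagonist's return on the
    # same proposed environment.
    return (toy_return(antagonist_skill, difficulty)
            - toy_return(protagonist_skill, difficulty))

# The adversary proposes the environment with maximal regret: one the
# antagonist can solve (feasible) but the protagonist cannot (challenging).
candidates = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
best_env = max(candidates,
               key=lambda d: regret(d, protagonist_skill=1.0,
                                    antagonist_skill=2.0))
print(best_env)  # 1.5: just beyond the protagonist's skill, still solvable
```

Because regret is zero both for trivially easy environments (both agents succeed) and for impossible ones (both fail), maximizing it automatically produces a curriculum at the frontier of the protagonist's ability.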
Jul 2, 2019 4 tweets 3 min read
Excited to release our latest paper, which uses KL-control for effective off-policy RL, even when you can't explore online in the environment! We use this + neural.chat to learn from human conversation...

Paper: arxiv.org/abs/1907.00456
Code: github.com/natashamjaques…

2) ...by learning from cues like sentiment and conversation length that are implicit in the text itself. We show this is more effective than relying on explicit labeling of human preferences.
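The KL-control idea can be sketched as a reward shaping term. This is a hedged illustration, not the paper's code: the function name, the `beta` value, and the toy log-probabilities are our assumptions; the general form r'(s, a) = r(s, a) - beta * KL(policy || prior) is the standard KL-control objective.

```python
import math

def kl_regularized_reward(implicit_reward, policy_logprob, prior_logprob,
                          beta=0.1):
    # The policy earns reward from implicit signals (e.g. sentiment,
    # conversation length), minus a penalty for diverging from a pretrained
    # prior: r'(s, a) = r(s, a) - beta * [log pi(a|s) - log p(a|s)].
    # The penalty keeps off-policy updates close to plausible language,
    # which matters when online exploration is impossible.
    return implicit_reward - beta * (policy_logprob - prior_logprob)

# If the policy assigns an action twice the prior's probability, it pays a
# penalty of beta * ln(2) on top of the implicit reward.
r = kl_regularized_reward(1.0, math.log(0.5), math.log(0.25), beta=0.1)
print(r)
```

Setting `beta` trades off exploiting the implicit reward against staying faithful to the prior; `beta=0` recovers unconstrained RL, while large `beta` pins the policy to the pretrained model.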