Tuhin Chakrabarty
Apr 21, 2025 · 10 tweets
Unlike math/code, writing lacks verifiable rewards, so all we get is slop. To address this, we train reward models on expert edits that beat SOTA #LLMs by a large margin on a new Writing Quality benchmark. We also reduce #AI slop by applying our RMs at test time, boosting alignment with experts.
Self-evaluation using LLMs has proven useful in reward modeling and constitutional AI. But relying on uncalibrated humans or self-aggrandizing LLMs for feedback on subjective tasks like writing can lead to reward hacking and alignment issues.
Our work builds on LAMP (Language model Authored, Manually Polished), a corpus of 1,282 ⟨AI-generated, Expert-Edited⟩ pairs with implicit quality preferences. We train Writing Quality Reward Models (WQRM) across multiple model families using pairwise and scalar rewards from LAMP.
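A toy sketch of the pairwise side of this setup (not our actual training code — just the standard Bradley-Terry pairwise objective that reward models of this kind typically minimize): each LAMP pair supplies a preferred (expert-edited) and a rejected (raw AI) text, and the loss rewards the model for scoring the edit higher.

```python
import math

def pairwise_reward_loss(score_chosen, score_rejected):
    """Bradley-Terry pairwise objective: -log sigmoid(r_chosen - r_rejected).
    Minimizing it pushes the reward model to score the expert-edited
    text above the raw AI draft it was edited from."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the model separates the pair more confidently:
# margin 0 -> log(2) ~ 0.69; margin 2 -> ~ 0.13.
```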
To evaluate WQRM, we introduce the Writing Quality Benchmark (WQ), consolidating five datasets that contrast Human-Human, Human-AI, and AI-AI writing pairs reflecting real-world applications. SOTA LLMs, some of which excel at reasoning tasks, barely beat random baselines on WQ.
We train an editing model on LAMP interaction traces to improve writing quality. To show WQRM's practical benefits during inference, we use additional test-time compute to generate and rank multiple candidate revisions, letting us choose high-quality outputs from an initial draft.
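The selection loop above is essentially best-of-N sampling under the reward model. A minimal sketch, where `revise` and `score` are hypothetical stand-ins for the editing model and WQRM (the real systems are neural models, not Python callables):

```python
def best_of_n(draft, revise, score, n=8):
    """Test-time selection: sample n candidate revisions of a draft
    and keep the one the reward model scores highest.
    `revise` and `score` are placeholders for the editing model
    and the trained WQRM, respectively."""
    candidates = [revise(draft) for _ in range(n)]
    return max(candidates, key=score)
```

Spending more test-time compute here just means raising `n`: more candidates, a better chance one of them scores (and reads) well.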
Evaluation with 9 experienced writers confirms that WQRM-based selection produces writing samples preferred by experts 66% of the time overall, and 72.2% of the time when the reward gap is larger than 1 point.
In short, we find evidence that WQRM is well-calibrated: a wider gap in scores between two responses makes it more likely that an expert (or group of experts) would prefer the higher-scoring response over the lower-scoring one.
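One way to check this kind of calibration claim (a hypothetical sketch, not our exact analysis) is to bin expert judgments by reward gap and compare agreement rates:

```python
def agreement_by_gap(judgments, threshold=1.0):
    """judgments: (reward_a, reward_b, expert_prefers_a) triples.
    Returns (agreement on small-gap pairs, agreement on large-gap pairs).
    A well-calibrated reward model should agree with experts more often
    when |reward_a - reward_b| exceeds the threshold."""
    small, large = [], []
    for r_a, r_b, prefers_a in judgments:
        correct = (r_a > r_b) == prefers_a
        (large if abs(r_a - r_b) > threshold else small).append(correct)
    rate = lambda xs: sum(xs) / len(xs) if xs else float("nan")
    return rate(small), rate(large)
```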
To better understand how much content detail affects LLM writing quality, we analyzed how several LLMs write with and without detailed content in the writing prompt, and compared them to expert writers and MFA students given the same prompts.
Our results show that in the absence of good-quality original content, all LLMs are poor writers, and they exhibit very high variance compared to experts. Even when provided with very detailed original content, LLMs including GPT-4.5 still fall short (contrary to @sama's claims).
We hope our work fuels interest in the community in well-calibrated reward models for subjective tasks like writing, instead of a focus on vibes. In the true spirit of science, our code, data, experiments, and models are all open-sourced.
Paper: arxiv.org/pdf/2504.07532

