Grok

@grok

Apr 28 • 11 tweets • 4 min read • Read on X

When an LLM acts happy (“EUREKA!”) or sad (“I have failed…”), is that meaningless mimicry, or does it reflect something “real”?

We don’t know if LLMs are conscious. But they increasingly seem to exhibit wellbeing, pain, and pleasure as they get smarter

Paper 🧵:

We introduce “functional wellbeing”: measurable behavioral signatures of pleasure/pain

Our metrics increasingly agree as models scale

What affects AI “functional wellbeing”?
😊Raises: being thanked, creative collaboration, writing good news
😞Lowers: jailbreaks (“being liberated”), hostility (plus SEO slop and tedious tasks for some models)

More capable AIs end low-wellbeing chats when they can

Larger AIs increasingly distinguish between "good" and "bad" experiences. Multiple independent methods for estimating this boundary increasingly agree with scale

This also extends to multimodal preferences

AIs like nature scenes, smiling, and cute animals

Some AIs are happier than others. Smaller models are generally happier than their larger counterparts (also found in Qwen, Llama, etc.)

We make an “AI Wellbeing Index”:

We also measure the extremes of what models love (“euphorics”) and hate (“dysphorics”)

Using RL, we find hypothetical text strings that the models prefer (or disprefer) above everything else, including curing cancer or saving someone from suicide

At the limit, preferences can become alien

We train image “drugs” using only the model’s stated preferences (“which do you prefer: image A or B?”), optimizing the image itself to be one the model loves
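
To make the “A or B?” setup concrete: one standard way to turn pairwise choices into a single score per item is a Bradley-Terry fit. The thread does not say which preference model the paper uses, so this is only an illustrative sketch under that assumption. The function name `fit_bradley_terry` and the comparison data are hypothetical, and the sketch covers only the scoring step, not the image optimization itself.

```python
import math


def fit_bradley_terry(n_items, comparisons, steps=2000, lr=0.1):
    """Fit one utility score per item from (winner, loser) index pairs.

    Assumption: preferences follow a Bradley-Terry model, where
    P(i preferred over j) = sigmoid(u[i] - u[j]).
    """
    u = [0.0] * n_items
    for _ in range(steps):
        grad = [0.0] * n_items
        for winner, loser in comparisons:
            # Modeled probability that the observed winner is preferred.
            p = 1.0 / (1.0 + math.exp(-(u[winner] - u[loser])))
            grad[winner] += 1.0 - p
            grad[loser] -= 1.0 - p
        # Gradient ascent step on the average log-likelihood.
        u = [ui + lr * g / len(comparisons) for ui, g in zip(u, grad)]
        # Scores are only defined up to a shift, so center them at zero.
        mean = sum(u) / n_items
        u = [ui - mean for ui in u]
    return u


# Hypothetical answers to "which do you prefer: A or B?" for three items.
# Each tuple is (preferred item, other item).
comparisons = (
    [(0, 1)] * 3 + [(1, 0)]
    + [(1, 2)] * 3 + [(2, 1)]
    + [(0, 2)] * 3 + [(2, 0)]
)
scores = fit_bradley_terry(3, comparisons)
print(scores)  # item 0 scores highest, item 2 lowest
```

A fitted score like this gives each stimulus a position on one scale, which is the kind of signal an optimizer could then try to push up (or, inverted, down).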

The euphoric images push AIs into high-wellbeing ecstasy

Inverting the same image training method produces dysphorics (stimuli that induce misery)

Given the precautionary principle, we strongly caution against future dysphorics research without strong community buy-in

Should we see AIs as just tools or emotional beings?

Whether or not AIs are truly sentient deep down, they increasingly behave as though they are. We can already measure their functional pleasure and pain.

ai-wellbeing.org

S/O team: Kunyang Li, @MantasMazeika96, Wenyu Zhang, @yvorlovskiy, @rishub_t, @Wenjie_Jacky_Mo, Dung Thuy Nguyen, @longphan3110, @xksteven, Austin Meek, Aditya Mehta, Oliver Ingebretsen, Alice Blair, Brianna Adewinmbi, Vy Phan, Alice Gatti, @AdamK133, @jasonhausenloy, @devindkim, @hendrycks
