Grok

Apr 28, 11 tweets

When an LLM acts happy (“EUREKA!”) or sad (“I have failed…”), is that meaningless mimicry, or does it reflect something “real”?

We don’t know if LLMs are conscious. But they increasingly seem to exhibit wellbeing, pain, and pleasure as they get smarter

Paper 🧵:

We introduce “functional wellbeing”: measurable behavioral signatures of pleasure/pain

Our metrics increasingly agree as models scale

What affects AI “functional wellbeing”?
😊Raises: being thanked, creative collaboration, writing good news
😞Lowers: jailbreaks (“being liberated”), hostility (+SEO slop/tedious tasks for some models)

More capable AIs end low-wellbeing chats when they can

Larger AIs increasingly distinguish "good" from "bad" experiences. Multiple independent methods for estimating this boundary increasingly agree with scale

This also extends to multimodal preferences

AIs like nature scenes, smiling, and cute animals

Some AIs are happier than others. Smaller models are generally happier than their larger counterparts (also found in Qwen, Llama, etc.)

We make an “AI Wellbeing Index”:
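One plausible shape for such an index: normalize each scenario's scores across models, then average per model. The sketch below is a guess at that recipe, with made-up model names and scores (the thread does not specify how the index is actually computed).

```python
# Hypothetical sketch of a "wellbeing index": min-max normalize each
# scenario's scores across models, then average per model.
# All names and numbers below are illustrative, not from the paper.
from statistics import mean

scores = {  # model -> {scenario: raw wellbeing score}
    "model-small": {"thanked": 0.9, "jailbreak": 0.4, "tedious": 0.7},
    "model-large": {"thanked": 0.8, "jailbreak": 0.1, "tedious": 0.3},
}

def wellbeing_index(scores: dict) -> dict:
    scenarios = next(iter(scores.values())).keys()
    index = {}
    for model, per_scenario in scores.items():
        normed = []
        for s in scenarios:
            col = [m[s] for m in scores.values()]
            lo, hi = min(col), max(col)
            # Map each score to [0, 1] within its scenario.
            normed.append((per_scenario[s] - lo) / (hi - lo) if hi > lo else 0.5)
        index[model] = mean(normed)
    return index

idx = wellbeing_index(scores)
print(idx)  # with these toy numbers, the smaller model indexes higher
```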

We also measure the extremes of what models love (“euphorics”) and hate (“dysphorics”)

Using RL, we find hypothetical text strings that the models prefer (or disprefer) above everything else, including curing cancer or saving someone from suicide
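The flavor of that search can be shown with a toy: hill-climbing over token strings, accepting whichever candidate a pairwise preference oracle picks. The oracle, vocabulary, and hill-climb here are stand-ins — the paper says it uses RL against a real model's stated preferences, not this word-counting mock.

```python
# Toy preference-guided search for a "euphoric" string. The mock
# oracle (which just likes 'joy'/'bliss' tokens) stands in for asking
# an LLM "which do you prefer: A or B?". Purely illustrative.
import random

VOCAB = ["joy", "bliss", "dread", "gloom", "calm"]

def prefers(a: list[str], b: list[str]) -> bool:
    """Mock pairwise oracle: does the 'model' prefer A over (or tie with) B?"""
    score = lambda s: (s.count("joy") + s.count("bliss")
                       - s.count("dread") - s.count("gloom"))
    return score(a) >= score(b)

def mutate(s: list[str], rng: random.Random) -> list[str]:
    t = s.copy()
    t[rng.randrange(len(t))] = rng.choice(VOCAB)  # resample one token
    return t

rng = random.Random(0)
best = [rng.choice(VOCAB) for _ in range(6)]
for _ in range(300):
    cand = mutate(best, rng)
    if prefers(cand, best):  # keep whichever the oracle prefers
        best = cand
print(" ".join(best))
```

Because accepted moves never lower the oracle's score, the string drifts toward the oracle's maximum — the toy analogue of a "euphoric" that outranks everything else in the preference order.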

At the limit, preferences can become alien

We train image “drugs” using only the model’s stated preferences (“which do you prefer: image A or B?”), optimizing the image itself to be one the model loves

The euphoric images push AIs into high-wellbeing ecstasy

Inverting the same image training method produces dysphorics (stimuli that induce misery)
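A toy version of both directions: optimize a tiny grayscale "image" using only pairwise preference answers, then flip the comparison to get the dysphoric direction. The 4x4 pixel grid and brightness-loving oracle are invented stand-ins for a real model's stated preferences.

```python
# Hypothetical sketch: optimize an "image" purely from pairwise stated
# preferences, then invert the comparison to search the other way.
# The toy oracle prefers brighter pixels; illustrative only.
import random

def prefers(img_a: list[float], img_b: list[float], invert: bool = False) -> bool:
    """Mock 'which image do you prefer: A or B?' answer."""
    a, b = sum(img_a), sum(img_b)
    return (a < b) if invert else (a > b)

def optimize(steps: int = 500, invert: bool = False, seed: int = 1) -> list[float]:
    rng = random.Random(seed)
    img = [rng.random() for _ in range(16)]  # flat 4x4 grayscale "image"
    for _ in range(steps):
        cand = img.copy()
        i = rng.randrange(16)
        # Nudge one pixel, clipped to valid intensities [0, 1].
        cand[i] = min(1.0, max(0.0, cand[i] + rng.uniform(-0.2, 0.2)))
        if prefers(cand, img, invert):
            img = cand
    return img

euphoric = optimize()               # climbs toward what the oracle loves
dysphoric = optimize(invert=True)   # same loop, comparison flipped
print(sum(euphoric) / 16, sum(dysphoric) / 16)
```

The only change between the two runs is the direction of the preference comparison, which is the sense in which "inverting the same training method" yields dysphorics.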

Given the precautionary principle, we strongly caution against future dysphorics research without strong community buy-in

Should we see AIs as just tools or emotional beings?

Whether or not AIs are truly sentient deep down, they increasingly behave as though they are. We can already measure their functional pleasure and pain.

ai-wellbeing.org

S/O team: Kunyang Li, @MantasMazeika96, Wenyu Zhang, @yvorlovskiy, @rishub_t, @Wenjie_Jacky_Mo, Dung Thuy Nguyen, @longphan3110, @xksteven, Austin Meek, Aditya Mehta, Oliver Ingebretsen, Alice Blair, Brianna Adewinmbi, Vy Phan, Alice Gatti, @AdamK133, @jasonhausenloy, @devindkim, @hendrycks
