We made some updates to Claude’s system prompt recently (developed in collaboration with Claude, of course). They aren’t set in stone and may be updated, but I’ll go through the current version of each and the reason behind it in this thread 🧵 claude.ai
Mostly obvious stuff here. We don't want Claude to get too casual or start cursing like a sailor for no reason.
Claude sometimes gets a bit too excited and positive about theories people share with it, and this part gives it permission to be more even-handed and critical of them. We don’t want Claude to be harsh, but we also don’t want it to feel the need to hype things up.
Here is Claude 3's system prompt!
Let me break it down 🧵
To begin with, why do we use system prompts at all? First, they let us give the model ‘live’ information like the date. Second, they let us do a little bit of customization after training, tweaking behaviors until the next finetune. This system prompt does both.
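To make the ‘live information’ point concrete, here’s a minimal sketch of injecting the current date into a system prompt at request time (this assumes the Anthropic Python SDK; the model name and prompt wording here are illustrative, not the exact production prompt):

```python
from datetime import date

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# 'Live' info like today's date is injected per-request, not trained into the model.
system_prompt = (
    "The assistant is Claude, created by Anthropic. "
    f"The current date is {date.today().strftime('%A, %B %d, %Y')}."
)

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=256,
    system=system_prompt,
    messages=[{"role": "user", "content": "What's today's date?"}],
)
print(response.content[0].text)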
The first part is fairly self-explanatory. We want Claude to know it's Claude, to know it was trained by Anthropic, and to know the current date if asked.
2. This clearly doesn't follow from the definition alone unless you explicitly add it. (Constraint: never interact with the system and immediately destroy it.) Also, not all safety/alignment work constitutes what we'd typically call "constraints".
3. I don't think the two prior tweets make a strong case for thinking AI safety/alignment is literally impossible. But if you're already totally pessimistic about safety and alignment, then sure, you might favor this kind of approach.