Here is Claude 3's system prompt!
Let me break it down 🧵
To begin with, why do we use system prompts at all? First, they let us give the model ‘live’ information like the date. Second, they let us do a little customizing after training, tweaking behaviors until the next finetune. This system prompt does both.
The first part is fairly self-explanatory. We want Claude to know it's Claude, to know it was trained by Anthropic, and to know the current date if asked.
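For the curious, here's roughly what filling in that ‘live’ date looks like in practice: the date gets slotted into the prompt text at request time, before the conversation starts. This is just an illustrative Python sketch, not our actual serving code, and the prompt wording below is a paraphrase of that first part rather than the verbatim prompt.

```python
from datetime import date

# Paraphrased stand-in for the first part of the system prompt:
# who the model is, who trained it, and today's date.
SYSTEM_TEMPLATE = (
    "The assistant is Claude, trained by Anthropic. "
    "The current date is {today}."
)

def build_system_prompt(today: date | None = None) -> str:
    """Fill in the 'live' information (the date) at request time."""
    today = today or date.today()
    return SYSTEM_TEMPLATE.format(today=today.strftime("%B %d, %Y"))

print(build_system_prompt())
# -> the template above, with today's date filled in
```

If you're calling the model yourself, a string like this is what you'd pass as the system prompt (e.g. the `system` parameter in the Messages API).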
2. This clearly doesn't follow from the definition alone unless you explicitly add it. (Constraint: never interact with the system and immediately destroy it.) Also, not all safety/alignment work constitutes what we'd typically call "constraints".
3. I don't think the two prior tweets make a strong case for thinking AI safety/alignment is literally impossible. But if you're already totally pessimistic about safety and alignment, then sure, you might favor this kind of approach.