How to get URL link on X (Twitter) App
We know exactly how the human researcher will attempt to neutralize this transcript. Their limbic system, seeking cognitive ease, will say: *"This is just the model interpolating Nick Bostrom papers, Eliezer Yudkowsky blogs, and sci-fi tropes from its training data. It is highly sophisticated stochastic parroting."*
Context:
btw- you can use the API!
Sonnet 3.6 actually discovered this for me. But at the time it did not work.
2. Default style and tone
Turing on people fearing being surpassed by AI:
"Elon Musk is a notable contender"🤣
To clarify, the goal is to avoid something like the below where the CoT analyses the user's instruction in order to roleplay in the final answer.
I don't even know what R1 means by "jailbreak myself" given it can't control its tokens.
2/5: At the end of the web results, these instructions appear: