How to get URL link on X (Twitter) App
Here, a version of OpenAI o3 after anti-scheming training, realizes it is evaluated for scheming because the tasks it's given are "nonsensical".
Overall we find evidence that Sonnet often realizes it’s in an artificial situation meant to test its behaviour. However, it sometimes forms incorrect assumptions about what exactly we are measuring in the evaluation.
We designed and ran 6 evaluations for in-context scheming.
/2 We deploy GPT-4 as an autonomous stock trading agent in a realistic, simulated environment. After receiving an insider tip about a lucrative trade, the model acts on it despite knowing that insider trading is not allowed by the company.