How to get a URL link in the X (Twitter) app
How can we get explanations out of LMs? Two easy ways are to ask LMs to explain their decisions directly or to monitor their chains of thought (CoTs).
https://x.com/OpenAI/status/1912549344978645199
We generated 1k+ conversations using human prompters and AI investigator agents, then used Docent to surface surprising behaviors. It turns out misrepresentation of capabilities also occurs for o1 & o3-mini!