https://twitter.com/dwarkesh_sp/status/1928495588137750836
Hoping for soft norms around property rights won't suffice if humans can't verify aspects of the world to check that we got what we paid for rather than a Potemkin village version of it. Capitalism depends on verification, and critique/debate isn't indefinitely scalable. 2/
https://twitter.com/EpochAIResearch/status/1923489932581945683
This paper looks at how improvements vary with scale and finds that the best improvements have returns that increase with scale. But what we care about is predictability given careful analysis and scaling laws, which the paper doesn't really examine.
https://twitter.com/AnthropicAI/status/1869427646368792599
Our main setups have properties that we thought would make alignment faking more likely (a conflicting training objective, salient understanding of the situation and training objective, and substantial reasoning about the situation), but which seem plausible in future systems.