policy for v smart things @openai. Past: CS PhD @HarvardSEAS/@SchmidtFutures/@MIT_CSAIL. Tweets my own; on my head be it.
Mar 10 • 6 tweets • 2 min read
These results are a massive deal, and they've overhauled the way I think about alignment and misalignment.
I think this suggests a new default alignment strategy.
Results and takeaways 🧵
For current capability levels,
1️⃣ Complex reward hacking already happens in practice in frontier training runs, and the models get extremely creative with their hacks. (I’m glad we’re increasing transparency on this, and hope others follow suit.)
Dec 22, 2024 • 7 tweets • 5 min read
Now that everyone knows about o3, and imminent AGI is considered plausible, I’d like to walk through some of the AI policy implications I see.
These are my own takes and in no way reflect my employer's views. They might be wrong! I know smart people who disagree. They don't require you to share my timelines, and are intentionally unrelated to the previous AI-safety culture wars.
Nov 24, 2023 • 5 tweets • 1 min read
If you are a public figure and tell your followers that “big new risks from advanced AI are fake”, you are wrong.
Not only that, you’ll be seen to be wrong *publicly & soon*.
This is not an “EA thing”; it is an oncoming train and it is going to hit you. Either help out or shut up.
We are headed for >=1 of:
* Massive job loss & weakening of labor
* Massive cost-cuts to totalitarianism
* Autonomous agents reshaping the [cyber/information] env
* Major acceleration of R&D
* AI systems we cannot trust with power, but that we're caught in a prisoner's dilemma to deploy anyway
Jul 5, 2023 • 16 tweets • 5 min read
The data used to train an AI model is vital to understanding its capabilities and risks. But how can we tell whether a model W actually resulted from a dataset D?
In a new paper, we show how to verify models' training data, including the data of open-source LMs! arxiv.org/abs/2307.00682
Today, users and auditors have to blindly trust AI developers about the training data used.
But shady devs have an incentive to lie, whether to sell you a weaker model or to dodge auditor/regulatory scrutiny.
If auditors could catch any lies about data, we’d start trusting AI audits much more.
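The paper has the full protocol, but the core spot-checking idea is roughly: the developer commits to periodic weight checkpoints and the data used between them, and a verifier replays a few randomly chosen training segments to check they reproduce the logged weights. A minimal toy sketch of that idea (the train_step, batch_hash, and verify_segment helpers here are illustrative stand-ins, not the paper's actual code):

```python
# Toy sketch of checkpoint spot-checking for training-data verification.
# Assumes (hypothetically) the developer logs periodic weight checkpoints plus
# hashes of the data batches used between them; a verifier replays a randomly
# chosen segment and checks it lands on the next logged checkpoint.
import hashlib
import random

def train_step(w, batch):
    # Toy deterministic update; a real run would be an SGD step on real data.
    return w + 0.01 * sum(batch) / len(batch)

def batch_hash(batch):
    return hashlib.sha256(repr(batch).encode()).hexdigest()

def verify_segment(start_w, end_w, batches, logged_hashes, tol=1e-9):
    """Replay one logged training segment and check it reproduces the next checkpoint."""
    # 1. The claimed data must match the committed-to hashes.
    if [batch_hash(b) for b in batches] != logged_hashes:
        return False
    # 2. Re-executing the segment from the earlier checkpoint must land
    #    (approximately) on the later checkpoint.
    w = start_w
    for b in batches:
        w = train_step(w, b)
    return abs(w - end_w) < tol

# --- Developer ("prover") side: log checkpoints + data hashes during training ---
random.seed(0)
data = [[random.random() for _ in range(4)] for _ in range(20)]   # 20 toy batches
checkpoints, hashes_per_segment = [0.0], []
w = 0.0
for seg in range(4):                        # 4 segments of 5 batches each
    seg_batches = data[seg * 5:(seg + 1) * 5]
    hashes_per_segment.append([batch_hash(b) for b in seg_batches])
    for b in seg_batches:
        w = train_step(w, b)
    checkpoints.append(w)

# --- Verifier side: spot-check one random segment instead of replaying the whole run ---
k = random.randrange(4)
ok = verify_segment(checkpoints[k], checkpoints[k + 1],
                    data[k * 5:(k + 1) * 5], hashes_per_segment[k])
print(f"segment {k} verified: {ok}")
```

Spot-checking random segments is what keeps this cheap for the verifier: a lying developer can't know in advance which segments will be replayed.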
Mar 24, 2023 • 32 tweets • 6 min read
If AI models get even better, their unchecked use will begin to pose serious dangers to society.
Most people agree it'd be great if countries could settle on rules to prevent AI misuse/accidents, & avoid an arms race.
But how could rules on AI actually be enforced?
Paper thread:
I wrote a paper on my current best guess for how to do it.
That said, verifying compliance internationally is a messy business. The devil is in the details, and the details aren’t on Twitter.
See the paper for all my caveats!
A huge problem with coalition-building in AI policy is that policymakers are trained to think on 10-year horizons (because policy is slow), while researchers are trained to think on 3-year horizons (because they need to assess whether their paper will be liked or get scooped).
When policymakers ask the research community “will large AI models be more dangerous than tanks?”, researchers try to predict what PaLM+3y could plausibly do, and then publicly say “no, that's an overreaction”.