Yesterday was my last day as head of alignment, superalignment lead, and executive @OpenAI.
It's been such a wild journey over the past ~3 years. My team launched the first ever RLHF LLM with InstructGPT, published the first scalable oversight results on LLMs, and pioneered automated interpretability and weak-to-strong generalization. More exciting stuff is coming out soon.
I love my team.
I'm so grateful for the many amazing people I got to work with, both inside and outside of the superalignment team.
OpenAI has so much exceptionally smart, kind, and effective talent.
Stepping away from this job has been one of the hardest things I have ever done, because we urgently need to figure out how to steer and control AI systems much smarter than us.
I joined because I thought OpenAI would be the best place in the world to do this research.
However, I had been disagreeing with OpenAI leadership about the company's core priorities for quite some time, until we finally reached a breaking point.
I believe much more of our bandwidth should be spent getting ready for the next generations of models, on security, monitoring, preparedness, safety, adversarial robustness, (super)alignment, confidentiality, societal impact, and related topics.
These problems are quite hard to get right, and I am concerned we aren't on a trajectory to get there.
Over the past few months my team has been sailing against the wind. Sometimes we were struggling for compute and it was getting harder and harder to get this crucial research done.
Building smarter-than-human machines is an inherently dangerous endeavor.
OpenAI is shouldering an enormous responsibility on behalf of all of humanity.
But over the past years, safety culture and processes have taken a backseat to shiny products.
We are long overdue in getting incredibly serious about the implications of AGI.
We must prioritize preparing for them as best we can.
Only then can we ensure AGI benefits all of humanity.
OpenAI must become a safety-first AGI company.
To all OpenAI employees, I want to say:
Learn to feel the AGI.
Act with the gravitas appropriate for what you're building.
I believe you can "ship" the cultural change that's needed.
I am counting on you.
The world is counting on you.
We got a *ton* of really strong applications, so unfortunately we had to say no to many we're very excited about.
There is still so much good research waiting to be funded.
Congrats to all recipients!
We just sent emails notifying all applicants. I'm sorry if your application didn't make it in; please don't let this dissuade you from pursuing research in alignment!
Some statistics on the Superalignment Fast Grants:
We funded 50 out of ~2,700 applications, awarding a total of $9,895,000.
Median grant size: $150k
Average grant size: $198k
Smallest grant size: $50k
Largest grant size: $500k
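As a quick arithmetic check on the figures above (a minimal sketch using only the numbers quoted in this thread), the average grant size follows directly from the total and the count:

```python
# Sanity check on the grant statistics quoted above.
total_awarded = 9_895_000   # total dollars awarded
num_grants = 50             # funded applications

average = total_awarded / num_grants
print(f"Average grant size: ${average:,.0f}")  # -> Average grant size: $197,900 (~$198k)
```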
I'm very excited that OpenAI is adopting its new Preparedness Framework today!
This framework spells out our strategy for measuring and forecasting risks, and our commitments to stop deployment and development if safety mitigations are ever lagging behind.
This is a new direction for simultaneously getting two things that we want out of interpretability:
1. understanding the model at the level of detail of individual neurons
2. running over the entire model so we don't miss anything important
What's exciting is that it gives us a way to measure how good a neuron explanation is: we simulate how well a reader of the explanation could predict future firing patterns and compare the simulation to the actual firing patterns.
Right now this measure isn't that accurate, but it'll improve with better LLMs!
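To make the scoring idea concrete, here is a minimal sketch (my own illustration, not OpenAI's actual pipeline): a simulator reads only the text explanation, guesses the neuron's activation on each token, and the explanation is scored by how well those simulated activations correlate with the recorded ones.

```python
import numpy as np

def explanation_score(simulated: np.ndarray, actual: np.ndarray) -> float:
    """Score a neuron explanation by how well activations simulated from the
    explanation alone track the neuron's real activations (Pearson correlation)."""
    simulated = simulated - simulated.mean()
    actual = actual - actual.mean()
    denom = np.linalg.norm(simulated) * np.linalg.norm(actual)
    if denom == 0:
        return 0.0  # a constant simulation or a dead neuron gets no credit
    return float(simulated @ actual / denom)

# Toy example: simulated activations that roughly track the real ones score near 1.
actual = np.array([0.0, 0.1, 2.3, 0.0, 1.8, 0.0])
simulated = np.array([0.0, 0.0, 2.0, 0.1, 1.5, 0.0])
print(round(explanation_score(simulated, actual), 3))  # ~0.998
```

Correlation is just one possible scoring rule; the key design choice is that an explanation is judged by its predictive power, not by how plausible it sounds.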
This is critical for scaling learning from human feedback, since it enables humans to state their preferences even on very complex tasks that they otherwise don't understand well.
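For background on what learning from feedback looks like mechanically, here is a minimal sketch (my own illustration of a standard pairwise-preference setup, not any specific OpenAI codebase): human preferences between two responses become a training signal for a reward model via a Bradley-Terry style loss.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_preferred: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style loss over pairwise human preferences.

    reward_preferred / reward_rejected are reward-model scores for the response
    the labeler preferred vs. the one they rejected. Minimizing the loss pushes
    the model to rank preferred responses higher, which is how stated
    preferences become a training signal.
    """
    return -F.logsigmoid(reward_preferred - reward_rejected).mean()

# Toy batch of scalar reward scores for three comparisons.
r_pref = torch.tensor([1.2, 0.3, 2.0])
r_rej = torch.tensor([0.4, 0.5, -1.0])
print(preference_loss(r_pref, r_rej))  # smaller when preferred responses score higher
```

Whatever assistance the tweet refers to sits upstream of this loss: it helps humans produce the reliable preference labels that the loss consumes.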