We have reached an agreement in principle for Sam Altman to return to OpenAI as CEO with a new initial board of Bret Taylor (Chair), Larry Summers, and Adam D'Angelo.
We are collaborating to figure out the details. Thank you so much for your patience through this.
• • •
Today we’re launching SWE-Lancer—a new, more realistic benchmark to evaluate the coding performance of AI models. SWE-Lancer includes over 1,400 freelance software engineering tasks from Upwork, valued at $1 million USD total in real-world payouts. openai.com/index/swe-lanc…
SWE-Lancer tasks span the full engineering stack, from UI/UX to systems design, and include a range of task types, from $50 bug fixes to $32,000 feature implementations. SWE-Lancer includes both independent engineering tasks and management tasks, where models choose between technical implementation proposals.
SWE-Lancer tasks more realistically capture the complexity of modern software engineering. Our tasks are full-stack and complex; the average task took freelancers over 21 days to resolve.
These improvements in capabilities can also be leveraged to improve safety. Today we’re releasing a paper on deliberative alignment that shares how we harnessed these advances to make our o1 and o3 models even safer to use. openai.com/index/delibera…