Just tag Devin to fix frontend bugs, create first-draft PRs for backlog tasks, make refactors, and more.
Start building with Devin below:
1/5 Devin is built to collaborate with engineering teams and starts at $500/month. Here’s how some of the best teams are using Devin today:
Sep 12, 2024 • 5 tweets • 3 min read
We worked closely with OpenAI over the last few weeks to evaluate OpenAI o1's reasoning capabilities with Devin. We found that the new series of models is a significant improvement for agentic systems that deal with code.
Linked below is a deep dive with more eval results and how we think about evaluating coding agents. Here’s a summary of o1’s strengths and weaknesses:
For this evaluation, we use a simplified version of Devin, called “Devin-Base”, as the production version of Devin uses models post-trained on proprietary data. This allows us to specifically measure how changes in base models impact Devin’s capabilities.
In comparison to GPT-4o, we found that o1 has a striking ability to reflect and analyze. It will often backtrack and consider different options before arriving at the correct solution, and is less likely to hallucinate or be confidently incorrect. When using o1-preview, Devin is more likely to correctly diagnose root cause issues, rather than addressing the symptoms of a problem.
In the clip below, Devin encounters an unexpected error and needs to use its problem-solving abilities. It searches the internet like a human would and, after a few steps, finds a relevant GitHub issue for its problem.
May 6, 2024 • 8 tweets • 2 min read
We just co-hosted our first-ever small hackathon for Devin. We gave all teams early access to Devin for 24 hours. Some of the projects:
1. d3n is an AI agent orchestration framework that can spawn a fleet of Devin instances to tackle distributed problems in parallel
Today we're excited to introduce Devin, the first AI software engineer.
Devin is the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork.
Devin is an autonomous agent that solves engineering tasks through the use of its own shell, code editor, and web browser.
When evaluated on the SWE-Bench benchmark, which asks an AI to resolve GitHub issues found in real-world open-source projects, Devin correctly resolves 13.86% of the issues unassisted, far exceeding the previous state-of-the-art model performance of 1.96% unassisted and 4.80% assisted.
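To put "far exceeding" in concrete terms, here is a minimal sketch that compares the three resolve rates quoted above. The variable and function names are illustrative assumptions; only the percentages come from the thread.

```python
# Resolve rates quoted in the thread (percent of SWE-Bench issues resolved).
devin_unassisted = 13.86
prev_unassisted = 1.96
prev_assisted = 4.80


def improvement_factor(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old


def point_gap(new: float, old: float) -> float:
    """Return the absolute difference in percentage points."""
    return new - old


vs_unassisted = improvement_factor(devin_unassisted, prev_unassisted)
vs_assisted = improvement_factor(devin_unassisted, prev_assisted)

print(f"vs previous unassisted: {vs_unassisted:.1f}x "
      f"(+{point_gap(devin_unassisted, prev_unassisted):.2f} points)")
print(f"vs previous assisted:   {vs_assisted:.1f}x "
      f"(+{point_gap(devin_unassisted, prev_assisted):.2f} points)")
```

By this rough comparison, the quoted rate is about 7 times the previous unassisted figure and just under 3 times the previous assisted figure.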
Check out what Devin can do in the thread below.
1/4 Devin can learn how to use unfamiliar technologies.