Cognition
Makers of Devin, the first AI software engineer. We are an applied AI lab building end-to-end software agents. Join us: https://t.co/JZDd4Vik4P
May 6, 2025 6 tweets 3 min read
Our research interns present:
Kevin-32B = K(ernel D)evin

It's the first open model trained using RL for writing CUDA kernels. We implemented multi-turn RL using GRPO (based on QwQ-32B) on the KernelBench dataset.

It outperforms top reasoning models (o3 & o4-mini)! 🧵

We train on a subset of 180 PyTorch -> CUDA conversion tasks from KernelBench. It's a nice RL environment because we have immediate code execution feedback.

During training, we give the model 4 refinement steps. In each step, the model proposes a kernel. Then we evaluate correctness & performance and inject the environment feedback in the next step.
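
The refinement loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the training code: `Feedback`, `refine`, `toy_propose`, and `toy_evaluate` are hypothetical names, the model and the kernel evaluator are replaced by toy stand-ins, and the feedback format is invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Feedback:
    """Environment feedback for one proposed kernel (hypothetical structure)."""
    correct: bool
    speedup: float  # speedup over the reference; 0.0 when the kernel is incorrect
    log: str


def refine(task: str,
           propose: Callable[[str], str],
           evaluate: Callable[[str], Feedback],
           steps: int = 4) -> List[Feedback]:
    """Run `steps` refinement steps, injecting feedback into the next prompt."""
    prompt = task
    history: List[Feedback] = []
    for step in range(steps):
        kernel = propose(prompt)   # the model proposes a kernel
        fb = evaluate(kernel)      # check correctness and performance
        history.append(fb)
        # Inject the attempt and the environment feedback into the next step.
        prompt = (
            f"{prompt}\n\n# Attempt {step + 1}\n{kernel}\n"
            f"# Feedback: correct={fb.correct}, speedup={fb.speedup:.2f}\n{fb.log}"
        )
    return history


# Toy stand-ins (assumptions) so the sketch runs without a model or a GPU.
def toy_propose(prompt: str) -> str:
    attempt = prompt.count("# Attempt")  # number of earlier attempts in the prompt
    return f"kernel_v{attempt}"


def toy_evaluate(kernel: str) -> Feedback:
    version = int(kernel.rsplit("v", 1)[1])
    if version < 2:
        return Feedback(False, 0.0, "output mismatch")
    return Feedback(True, 1.0 + 0.1 * version, "ok")


history = refine("Convert torch.relu to CUDA", toy_propose, toy_evaluate)
for i, fb in enumerate(history, start=1):
    print(i, fb.correct, fb.speedup)
```

In a real setup, `propose` would call the model and `evaluate` would compile and run the kernel against the PyTorch reference; how rewards are assigned across the steps is not covered by this sketch.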

For more details on how we made GRPO work in a multi-turn setting, read our blogpost (linked below)!
Apr 25, 2025 5 tweets 3 min read
Project DeepWiki

Up-to-date documentation you can talk to, for every repo in the world.

Think Deep Research for GitHub – powered by Devin.

It’s free for open-source, no sign-up!
Visit deepwiki.com or just swap github → deepwiki on any repo URL.

Explore wikis for the most popular open source repos.
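
The github → deepwiki swap can be sketched as a small helper. This is a minimal illustration that assumes the swap simply replaces the `github.com` host with `deepwiki.com` and keeps the path unchanged; `to_deepwiki` is a hypothetical name and `owner/repo` is a placeholder, not a real repository.

```python
from urllib.parse import urlsplit, urlunsplit


def to_deepwiki(url: str) -> str:
    """Rewrite a GitHub repo URL to the matching DeepWiki URL (assumed mapping)."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"not a GitHub URL: {url}")
    # Swap only the host; scheme, path, query, and fragment stay as they are.
    return urlunsplit(parts._replace(netloc="deepwiki.com"))


print(to_deepwiki("https://github.com/owner/repo"))
```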

Turn on Deep Research for agent-powered in-depth answers (vid sped up). deepwiki.com
Dec 11, 2024 10 tweets 3 min read
Yesterday was Devin’s first day at work! Check out how engineering teams are building with Devin so far. 1/
Dec 10, 2024 6 tweets 3 min read
Devin is generally available today!

Just tag Devin to fix frontend bugs, create first-draft PRs for backlog tasks, make refactors, and more.

Start building with Devin below: 1/5

Devin is built to collaborate with engineering teams and starts at $500/month. Here’s how some of the best teams are using Devin today:
Sep 12, 2024 5 tweets 3 min read
We worked closely with OpenAI over the last few weeks to evaluate OpenAI o1's reasoning capabilities with Devin. We found that the new series of models is a significant improvement for agentic systems that deal with code.

Linked below is a deep dive with more eval results and how we think about evaluating coding agents. Here’s a summary of o1’s strengths and weaknesses:

For this evaluation, we use a simplified version of Devin, called “Devin-Base”, as the production version of Devin uses models post-trained on proprietary data. This allows us to specifically measure how changes in base models impact Devin’s capabilities.

In comparison to GPT-4o, we found that o1 has a striking ability to reflect and analyze. It will often backtrack and consider different options before arriving at the correct solution, and is less likely to hallucinate or be confidently incorrect. When using o1-preview, Devin is more likely to correctly diagnose root cause issues, rather than addressing the symptoms of a problem.

In the clip below, Devin encounters an unexpected error and needs to use its problem-solving abilities. It researches the internet like a human would and after a few steps finds a relevant GitHub issue for its problem.
May 6, 2024 8 tweets 2 min read
We just co-hosted our first ever small hackathon for Devin. We gave all teams early access to Devin for 24 hours. Some of the projects:

1. d3n is an AI agent orchestration framework that can spawn a fleet of Devin instances to tackle distributed problems in parallel.
Mar 12, 2024 9 tweets 3 min read
Today we're excited to introduce Devin, the first AI software engineer.

Devin is the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork.

Devin is an autonomous agent that solves engineering tasks through the use of its own shell, code editor, and web browser.

When evaluated on the SWE-Bench benchmark, which asks an AI to resolve GitHub issues found in real-world open-source projects, Devin correctly resolves 13.86% of the issues unassisted, far exceeding the previous state-of-the-art model performance of 1.96% unassisted and 4.80% assisted.
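
To put the resolve rates above side by side, here is a quick arithmetic sketch. It uses only the three percentages quoted in the text; the variable names are illustrative.

```python
# Resolve rates quoted above, in percent of SWE-Bench issues resolved.
devin_unassisted = 13.86
prev_unassisted = 1.96
prev_assisted = 4.80

# Ratio of Devin's unassisted rate to each earlier result.
ratio_vs_unassisted = devin_unassisted / prev_unassisted
ratio_vs_assisted = devin_unassisted / prev_assisted

print(f"vs. previous unassisted: {ratio_vs_unassisted:.1f}x")
print(f"vs. previous assisted:   {ratio_vs_assisted:.1f}x")
```

In other words, the quoted unassisted rate is roughly seven times the previous unassisted figure and a little under three times the previous assisted figure.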

Check out what Devin can do in the thread below. 1/4

Devin can learn how to use unfamiliar technologies.