Cognition
Jul 1 · 6 tweets · 3 min read
Introducing Devin Security Swarm

A more cost-effective and accurate way to find security vulnerabilities in complex codebases, based on a new architecture: Agentic MapReduce.
In testing, Devin Security Swarm found 36 of 50 real-world GHSA vulnerabilities at 30% lower cost per finding than the next most accurate alternative.
We built a new architecture for whole-codebase reasoning that we’re calling Agentic MapReduce.

Security scanning is different from most coding tasks: a report is only trustworthy if the whole codebase is considered. But most agentic systems struggle to scale reasoning across large repos.

Devin maps relevant signals across the repo, fans out focused agents over bounded shards, reduces their findings into one report, then verifies serious vulnerabilities in isolated sandboxes before marking them confirmed.
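
To make the map, fan-out, reduce, verify flow concrete, here is a minimal sketch in Python. It is not Devin's implementation: every name in it (`SIGNALS`, `map_signals`, `make_shards`, `scan_shard`, `reduce_findings`, `verify`, `run_swarm`) is hypothetical, the "agents" are plain substring matchers, and the sandbox check is a caller-supplied stub.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, replace

# Hypothetical signal table: substring -> (issue label, severity).
# A real system would use agents here, not substring matching.
SIGNALS = {
    "pickle.loads(": ("unsafe deserialization", "high"),
    "shell=True": ("possible command injection", "high"),
    "verify=False": ("TLS verification disabled", "medium"),
}


@dataclass(frozen=True)
class Finding:
    path: str
    issue: str
    severity: str
    confirmed: bool = False


def map_signals(repo: dict[str, str]) -> list[str]:
    """Map: keep only files that contain at least one signal."""
    return sorted(p for p, src in repo.items() if any(s in src for s in SIGNALS))


def make_shards(paths: list[str], max_files: int) -> list[list[str]]:
    """Split candidate files into bounded shards of at most max_files."""
    return [paths[i:i + max_files] for i in range(0, len(paths), max_files)]


def scan_shard(repo: dict[str, str], shard: list[str]) -> list[Finding]:
    """Stand-in for one focused agent working on a single shard."""
    return [
        Finding(path, issue, severity)
        for path in shard
        for signal, (issue, severity) in SIGNALS.items()
        if signal in repo[path]
    ]


def reduce_findings(per_shard: list[list[Finding]]) -> list[Finding]:
    """Reduce: merge shard results, drop duplicates, list high severity first."""
    unique = {f for findings in per_shard for f in findings}
    return sorted(unique, key=lambda f: (f.severity != "high", f.path, f.issue))


def verify(finding: Finding, exploit_check) -> Finding:
    """Verify: only serious findings are checked; exploit_check is a stub
    standing in for an isolated sandbox run."""
    if finding.severity == "high" and exploit_check(finding):
        return replace(finding, confirmed=True)
    return finding


def run_swarm(repo: dict[str, str], max_files: int = 2,
              exploit_check=lambda f: True) -> list[Finding]:
    shards = make_shards(map_signals(repo), max_files)
    with ThreadPoolExecutor() as pool:  # fan out one worker per shard
        per_shard = list(pool.map(lambda s: scan_shard(repo, s), shards))
    return [verify(f, exploit_check) for f in reduce_findings(per_shard)]
```

Calling `run_swarm` on a dictionary of file paths to source text returns one ordered report. The bounded shard size is the design point worth noting: each worker sees only a small slice of the repo, while the reduce step restores the whole-codebase view.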
The result is simultaneously more efficient and more accurate than other tools. We evaluated a variety of security scanning tools on a dataset of 50 GHSA vulnerabilities across 14 languages, including Go, Rust, Python, Ruby, Java, C#, JavaScript, C, Swift, Dart, and Elixir. The dataset spans open-source repos of various sizes and many software categories.

Beyond excelling on our eval, Devin Security Swarm also found critical vulnerabilities that other tools missed, like a PHP sandbox bypass via template injection, an argument injection through metadata value parsing, and an overly broad deserialization surface.
Security Swarm is a new pillar of Devin for Security: a suite of tools to help you find vulnerabilities, validate their exploitability at runtime, and ship remediation PRs.

Learn more and try it today at:

devin.ai/security
We’re also publishing extensive documentation and technical materials about Agentic MapReduce, including a deep-dive on our evals.

Read our announcement: cognition.com/blog/introduci…

Learn about Agentic MapReduce: devin.ai/blog/agentic-m…

Check out the evals: devin.ai/blog/security-…

