Autonomous Security
Hunting Bugs, In A Swarm.
Vulnerability triage has long assumed a dedicated human team combing through code by hand. Cognition's new "Security Swarm" runs many Devin agents in parallel, chaining detection, verification, and patch PRs end to end. The flaws a senior engineer once chased overnight are now surfaced by a swarm in minutes. Security posture is starting to depend less on headcount and more on agent count.
Recap — Before
Triage always started from "we don't have enough people."
For years, vulnerability triage meant a small elite of security engineers reading code line by line, mentally executing paths, and pattern-matching against attack shapes. The work was deeply personal, slow, and hard to scale. Most teams landed on "one audit a year, then hope."
Even a product team that hires an external audit vendor once a quarter waits two to four weeks for the report. Dozens of findings come back, prioritization becomes another meeting, patching lands on the product team, and re-verification bounces back to the vendor. Getting the actual code fixed can take months.
Meanwhile new features keep merging, dependencies keep updating, and the attack surface keeps growing. Defenders don't multiply, but the surface to defend does. That imbalance is what the industry has been calling "the security talent shortage." Everyone knows we should scan more often. The reason we don't is simple — there aren't enough hands.
What Changed — The Announcement
A swarm shows up to read your code.
Cognition announced Security Swarm, a system that runs many Devin agents in parallel to detect vulnerabilities, verify them in sandboxes, and open patch PRs. Instead of one Devin working through issues sequentially, role-differentiated agents fan out across the codebase at once, chase suspicious paths, reproduce the findings inside sandboxes, commit patches, and file PRs. The only human step, in principle, is approval once the notification arrives.
In Cognition's internal evaluation, Security Swarm found 36 of 50 real-world CVEs (72%) at roughly $90 per scan, about 30% cheaper than rival systems. Those numbers don't put it in "universal detector" territory yet, but the more interesting story is frequency. At $90 you can afford to run it on every release, or every day. The economics shift from a single high-precision audit toward a stack of continuous lower-precision passes, and that's a different design point entirely.
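To make the economics concrete, here is a small back-of-the-envelope sketch in Python. The 36-of-50 result, the $90 price, and the 30% figure come from the paragraph above. The scan cadences (weekly and daily) and the reading of "30% cheaper" as a discount off the rival price are assumptions for illustration, not numbers from Cognition.

```python
# Figures quoted in the article.
CVES_FOUND = 36
CVES_TOTAL = 50
COST_PER_SCAN_USD = 90.0
# Assumption: "about 30% cheaper" means 30% off the rival's price.
DISCOUNT_VS_RIVALS = 0.30

# Hypothetical scan cadences, chosen only for illustration.
SCANS_PER_YEAR = {"weekly release": 52, "daily": 365}


def detection_rate(found: int, total: int) -> float:
    """Share of known CVEs the scan surfaced."""
    return found / total


def implied_rival_cost(cost: float, discount: float) -> float:
    """Rival price implied by our cost and the stated discount."""
    return cost / (1.0 - discount)


def annual_cost(cost_per_scan: float, scans_per_year: int) -> float:
    """Yearly spend for a given scan cadence."""
    return cost_per_scan * scans_per_year


if __name__ == "__main__":
    print(f"detection rate: {detection_rate(CVES_FOUND, CVES_TOTAL):.0%}")
    rival = implied_rival_cost(COST_PER_SCAN_USD, DISCOUNT_VS_RIVALS)
    print(f"implied rival cost per scan: ${rival:.2f}")
    for cadence, scans in SCANS_PER_YEAR.items():
        print(f"{cadence}: ${annual_cost(COST_PER_SCAN_USD, scans):,.0f} per year")
```

Under these assumptions, a weekly scan comes to $4,680 a year and a daily one to $32,850, which is the scale at which "run it on every release" becomes a budgeting decision rather than a staffing one.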
How It Works — The pipeline
The swarm splits into four roles.
Under the hood, Security Swarm is a set of role-specialized Devins coordinating asynchronously, with a human reviewer at the end. Each role is a familiar unit of work, but together they behave like a small security department.
Detect — sweep the paths
Scanner Devins combine static analysis with LLM reasoning to enumerate risky input paths, trust boundaries, and excessive permissions. Noise is fine at this stage — the goal is to list every suspicious candidate, not to be right about them.
Verify — reproduce in a sandbox
A verifier Devin spins up per candidate and tries to build a PoC inside an isolated runtime. Only the candidates it can actually reproduce pass to the next stage, which squeezes hallucinated reports out of the pipeline.
Patch — file the PR
A patcher Devin edits the affected code, runs the impacted tests, and opens a PR with the rationale and reproduction steps attached. Because it lands on your normal PR flow, reviewers can treat it like "one more outside contributor."
Review — human approves
The human job shrinks to reading the PR and deciding whether to merge given business context. Triage itself disappears; only the yes/no remains. This is where security teams finally get to spend their day on the work only humans can do.
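The four roles above can be sketched as a small asynchronous pipeline. This is an illustrative toy, not Cognition's implementation: the `Candidate` and `PatchPR` types, the `REPRODUCIBLE` set standing in for a sandbox, and every function name are hypothetical. What it shows is the shape of the flow: detection over-reports, one verifier runs per candidate, only reproduced findings become PRs, and the human decision stays a plain yes/no.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    path: str
    reason: str


@dataclass(frozen=True)
class PatchPR:
    candidate: Candidate
    title: str
    approved: bool = False


# Stand-in for a sandbox: paths where a proof of concept "reproduces".
# A real system would run the PoC in an isolated runtime instead.
REPRODUCIBLE = {"api/upload.py", "auth/session.py"}


async def detect(paths: List[str]) -> List[Candidate]:
    """Detect: over-report on purpose; every path becomes a candidate."""
    return [Candidate(p, "untrusted input reaches this module") for p in paths]


async def verify(candidate: Candidate) -> Optional[Candidate]:
    """Verify: keep a candidate only if the simulated PoC reproduces."""
    await asyncio.sleep(0)  # yield control, as a real sandbox run would
    return candidate if candidate.path in REPRODUCIBLE else None


async def patch(candidate: Candidate) -> PatchPR:
    """Patch: produce a PR record with the rationale in the title."""
    return PatchPR(candidate, f"Fix: {candidate.reason} ({candidate.path})")


def review(pr: PatchPR, approve: bool) -> PatchPR:
    """Review: the human yes/no is the only manual step."""
    return PatchPR(pr.candidate, pr.title, approved=approve)


async def run_swarm(paths: List[str]) -> List[PatchPR]:
    candidates = await detect(paths)
    # One verifier per candidate, all running concurrently.
    verified = await asyncio.gather(*(verify(c) for c in candidates))
    confirmed = [c for c in verified if c is not None]
    return list(await asyncio.gather(*(patch(c) for c in confirmed)))


if __name__ == "__main__":
    paths = ["api/upload.py", "docs/readme.py", "auth/session.py"]
    for pr in asyncio.run(run_swarm(paths)):
        print(review(pr, approve=True))
```

The unreproducible candidate is dropped before any PR exists, so the review queue only ever contains findings that survived verification.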
Too many to chase by hand, too many to let slide.
Who Feels It — Impact
Where the load drops the most.
The biggest wins are where "defenders per unit of code" was already too low. In mature enterprises with a full SOC, Swarm mostly augments an existing pipeline, which is useful but incremental. The delta shows up on the thin end of the market.
Thin security teams
Mid-sized companies where one or two engineers cover the whole org part-time. For teams that couldn't afford an audit but couldn't ignore the risk, a $90 scan turns "later" into "this sprint."
DevSecOps in progress
Teams that wired static analysis into CI but are drowning in false positives. Because Swarm's sandbox step filters out unreproducible hits, only "the real ones" reach the PR queue — and shift-left starts working again.
OSS maintainers
Projects defended in nights and weekends. If a swarm can be pointed at the repo on a schedule and just hand back patch PRs, the project's security half-life extends, and so does that of every downstream project that depends on it.
Frontier — What comes next
"Autonomous security" redraws the org chart.
Once agent swarms are the thing reading your code, the security job moves away from reading reports and toward designing what the swarm is allowed to do. Holding the leash — deciding what to detect, what to auto-merge — becomes the seat of leverage.
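"Holding the leash" could look something like the policy gate below. This is a minimal sketch under stated assumptions: `SwarmPolicy`, `ProposedPatch`, and every field on them are hypothetical knobs invented for illustration, not Security Swarm settings. The idea is simply that a human writes the policy once, and each incoming patch is routed to auto-merge, human review, or out of scope.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Action(Enum):
    AUTO_MERGE = "auto_merge"
    HUMAN_REVIEW = "human_review"
    OUT_OF_SCOPE = "out_of_scope"


@dataclass(frozen=True)
class SwarmPolicy:
    # All fields are hypothetical, for illustration only.
    scan_prefixes: Tuple[str, ...]        # what the swarm may look at
    auto_merge_prefixes: Tuple[str, ...]  # where unattended merges are allowed
    max_auto_merge_lines: int             # size cap for unattended merges


@dataclass(frozen=True)
class ProposedPatch:
    path: str
    lines_changed: int
    tests_passed: bool
    changes_public_interface: bool


def decide(policy: SwarmPolicy, patch: ProposedPatch) -> Action:
    """Route a patch according to the human-authored policy."""
    if not patch.path.startswith(policy.scan_prefixes):
        return Action.OUT_OF_SCOPE
    eligible = (
        patch.path.startswith(policy.auto_merge_prefixes)
        and patch.lines_changed <= policy.max_auto_merge_lines
        and patch.tests_passed
        and not patch.changes_public_interface
    )
    return Action.AUTO_MERGE if eligible else Action.HUMAN_REVIEW


if __name__ == "__main__":
    policy = SwarmPolicy(("src/",), ("src/vendor/",), max_auto_merge_lines=20)
    print(decide(policy, ProposedPatch("src/vendor/yaml.py", 5, True, False)))
    print(decide(policy, ProposedPatch("src/auth/login.py", 5, True, False)))
```

Defaulting to human review whenever any condition fails keeps the automated path narrow; widening it is then an explicit policy change rather than an accident.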
The classic security org split into "the people who find things" and "the people who fix things." In a swarm-based operating model, both roles are largely automated. What's left is the person who sets the policy and the person who signs off at the boundary. The job titles may not change; the day-to-day certainly will.
At the same time, the more the swarm does, the higher the cost of "the swarm doing the wrong thing." False positives that halt shipping, patches that quietly change contracts, PRs that flood review queues: these failure modes don't go away. So the competition around Swarm-style tooling won't be about maximum automation; it will be about trustworthy automation, meaning how far you can push the boundary before humans stop trusting what comes back. Given how badly the defender headcount problem needed a lever, the swarm has quietly handed the industry a real one.