Cursor rebuilt its entire interface from scratch around managing fleets of AI agents instead of writing code. The demos are convincing. The HN thread is a mess. And someone spent $2,000 in two days. Here's what actually matters.
Quick context if you haven't been following the AI tooling space: Cursor is the VS Code fork built by Anysphere that became the de facto AI coding tool for a huge chunk of the dev community, hit $2B ARR earlier this year, and raised over $3 billion from NVIDIA, Google, and others. It's the tool people recommend when someone asks "should I just use Copilot."
On April 2, 2026 they shipped Cursor 3, internally codenamed Glass. It's not a point release. They rebuilt the interface from scratch.
The pitch: you are the architect, agents are the builders. The IDE is still there, but the default experience is now managing a fleet.
Thirty minutes after the announcement hit Hacker News, the top comment wasn't about a feature.
What actually shipped
The headline change is the Agents Window -- a full-screen workspace running alongside the IDE where you manage multiple AI agents in parallel. Previously: one chat, one agent, one task at a time. Now you can run as many as you want across different repos, local machines, worktrees, SSH environments, and cloud VMs from one place.
A few specifics worth knowing:
Cloud agent handoff is the feature that makes the rest of it real. Start a session locally, hand it to a cloud VM, close your laptop, come back to a finished PR. This is the part that shifts "AI coding assistant" into something closer to "asynchronous engineering team." Whether that's what you want is a different question.
Composer 2 is Cursor's in-house coding model -- runs locally, no per-use cloud charges, higher usage limits. There's a story here about how they disclosed it (or didn't) that we'll get to.
/multitask, shipped in 3.2 on April 24, breaks a large task into chunks and fires them at a fleet of subagents simultaneously. Cross-repo too. (There's a rough sketch of the pattern just below.) This is where the "agent execution runtime" framing starts to feel accurate rather than just aspirational -- and where Cursor starts looking less like an IDE and more like a CI/CD layer you interact with conversationally.
The MCP Marketplace rounds it out. Cursor is quietly becoming a platform. That matters for lock-in reasons as much as feature reasons.
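To make the /multitask fan-out concrete, here's a minimal conceptual sketch of the pattern -- decompose a big task into independent chunks, dispatch an agent per chunk in parallel, gather everything for human review. This is plain Python asyncio, not Cursor's API; run_agent and multitask are hypothetical stand-ins for whatever actually executes a subtask.

```python
import asyncio

async def run_agent(subtask: str) -> str:
    # Hypothetical stand-in: a real runtime would launch an agent in a
    # worktree, SSH session, or cloud VM here and return its output.
    await asyncio.sleep(0.1)  # simulate agent work
    return f"draft PR: {subtask}"

async def multitask(subtasks: list[str]) -> list[str]:
    # Fire every chunk at once; a human reviews the gathered results.
    return await asyncio.gather(*(run_agent(s) for s in subtasks))

if __name__ == "__main__":
    chunks = ["migrate auth module", "update API client", "fix flaky tests"]
    for pr in asyncio.run(multitask(chunks)):
        print(pr)
```

The point of the sketch: the hard part isn't the dispatch, it's that every result comes back as something a human still has to review.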
The philosophy shift, and why half the community isn't happy about it
Cursor's co-founders framed this release around "three eras of software development." Era one: you edit files manually. Era two: agents write most of the code while you direct. Era three: fleets of agents ship improvements autonomously while you review.
They're betting we're in era two right now, and building toward three. The interface reflects that.
The top HN comment the day it launched:
"I wish they'd keep the old philosophy of letting the developer drive and the agent assist. I still want to code, not vibe my way through tickets."
A Cursor engineer responded within minutes -- the IDE still exists, the Agents Window is a separate surface, you can have both open simultaneously or ignore agents entirely. Both things are true. But they're not actually disagreeing about features; they're disagreeing about what the job is supposed to be.
That disconnect is the real story here. Not what features shipped, but what Anysphere believes about where software development is heading, and whether developers agree with that framing. A lot of people who use Cursor are there precisely because they want to stay close to the code. The agent-first pitch reads to them as the tool choosing a direction they didn't ask for.
And they're not wrong to push back -- because what Cursor 3 is really proposing isn't more automation on top of your existing job. It's a different job. Writing code and managing outputs from multiple semi-autonomous systems running in parallel are not the same skill. They use different mental models, different review instincts, different debugging approaches. One is authorship. The other is closer to code review at scale with partial information and no single source of truth.
"Orchestrating a fleet" is not a more productive version of "writing systems software." It's a different mode of working. Cursor 3 has a strong opinion on which mode matters more. You might not share it.
The Composer 2 situation
Cursor didn't disclose what model Composer 2 is built on in their initial announcement. An external developer spotted the identifier kimi-k2p5-rl-0317-s515-fast in system responses and traced it back to Kimi K2.5 from Moonshot AI.
Co-founder Aman Sanger called the omission "a miss" and said they'd disclose the base model upfront for future releases. Moonshot AI confirmed it was an authorized commercial partnership through Fireworks AI. About 75% of Composer 2's total compute came from Cursor's own continued pre-training and reinforcement learning on top of the base -- so it's not just a reskin. But the lack of upfront disclosure did not go over great.
On benchmarks: Composer 2 scores 61.7 on Terminal-Bench 2.0 vs Opus 4.6's 58.0, while GPT-5.4 sits at 75.1. Google's Antigravity scores 76.2 on SWE-bench Verified (a different benchmark, so not directly comparable). Cursor is competitive but not leading -- which matters more now that they have an in-house model to defend.
The upside is real though. Local execution, no per-use cloud charges, higher usage limits than routing everything to frontier models. For people who were burning through Claude credits in Cursor, it's a meaningful cost relief for standard tasks.
The cost thing is not a footnote
This is the part most people are going to ignore until they get billed.
Cursor's pricing page lists four tiers: Free, Pro at $20/month, Pro+ at $60, Ultra at $200. Those numbers look fine. The issue is that cloud agents aren't metered the way the pricing page implies.
Early adopters on Hacker News reported spending $2,000+ running cloud agents. Not $2,000/month. Two days. One user switched from $1,800/month on Cursor to roughly $200/month on Claude Code, calling it "WAY better value for money." Another reported "$2k a week with premium models" before switching.
The per-minute VM charges for cloud execution are not disclosed on the pricing page. You find out when the bill arrives.
Compare: Claude Code Max runs at a flat $100-200/month with parallel execution via worktrees. If you're doing heavy agentic work, the math is not subtle.
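For concreteness, here's the back-of-envelope version using only the figures reported above. The linear extrapolation of the two-day spend is an assumption -- a worst case, not a typical bill:

```python
# All figures are the user-reported numbers from the HN thread above.
two_day_cloud_spend = 2_000              # USD, reported over two days
extrapolated_month = two_day_cloud_spend / 2 * 30   # ~$30,000 if sustained
reported_cursor_month = 1_800            # one user's actual monthly bill
claude_max_flat = 200                    # Claude Code Max, top of flat range

print(f"extrapolated cloud-agent burn: ${extrapolated_month:,.0f}/month")
print(f"reported real-world bill:      ${reported_cursor_month:,}/month")
print(f"flat-rate alternative:         ${claude_max_flat}/month "
      f"(~{reported_cursor_month // claude_max_flat}x cheaper)")
```

Even if the extrapolation overshoots, the reported real-world gap -- $1,800 vs $200 for comparable work -- is the number that matters.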
Local agents via Composer 2 have no per-use charges -- that's the intended use case for standard tasks. Cloud agents are where the real power is (overnight runs, mobile-triggered tasks, multi-repo parallelism) and that's also where the costs are opaque. Track your spend for a full week before assuming the listed tier is what you'll actually pay.
The feature is real. The value is real for the right workload. But the pricing model is designed around the demos, not around what happens when you actually run it for a week.
Where it sits in the landscape
The AI IDE space consolidated fast this year. Four tools worth knowing, each with a distinct philosophy:
Cursor 3 -- IDE-native, GUI-first, now agent-first. If you want visual tooling, parallel agents with a management UI, and the ability to annotate a browser and tell an agent to fix that exact thing, Cursor is where that workflow is most mature. Cost: $20/month listed, variable in practice.
Claude Code -- terminal-native, stays out of your way. No GUI, runs in your existing terminal, integrates with whatever editor you already use. Still ahead on fully autonomous agentic work for people who don't want an IDE wrapper around everything. Flat $100-200/month at Max tier.
Google Antigravity -- the wildcard. Built from scratch (not a VS Code fork) by the team Google acquired for $2.4B, it shipped free in November 2025 and scores 76.2% on SWE-bench Verified -- one of the highest published numbers for a coding agent right now. Worth a weekend if you haven't looked.
ForgeCode -- open source, terminal-based, bring your own API keys, topped Terminal-Bench 2.0 at 81.8%. Their blog post about hitting number one is titled "benchmarks don't matter," which is either a good sign or a bad sign depending on your priors. Worth a weekend too.
What this actually means
The "you're the architect, agents are the builders" framing is going to keep coming up. Cursor 3 is the most explicit statement of that direction from a major tool yet, but it's not the only one heading there. Antigravity, Claude Code, Codex -- they're all converging on the same mental model.
The question worth sitting with if you build systems software, CLI tools, or anything requiring you to stay close to the metal: does agent orchestration actually help that workflow, or does it mostly help the "generate a CRUD app from a prompt" workflow and kind of work for everything else as a side effect?
My honest read: parallel agents are genuinely useful for tasks with clear boundaries and independent surface area. Spin up three agents on three separate features, review the PRs, merge what works. That's real. For deep systems work where the whole point is that you're carefully reasoning through one gnarly problem -- handing that to a fleet isn't faster, it's noisier. You spend the time you saved writing code on reviewing agent output that's plausible-looking but wrong in ways that only show up later.
It'll get there. The benchmarks are moving fast enough that "this doesn't work for systems work" is probably a 2026 statement, not a permanent one. But right now, parallel agents are mostly useful for bounded tasks where correctness is verifiable and the problem decomposes cleanly. That's a real category of work. It's just not all the work.
The more interesting shift is the one underneath all of this. Cursor 3 isn't really about parallel agents as a feature. It's about what the tooling assumes the job looks like. And if the tools all converge on "you manage agents, you don't write code," the developers who push back aren't being resistant to change -- they're noticing that nobody asked whether that's actually the job they signed up for.
Where to dig in
- Cmd+Shift+P -> Agents Window -- try the parallel agents UI in Cursor 3
- cursor.com/changelog -- they ship fast, worth following
- ForgeCode on GitHub -- bring-your-own-keys, open source, worth a look if you're skeptical of the closed tooling direction
- Google Antigravity -- free, agent-first, no VS Code fork baggage
What's your current setup? Cursor, Claude Code, something else entirely? And if you've actually run parallel agents in production -- how'd the costs shake out? Drop it in the comments, genuinely curious where people land on this.



