AI Navigate

Self-Refining Agents in Spec-Driven Development

Dev.to / 3/22/2026

💬 Opinion · Developer Stack & Infrastructure · Ideas & Deep Analysis · Tools & Practical Usage · Models & Research

Key Points

  • The article describes a spec-driven workflow that, unexpectedly, caused the agent to review and improve its own previous work instead of regenerating from scratch.
  • In iterate mode (-i), running /spec -i 1234 <contents> creates a /specs/1234/ folder structure with passes (p01, p02, p03, ...) containing spec.md, plan.md, implementation.md and related files.
  • During a test with no changes to the spec, Pass 2 reviewed Pass 1, identified gaps, and refined its original output, effectively self-improving without explicit instructions.
  • The behavior points to emergent agentic capabilities in spec-driven workflows and prompts a reconsideration of development processes that leverage self-refinement in automation.

I’ve been experimenting with a spec-driven workflow, and I accidentally discovered something I didn’t expect: the agent started reviewing and improving its own work.

What I discovered isn't new in terms of agentic AI; self-directed refinement is the whole point of agentic AI. But the way I stumbled across it in my own test was interesting nonetheless.

The Basic Idea

I created a spec.prompt.md file. This prompt accepts a ticket number and the pasted contents of the technical specifications. Then I run a command like:

/spec <ticket-number> <pasted-contents>

Originally, each time I ran /spec, it would overwrite everything and start from scratch. That worked, but it didn’t allow me to iterate or compare changes between passes.

So I added two modes:

  • -o = overwrite
  • -i = iterate

The Iterate Mode

When I run:

/spec -i 1234 <contents>

It creates a folder structure like this:

/specs/1234/
    p01/
        spec.md
        plan.md
        implementation.md
        files...
    p02/
        spec.md
        plan.md
        implementation.md
        files...
    p03/
        ...
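The /spec prompt doesn't expose its internals, but the pass-numbering convention above is easy to sketch. Here is a hypothetical helper (the function name, `spec_root` parameter, and directory logic are my assumptions, not the author's implementation) that finds the highest existing pNN folder for a ticket and creates the next one:

```python
from pathlib import Path

def next_pass_dir(spec_root: Path, ticket: str) -> Path:
    """Create and return the next pNN pass directory for a ticket.

    Hypothetical sketch mirroring the /specs/<ticket>/p01, p02, ...
    layout described above; not the actual /spec prompt internals.
    """
    ticket_dir = spec_root / ticket
    ticket_dir.mkdir(parents=True, exist_ok=True)
    existing = sorted(
        int(p.name[1:]) for p in ticket_dir.iterdir()
        if p.is_dir() and p.name.startswith("p") and p.name[1:].isdigit()
    )
    n = (existing[-1] + 1) if existing else 1
    pass_dir = ticket_dir / f"p{n:02d}"
    pass_dir.mkdir()
    return pass_dir
```

Each run of iterate mode would then land in a fresh pass folder, leaving earlier passes untouched for comparison.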

The original intent was to give myself a comparison between p01 and p02, to see whether changes to the spec were implemented correctly. But something unexpected happened.

For my first test of -i, I didn't change the spec at all. I just ran the same spec again:

/spec -i 1234 <original contents>

I expected it to regenerate everything so I could compare Pass 1 to Pass 2. Instead, Pass 2 did something even smarter.

When reviewing Pass 2, I couldn't find the original build scripts, and the bulk of the work was missing. I was confused about why I found only a single file.

Then it clicked: it had reviewed its previous pass to identify any gaps against the duplicated spec I provided, determined it was mostly correct, and refined its original pass.

That’s when this stopped being just a prompt and started looking like an agent. I didn’t tell it to rewrite everything. I didn’t tell it to only fix what was wrong. I didn’t tell it to review its previous implementation. It chose to.

Its self-refinement process made me rethink my own process.

The Process

Currently I have these prompts:

  • spec.prompt.md
  • spec-implement.prompt.md
  • spec-testing.prompt.md
  • using backend-engineer agent

Right now, I manually run each step after the previous one completes. But the long-term goal is for the agent to understand the workflow and run the steps itself.

Pass Workflow

Pass 1 — Spec → Plan → Implement

/spec 1234 <contents from Jira>

The agent:

  • Reads the ticket
  • Writes the spec
  • Creates a plan
  • Implements the code
  • Saves everything into p01

Pass 2 — Self-Refinement

/spec -i 1234

The agent:

  • Re-reads the spec
  • Reviews Pass 1
  • Fixes gaps
  • Improves implementation
  • Adds missing tests
  • Refactors if needed
  • Saves into p02

This becomes a self-refinement loop:

Implement → Review → Refine → Review → Refine

The agent continues iterating until it believes the implementation matches the spec.
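That loop can be sketched as a small driver. This is a hypothetical illustration, not the author's tooling: `implement` and `review` stand in for the agent's implement and review steps, and an empty list of gaps is assumed to mean the implementation matches the spec:

```python
def refine_until_stable(implement, review, max_passes=5):
    """Drive an Implement -> Review -> Refine loop.

    Hypothetical sketch: implement(feedback) produces an artifact
    (one pass: p01, p02, ...) and review(artifact) returns the gaps
    found in it. The loop stops when review finds nothing left to fix,
    or after max_passes as a safety limit.
    """
    feedback = []
    artifact = None
    for pass_no in range(1, max_passes + 1):
        artifact = implement(feedback)   # refine using last review's gaps
        feedback = review(artifact)      # gaps remaining in this pass
        if not feedback:                 # implementation matches the spec
            return artifact, pass_no
    return artifact, max_passes
```

The cap on passes matters in practice: without it, an agent that keeps "improving" could loop indefinitely.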

Pass 3 — Spec Updates from Engineer

After reviewing, the engineer may realize:

  • Something was unclear
  • Requirements changed
  • Edge cases were missed
  • Naming should be improved
  • Logic should be handled differently

The engineer updates the spec, then runs:

/spec -u 1234 <updated contents>

Now the agent:

  • Compares original spec vs updated spec
  • Compares p01 vs p02 vs p03 etc.
  • Determines what changed in the spec and how that changes the code
  • Implements only what’s different, not everything
  • Creates a new pass
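The "compare original spec vs updated spec" step can be illustrated with a line-level diff. This is only a sketch using the standard library's difflib; a real agent would compare the specs semantically, and the file paths in the labels are just the hypothetical pass layout from earlier:

```python
import difflib

def spec_delta(old_spec: str, new_spec: str) -> list[str]:
    """Return only the added/removed lines between two spec versions.

    Illustrative sketch of the spec-comparison step; the fromfile and
    tofile labels assume the /specs/<ticket>/pNN layout shown above.
    """
    diff = difflib.unified_diff(
        old_spec.splitlines(), new_spec.splitlines(),
        fromfile="p01/spec.md", tofile="p02/spec.md", lineterm="",
    )
    # Keep only content changes; drop the ---/+++ diff headers.
    return [d for d in diff if d[:1] in "+-" and d[:3] not in ("+++", "---")]
```

Feeding the agent only this delta is what lets it implement just what changed instead of regenerating everything.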

This becomes iterative spec-driven development.

Final Phase — Testing Instructions

When the engineer believes the implementation is ready for validation:

/spec-testing 1234

The agent then:

  • Reviews the latest pass
  • Identifies all changed files
  • Provides test scenarios to validate the changes
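The "identifies all changed files" step could work like a directory comparison between the last two passes. A minimal sketch, assuming the pNN folder layout from earlier (the function name and return shape are mine, not the article's):

```python
import filecmp

def changed_files(prev_pass: str, latest_pass: str) -> dict:
    """Classify files that differ between two pass directories.

    Hypothetical sketch of the changed-file report using
    filecmp.dircmp; prev_pass and latest_pass are paths like
    /specs/1234/p02 and /specs/1234/p03.
    """
    cmp = filecmp.dircmp(prev_pass, latest_pass)
    return {
        "added": sorted(cmp.right_only),     # new in the latest pass
        "removed": sorted(cmp.left_only),    # dropped since the previous pass
        "modified": sorted(cmp.diff_files),  # present in both but changed
    }
```

A report like this gives the engineer a concrete list of what to validate, rather than re-reading the whole implementation.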

At that point, the engineer tests the changes rather than writing everything from scratch.

Expect the Unexpected and "Just Keep Swimming"

The interesting part of this experiment wasn’t that the agent wrote code. It was that it started reviewing, refining, and improving its own work in passes — the same way a developer does.

I didn’t set out to build this agent or this workflow. I set out to write a better prompt. Then I wanted it to do a little more, and a little more. Somewhere along the way, the prompt turned into a process.

This isn’t just prompt engineering anymore.
It’s process engineering.