AI Navigate

Self-Refining Agents in Spec-Driven Development

Dev.to / 3/22/2026

💬 Opinion · Developer Stack & Infrastructure · Ideas & Deep Analysis · Tools & Practical Usage · Models & Research

Key Points

  • The article describes a spec-driven workflow that, unexpectedly, caused the agent to review and improve its own previous work instead of regenerating from scratch.
  • In iterate mode (-i), running /spec -i 1234 <contents> creates a /specs/1234/ folder structure with passes (p01, p02, p03, ...) containing spec.md, plan.md, implementation.md and related files.
  • During a test with no changes to the spec, Pass 2 reviewed Pass 1, identified gaps, and refined its original output, effectively self-improving without explicit instructions.
  • The behavior points to emergent agentic capabilities in spec-driven workflows and prompts a reconsideration of development processes that leverage self-refinement in automation.

I’ve been experimenting with a spec-driven workflow, and I accidentally discovered something I didn’t expect: the agent started reviewing and improving its own work.

What I discovered isn't new in terms of agentic AI; self-directed refinement is the whole point of agentic AI. But the way I stumbled across it in my own test was interesting nonetheless.

The Basic Idea

I created a spec.prompt.md file. This prompt accepts a ticket number and the pasted contents of the technical specifications. Then I run a command like:

/spec <ticket-number> <pasted-contents>

Originally, each time I ran /spec, it would overwrite everything and start from scratch. That worked, but it didn’t allow me to iterate or compare changes between passes.

So I added two modes:

  • -o = overwrite
  • -i = iterate

The Iterate Mode

When I run:

/spec -i 1234 <contents>

It creates a folder structure like this:

/specs/1234/
    p01/
        spec.md
        plan.md
        implementation.md
        files...
    p02/
        spec.md
        plan.md
        implementation.md
        files...
    p03/
        ...
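The /spec prompt doesn't expose its internals, but the pass-numbering convention above is easy to sketch. Here is a hypothetical helper (the function name, `spec_root` parameter, and directory logic are my assumptions, not the author's implementation) that finds the highest existing pNN folder for a ticket and creates the next one:

```python
from pathlib import Path

def next_pass_dir(spec_root: Path, ticket: str) -> Path:
    """Create and return the next pNN pass directory for a ticket.

    Hypothetical sketch mirroring the /specs/<ticket>/p01, p02, ...
    layout described above; not the actual /spec prompt internals.
    """
    ticket_dir = spec_root / ticket
    ticket_dir.mkdir(parents=True, exist_ok=True)
    existing = sorted(
        int(p.name[1:]) for p in ticket_dir.iterdir()
        if p.is_dir() and p.name.startswith("p") and p.name[1:].isdigit()
    )
    n = (existing[-1] + 1) if existing else 1
    pass_dir = ticket_dir / f"p{n:02d}"
    pass_dir.mkdir()
    return pass_dir
```

Each run of iterate mode would then land in a fresh pass folder, leaving earlier passes untouched for comparison.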

The original intent was to give myself a comparison between p01 and p02, to see whether changes to the spec were implemented correctly. But something unexpected happened.

For my first test of -i, I didn't change the spec at all. I just ran the same spec again:

/spec -i 1234 <original contents>

I expected it to regenerate everything so I could compare Pass 1 to Pass 2. Instead, Pass 2 did something even smarter.

When reviewing Pass 2, I couldn't find the original build scripts, and the bulk of the work was missing. I was confused about why I found only a single file.

Then it clicked: it had reviewed its previous pass to identify any gaps against the duplicated spec I provided, determined it was mostly correct, and refined its original pass.

That’s when this stopped being just a prompt and started looking like an agent. I didn’t tell it to rewrite everything. I didn’t tell it to only fix what was wrong. I didn’t tell it to review its previous implementation. It chose to.

Its self-refinement process made me rethink my own process.

The Process

Currently I have these prompts:

  • spec.prompt.md
  • spec-implement.prompt.md
  • spec-testing.prompt.md
  • using backend-engineer agent

Right now, I manually run each step after the previous one completes. But the long-term goal is for the agent to understand the workflow and run the steps itself.

Pass Workflow

Pass 1 — Spec → Plan → Implement

/spec 1234 <contents from Jira>

The agent:

  • Reads the ticket
  • Writes the spec
  • Creates a plan
  • Implements the code
  • Saves everything into p01

Pass 2 — Self-Refinement

/spec -i 1234

The agent:

  • Re-reads the spec
  • Reviews Pass 1
  • Fixes gaps
  • Improves implementation
  • Adds missing tests
  • Refactors if needed
  • Saves into p02

This becomes a self-refinement loop:

Implement → Review → Refine → Review → Refine

The agent continues iterating until it believes the implementation matches the spec.
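That loop can be sketched as a small driver. This is a hypothetical illustration, not the author's tooling: `implement` and `review` stand in for the agent's implement and review steps, and an empty list of gaps is assumed to mean the implementation matches the spec:

```python
def refine_until_stable(implement, review, max_passes=5):
    """Drive an Implement -> Review -> Refine loop.

    Hypothetical sketch: implement(feedback) produces an artifact
    (one pass: p01, p02, ...) and review(artifact) returns the gaps
    found in it. The loop stops when review finds nothing left to fix,
    or after max_passes as a safety limit.
    """
    feedback = []
    artifact = None
    for pass_no in range(1, max_passes + 1):
        artifact = implement(feedback)   # refine using last review's gaps
        feedback = review(artifact)      # gaps remaining in this pass
        if not feedback:                 # implementation matches the spec
            return artifact, pass_no
    return artifact, max_passes
```

The cap on passes matters in practice: without it, an agent that keeps "improving" could loop indefinitely.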

Pass 3 — Spec Updates from Engineer

After reviewing, the engineer may realize:

  • Something was unclear
  • Requirements changed
  • Edge cases were missed
  • Naming should be improved
  • Logic should be handled differently

The engineer updates the spec, then runs:

/spec -u 1234 <updated contents>

Now the agent:

  • Compares original spec vs updated spec
  • Compares p01 vs p02 vs p03 etc.
  • Determines what changed in the spec and how that changes the code
  • Implements only what’s different, not everything
  • Creates a new pass
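The "compare original spec vs updated spec" step can be illustrated with a line-level diff. This is only a sketch using the standard library's difflib; a real agent would compare the specs semantically, and the file paths in the labels are just the hypothetical pass layout from earlier:

```python
import difflib

def spec_delta(old_spec: str, new_spec: str) -> list[str]:
    """Return only the added/removed lines between two spec versions.

    Illustrative sketch of the spec-comparison step; the fromfile and
    tofile labels assume the /specs/<ticket>/pNN layout shown above.
    """
    diff = difflib.unified_diff(
        old_spec.splitlines(), new_spec.splitlines(),
        fromfile="p01/spec.md", tofile="p02/spec.md", lineterm="",
    )
    # Keep only content changes; drop the ---/+++ diff headers.
    return [d for d in diff if d[:1] in "+-" and d[:3] not in ("+++", "---")]
```

Feeding the agent only this delta is what lets it implement just what changed instead of regenerating everything.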

This becomes iterative spec-driven development.

Final Phase — Testing Instructions

When the engineer believes the implementation is ready for validation:

/spec-testing 1234

The agent then:

  • Reviews the latest pass
  • Identifies all changed files
  • Provides test scenarios to validate the changes
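The "identifies all changed files" step could work like a directory comparison between the last two passes. A minimal sketch, assuming the pNN folder layout from earlier (the function name and return shape are mine, not the article's):

```python
import filecmp

def changed_files(prev_pass: str, latest_pass: str) -> dict:
    """Classify files that differ between two pass directories.

    Hypothetical sketch of the changed-file report using
    filecmp.dircmp; prev_pass and latest_pass are paths like
    /specs/1234/p02 and /specs/1234/p03.
    """
    cmp = filecmp.dircmp(prev_pass, latest_pass)
    return {
        "added": sorted(cmp.right_only),     # new in the latest pass
        "removed": sorted(cmp.left_only),    # dropped since the previous pass
        "modified": sorted(cmp.diff_files),  # present in both but changed
    }
```

A report like this gives the engineer a concrete list of what to validate, rather than re-reading the whole implementation.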

At that point, the engineer tests the changes rather than writing everything from scratch.

Expect the Unexpected and "Just Keep Swimming"

The interesting part of this experiment wasn’t that the agent wrote code. It was that it started reviewing, refining, and improving its own work in passes — the same way a developer does.

I didn’t set out to build this agent or this workflow. I set out to write a better prompt. Then I wanted it to do a little more, and a little more. Somewhere along the way, the prompt turned into a process.

This isn’t just prompt engineering anymore.
It’s process engineering.