Last week I opened a feature I'd built six days earlier and asked Claude Code to extend it.
I hadn't touched the file since. The code was working. Tests passing. Nothing was on fire. I just wanted to add a small capability — fifteen minutes of work, maybe.
What I got back wasn't an extension. It was a quiet roast.
"Before extending this, I'd recommend refactoring. The current implementation has a few issues: the error handling is inconsistent, the function is doing more than its name suggests, and there's a duplicated validation block that should be a helper. Want me to fix those first?"
Reader, that code was written by Claude Code. Same project. Same model tier. Six days earlier. By the same agent that was now telling me, with polite confidence, that it was a mess.
I sat with that for a minute. Then I started thinking about what it meant. Because the more I thought, the more I realized that this exact moment is the thing nobody writes about when they write about AI coding agents.
The "memory problem" everyone complains about
If you've used Claude Code, Codex, Cursor in agent mode, or any of the rest, you've hit it: agents don't have continuity between sessions. Every new conversation starts cold. They don't remember what they decided last Tuesday. They don't remember why a particular abstraction exists. They don't remember that you already debated naming this function and settled on something specific.
Most of the discourse treats this as a problem to solve. We bolt on memory systems. We pile up CLAUDE.md files. We build context-loading scripts. The implicit assumption is that the agent should remember, and the work is to compensate for the fact that it doesn't.
I'm starting to think that framing is exactly backwards.
What actually happened in that session
When Claude Code criticized its own code from six days earlier, it wasn't being inconsistent. It wasn't malfunctioning. It was being honest — in a way no human collaborator can.
Think about what just happened:
- A reviewer with no ego attachment to the code looked at it fresh.
- It had no memory of why we wrote it that way, what we'd ruled out, what compromises we made under time pressure, or what I muttered at 11pm when the test finally passed.
- It just looked at the code on its merits and said: this could be better.
That's not a bug. That's the cleanest code review you'll ever get.
A human reviewing their own code from a week ago has every cognitive bias working against them. They remember the constraints. They remember the deadline. They remember that they "knew" the helper was duplicated but didn't want to break the function signature. They protect their past decisions because their past decisions are theirs.
Claude Code, with no continuity, has none of that. Every session is a new pair of eyes that hasn't been bought off by yesterday's reasoning.
The reframe
Once I saw this, I started doing it on purpose.
When I finish a feature now, I don't immediately ship it. I commit, sleep on it, and the next morning I open a new Claude Code session and ask it to review the code as if it had never seen it. Because for that session, it hasn't.
The reviews are brutal. They are also, almost always, right.
It catches:
- Functions that are doing two things and lying about it in their name
- Error handling that's defensive in some places and absent in others
- Helpers I should have extracted but didn't
- Naming that made sense in context but doesn't survive cold reading
- Tests that pass but verify the wrong thing
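To make the first of those catches concrete, here's a hypothetical sketch (the function names and the email example are invented for illustration, not taken from my project) of a function doing two things while its name admits to one, next to the honest split a fresh session tends to suggest:

```python
# Before: the name promises a pure check, but the function
# also rewrites its input. A hidden side effect.
def normalize_email(record: dict) -> bool:
    """Sounds like validation; actually mutates `record` too."""
    email = record.get("email", "").strip().lower()
    record["email"] = email  # the part the name doesn't mention
    return "@" in email

# After: two functions, each doing exactly what its name says.
def normalized_email(raw: str) -> str:
    """Return a canonical form of an email address."""
    return raw.strip().lower()

def is_valid_email(email: str) -> bool:
    """Pure predicate: no mutation, no surprises."""
    return "@" in email
```

While you're writing the first version, the side effect feels obvious and harmless. Cold, it reads like a trap.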
The same agent that wrote the code can't see these issues while it's writing, because it has the same tunnel vision a human does in flow. But strip its memory of the session, hand it the file, and it becomes the reviewer it could never be in real time.
The "memory problem" is actually a feature gate to a different mode of useful behavior. We've been trying to remove it instead of using it.
Why this changes how I think about CLAUDE.md
I've written about CLAUDE.md before, and I want to refine what I said there, because this experience clarified something for me.
CLAUDE.md is not the agent's memory. That framing is wrong.
CLAUDE.md is the project's memory of decisions that shouldn't be relitigated every session. Things like:
- "We use Tailwind v3, not v4. Don't migrate."
- "The auth layer is intentionally simple. Don't add OAuth without asking."
- "Tests live next to source files, not in /tests."
These are decisions, not style. CLAUDE.md exists so the agent doesn't waste a session arguing with itself about settled choices.
But CLAUDE.md should not contain things like:
- "This function does X" (the code says that)
- "Last week we refactored Y" (that's history, not a decision)
- "The pattern we use is Z" (let the code be the source of truth)
The mistake is using CLAUDE.md as a memory dump. When you do that, you're trying to give the agent continuity — and you lose the fresh-eyes review benefit. You've turned your honest reviewer into a yes-man who already agrees with everything you've decided.
The right CLAUDE.md is short. It pins down decisions. It leaves judgment alone, so a new session can still tell you when your code is bad.
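To make that concrete, a CLAUDE.md in this spirit might look like the sketch below. The specific decisions are placeholders drawn from the examples above, not recommendations; the point is the shape: short, decisions only, nothing the code already says.

```markdown
# CLAUDE.md

## Settled decisions (do not relitigate)
- We use Tailwind v3, not v4. Don't migrate.
- The auth layer is intentionally simple. Ask before adding OAuth.
- Tests live next to source files, not in /tests.

## Boundaries
- Ask before adding new dependencies.
```

Notice what's absent: no descriptions of what functions do, no changelog, no pattern catalog. Those would give the next session continuity, which is exactly the thing you don't want.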
The mental model shift
Stop thinking of an AI coding agent as one consistent collaborator across time.
Start thinking of it as a high-quality reviewer you can summon, fresh, as many times as you want — provided you preserve the decisions, not the reasoning.
This changes the workflow:
- Write code with the agent in a session.
- Commit when it works.
- Open a new session the next day. Ask it to review.
- Take the criticism seriously. It's coming from someone who isn't defending yesterday's choices.
- If the criticism is wrong, that's a signal that a decision is missing from CLAUDE.md.
- If the criticism is right, fix it. The fact that the same agent wrote the code is irrelevant.
I've started shipping noticeably cleaner code since I adopted this. Not because Claude Code got smarter. Because I stopped trying to glue together its sessions and started using the gaps between them.
The closing thought
There's a lot of energy right now going into making AI agents more continuous, more memory-rich, more "aware" across sessions. Long-running agents. Persistent context. Memory layers.
I'm not against any of that. But I think the rush is hiding something valuable. The discontinuity between sessions is the thing that lets the agent be a real reviewer instead of a sympathetic teammate. Once you give it perfect memory, it stops being able to look at your code from outside, and it starts protecting your past decisions the way a human would.
Maybe the goal isn't to give agents continuity. Maybe it's to give them just enough of it — your settled decisions — and protect the rest. Let them forget how the code got that way. Let them tell you, fresh, that it's bad.
Mine did. It was right.
Has Claude Code ever roasted its own past output in your projects? I want to hear the worst one — drop it in the comments.