An update on recent Claude Code quality reports

Simon Willison's Blog / 4/24/2026




An update on recent Claude Code quality reports (via) It turns out the high volume of complaints that Claude Code was providing worse quality results over the past two months was grounded in real problems.

The models themselves were not to blame, but three separate issues in the Claude Code harness caused complex but material problems which directly affected users.

Anthropic's postmortem describes these in detail. This one in particular stood out to me:

On March 26, we shipped a change to clear Claude's older thinking from sessions that had been idle for over an hour, to reduce latency when users resumed those sessions. A bug caused this to keep happening every turn for the rest of the session instead of just once, which made Claude seem forgetful and repetitive.
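The shape of that bug is a classic one-shot cleanup that loses its "one-shot" property. A minimal sketch of the failure mode (illustrative only, not Anthropic's actual code; `Session`, `needs_cleanup`, and the names below are all hypothetical):

```python
# Illustrative sketch of the bug class: a resume-time cleanup meant to run
# once, where a missing flag reset makes it run on every subsequent turn.

IDLE_THRESHOLD_SECS = 3600  # clear older thinking after an hour of idleness


class Session:
    def __init__(self):
        self.thinking = []          # accumulated "thinking" blocks
        self.last_active = 0.0
        self.needs_cleanup = False

    def resume(self, now):
        # Mark the session for a one-time cleanup if it sat idle too long.
        if now - self.last_active > IDLE_THRESHOLD_SECS:
            self.needs_cleanup = True
        self.last_active = now

    def turn(self, now, new_thought):
        # Buggy version: the flag is never reset, so older thinking gets
        # discarded on *every* turn after a long idle, not just the first.
        if self.needs_cleanup:
            self.thinking.clear()
            # The fix would be a single line here:
            # self.needs_cleanup = False
        self.thinking.append(new_thought)
        self.last_active = now
```

After resuming a stale session, every turn wipes all but the newest thought, which lines up with the "forgetful and repetitive" symptom users reported.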

I frequently have Claude Code sessions which I leave for an hour (or often a day or longer) before returning to them. Right now I have 11 of those (according to ps aux | grep 'claude ') and that's after closing down dozens more the other day.

I estimate I spend more time prompting in these "stale" sessions than sessions that I've recently started!

If you're building agentic systems it's worth reading this article in detail - the kinds of bugs that affect harnesses are deeply complicated, even if you put aside the inherent non-deterministic nature of the models themselves.

Posted 24th April 2026 at 1:31 am


