Claude Code cache confusion as Anthropic tweaks defaults, but quotas still drain
Dev reports suggest long sessions now burn through usage much faster
Anthropic last month reduced the TTL (time to live) of the Claude Code prompt cache from one hour to five minutes for many requests, but said the change should not increase costs, despite users reporting faster-depleting quotas.
User Sean Swanson posted a bug report showing that Anthropic introduced a one-hour cache for Claude Code context around February 1, then changed it back to a five-minute cache around March 7. "The 5m TTL is disproportionately punishing for the long-session, high-context use case that defines Claude Code usage," said Swanson.
When using AI coding assistants or agents, the context is additional data sent along with the user's prompts, such as existing code or background instructions. Context improves the accuracy of the AI but also requires more processing.
Claude prompt caching avoids re-processing previously used prompts including context and background information. The cache can have either a five-minute or one-hour TTL. Writing to the five-minute cache costs 25 percent more in tokens, and writing to the one-hour cache 100 percent more, but reading from cache is around 10 percent of the base price.
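The trade-off between the two TTLs can be sketched with a back-of-envelope cost model. The multipliers below come from the figures above; the session shapes (turn counts and gaps) are hypothetical, chosen only to contrast a rapid-fire subagent session with a long session that has pauses between turns:

```python
# Assumed cost multipliers, relative to base input-token price,
# per the pricing described in the article:
FIVE_MIN_WRITE = 1.25   # writing to the 5-minute cache: +25%
ONE_HOUR_WRITE = 2.00   # writing to the 1-hour cache: +100%
CACHE_READ = 0.10       # reading from cache: ~10% of base price

def session_cost(turns: int, gap_minutes: float, ttl_minutes: float,
                 write_mult: float) -> float:
    """Relative session cost, in units of 'one full context at base price'.
    Each turn re-reads the cached context if the cache is still warm;
    if the gap between turns exceeds the TTL, it pays the write cost again."""
    cost = write_mult  # first turn always writes the cache
    for _ in range(turns - 1):
        if gap_minutes <= ttl_minutes:
            cost += CACHE_READ   # warm cache: cheap read
        else:
            cost += write_mult   # expired cache: full re-write
    return cost

# Rapid-fire subagent session: 20 turns, 1 minute apart.
fast_5m = session_cost(20, 1, 5, FIVE_MIN_WRITE)    # 1.25 + 19*0.1 = 3.15
fast_1h = session_cost(20, 1, 60, ONE_HOUR_WRITE)   # 2.00 + 19*0.1 = 3.90

# Long session with pauses: 20 turns, 10 minutes apart.
slow_5m = session_cost(20, 10, 5, FIVE_MIN_WRITE)   # 20 * 1.25 = 25.00
slow_1h = session_cost(20, 10, 60, ONE_HOUR_WRITE)  # 2.00 + 19*0.1 = 3.90
```

Under these assumptions the five-minute cache is slightly cheaper for quick back-to-back requests, as Sumner argues, but roughly six times more expensive for a session whose turns outlast the TTL, which is the pattern Swanson describes.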
Jarred Sumner, the creator of the Bun JavaScript runtime who now works for Anthropic, agreed that the analysis was "good detective work" but claimed that the change back to the five-minute cache made Claude Code cheaper because "a meaningful share of Claude Code's requests are one-shot calls where the cached context is used once and not revisited." Sumner said that the Claude Code client determines the cache TTL automatically and there are no plans for a global setting.
Swanson revised his analysis in response, agreeing that sessions using subagents do benefit from the lower write cost of the five-minute cache since they interact quickly and "their caches almost never expire." However, he said he has been a $200 per month subscriber for over six months and had never hit a quota limit until March. The "extra burn rate" is "making a once great service unusable," he said.
Another factor is that the large one-million-token context window available on paid plans with the Claude Opus 4.6 or Sonnet 4.6 models increases costs, especially on cache misses. Claude Code creator Boris Cherny said that "prompt cache misses when using 1M token context window are expensive... if you leave your computer for over an hour then continue a stale session, it's often a full cache miss." He said Anthropic is considering making a 400,000-token context window the default, with an option to enable the full one million tokens; there is already a configuration setting for this.
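Some rough arithmetic shows why a stale 1M-token session stings. The multipliers are the ones quoted earlier; the context sizes follow Cherny's comments, and the costs are in abstract token-price units rather than real dollars:

```python
# Assumed multipliers vs. base input-token price (from the article):
CACHE_READ = 0.10       # warm cache hit
FIVE_MIN_WRITE = 1.25   # full re-write after a cache miss

def request_cost(context_tokens: int, cache_hit: bool) -> float:
    """Relative cost of one request, in base-price token units."""
    mult = CACHE_READ if cache_hit else FIVE_MIN_WRITE
    return context_tokens * mult

hit_1m = request_cost(1_000_000, True)      #   100,000 units
miss_1m = request_cost(1_000_000, False)    # 1,250,000 units
miss_400k = request_cost(400_000, False)    #   500,000 units
```

On these assumptions, resuming a stale 1M-token session costs about 12.5 times a warm one, and a full miss at the proposed 400,000-token default is 2.5 times cheaper than one at a million tokens.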
Cherny said that larger contexts are now common because users are "pulling in a large number of skills, or running many agents or background automations."
Some developers are convinced that cache rebuilding and cache misses are major factors in Claude Code quota exhaustion, which has reached the point where Pro users ($20 per month) may get as few as two prompts in five hours. A number of bugs in the caching code have also been reported, leading one user to remark: "Before those are fixed likely any 5 minutes vs 1 h discussion is entirely moot since numbers are totally flawed."
The focus on cache optimization may also be evidence that, under the covers, Anthropic's quotas simply buy less processing time than they used to.
Swanson is not alone in reporting that Claude's performance has dropped. For example, a user on the enterprise team plan said: "In March I could use Opus all day and it was getting great results. Since the last week of March and into April, I've had sessions where I maxed out session usage under 2 hours and it got stuck in overthinking loops, multiple turns of realising the same thing, dozens of paragraphs of 'but wait, actually I need to do x' with slight variations." That chimes with similar comments from an AI director at AMD.
Cache optimization may be important, but it seems unlikely to account for all these reported issues. ®