87% Cost Savings & Sub-3s Latency: I built a "Warm-Cache" harness for persistent Claude agents.

Reddit r/artificial / 4/29/2026

📰 NewsDeveloper Stack & InfrastructureSignals & Early TrendsTools & Practical UsageModels & Research

共有:

Key Points

The author open-sourced a persistent Claude “warm-cache” harness called Galadriel to significantly reduce LLM prompt-context costs by optimizing Prompt Caching rather than using default agent setups.
Reported results include about an 85% latency reduction (100K token context from ~11 seconds to under 3 seconds) and roughly 87% cost savings ($10 per $100 normally spent), benchmarked against OpenClaw/Cursor-style workflows.
Galadriel uses a 3-tier stacked caching approach that creates separate cache breakpoints for tool definitions, system prompts (CLAUDE.md), and trailing history to maximize cache reuse.
The harness integrates MemPalace for persistent, vector-based recall (“memory”) while aiming not to disrupt cached prompt segments, and it is designed for private/subnet deployments using only the user’s API key.
The project claims to include built-in engineering/ethics guidelines (via “Karpathy-style” CLAUDE.md principles) to prevent “agent bloat,” and the author invites community feedback alongside a GitHub MIT-licensed release.

The "Goldfish Problem" is Expensive. I Decided to Fix the Plumbing.

Most Claude implementations leave 90% of their money on the table because they don’t optimize for Prompt Caching. I’ve been running a personal agent in my Discord for months that manages my AWS infra and codebases, and I finally open-sourced the harness, which I’ve named Galadriel after my main personal assistant.

The Stats

Cost: $10 for every $100 you’d normally spend (Tested against OpenClaw/Cursor workflows).
Speed: 85% drop in latency. 100K token context goes from 11s to <3s.
Memory: Integrated MemPalace for permanent, vector-based recall that doesn't break the cache.

The Technical Stack

3-Tier Stacked Caching: Separate breakpoints for Tool Definitions, System Prompts (CLAUDE.md), and Trailing History.
Privacy: Built for private subnets. No middleman, no message caps—just your API key and your rules.
Ethics: Baked-in KarpathyCLAUDE.md)guidelines to kill "agent bloat."

If you’re tired of paying the "Context Tax" just to have an agent that remembers who you are, here you go. It is customized for Discord for my specific needs, but the core logic ensures Galadriel runs like an absolute dream: she never forgets, maintains strict engineering principles, and optimizes every cycle.

Your feedback is most welcome!

GitHub (MIT License):https://github.com/avasol/galadriel-public

submitted by /u/Phobix
[link] [comments]