Learning When to Remember: Risk-Sensitive Contextual Bandits for Abstention-Aware Memory Retrieval in LLM-Based Coding Agents
arXiv cs.CL / 5/1/2026
Key Points
- The paper argues that LLM-based coding agents must not always reuse retrieved external memory, because superficial matches can cause unsafe “memory injection” into the debugging process.
- It reframes memory retrieval as a risk-sensitive selective control problem and proposes RSCB-MC, a contextual bandit that can choose among multiple retrieval and abstention actions (including not using memory or asking for feedback).
- RSCB-MC stores reusable issue knowledge using a pattern-variant-episode schema and represents retrieval context with a 16-feature state capturing relevance, uncertainty, structural compatibility, feedback history, false-positive risk, latency, and token cost.
- Its reward function heavily penalizes false-positive memory injection relative to missed reuse, making abstention and non-injection explicit safety-first options.
- In offline replay and bounded hot-path validations, RSCB-MC achieves strong success rates (62.5% offline replay; 60.5% hot-path proxy) while maintaining a 0.0% false-positive rate and low per-decision latency (p95 of roughly 331 microseconds).
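To make the decision mechanics concrete, here is a minimal sketch of a risk-sensitive contextual bandit with explicit abstention actions and an asymmetric reward that penalizes false-positive memory injection far more than missed reuse. The action names, feature layout, penalty weights, and epsilon-greedy linear learner are illustrative assumptions, not details from the paper.

```python
import random

# Hypothetical action set: inject retrieved memory, abstain, or ask for feedback.
# (Assumed names; the paper's actual action space may differ.)
ACTIONS = ["inject_memory", "no_injection", "ask_feedback"]

def reward(action: str, memory_was_relevant: bool) -> float:
    """Asymmetric reward: a false-positive injection costs much more
    than a missed reuse, making abstention the safety-first default."""
    if action == "inject_memory":
        return 1.0 if memory_was_relevant else -5.0  # heavy FP penalty (assumed value)
    if action == "no_injection":
        return -0.2 if memory_was_relevant else 0.1  # small missed-reuse cost
    return -0.05  # ask_feedback: small fixed interaction cost

class RiskSensitiveBandit:
    """Epsilon-greedy contextual bandit with one linear scorer per action."""

    def __init__(self, n_features: int, epsilon: float = 0.1):
        self.weights = {a: [0.0] * n_features for a in ACTIONS}
        self.epsilon = epsilon

    def score(self, action: str, x: list) -> float:
        return sum(w * xi for w, xi in zip(self.weights[action], x))

    def choose(self, x: list) -> str:
        # Explore with probability epsilon, otherwise exploit the best score.
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.score(a, x))

    def update(self, action: str, x: list, r: float, lr: float = 0.05):
        # Gradient step pushing the chosen action's score toward the observed reward.
        err = r - self.score(action, x)
        self.weights[action] = [w + lr * err * xi
                                for w, xi in zip(self.weights[action], x)]
```

In use, the context vector `x` would carry the paper's 16 retrieval features (relevance, uncertainty, structural compatibility, feedback history, false-positive risk, latency, token cost, and so on); because the reward surface is so lopsided, the learned policy only selects `inject_memory` when the context strongly predicts relevance.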