CacheRAG: A Semantic Caching System for Retrieval-Augmented Generation in Knowledge Graph Question Answering

arXiv cs.CL / 4/30/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper argues that many LLM-based KGQA systems are “stateless” and fail to leverage historical query patterns, which can cause schema hallucinations and incomplete retrieval coverage.
  • It proposes CacheRAG, a cache-augmented architecture that turns retrieval-planning components into a continual learner for knowledge graph question answering.
  • CacheRAG uses a schema-agnostic two-stage semantic parsing approach (Intermediate Semantic Representation plus a backend adapter) to let users query in natural language while safely grounding executions with local schema context.
  • It improves cache utilization with a hierarchical Domain→Aspect index and diversity-oriented retrieval using Maximal Marginal Relevance (MMR), and it expands search with bounded, deterministic subgraph operators to control complexity.
  • Experiments on multiple benchmarks show substantial gains over prior methods, including +13.2% accuracy and +17.5% truthfulness on the CRAG dataset.
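The diversity-oriented retrieval mentioned above can be illustrated with a small sketch. This is not the paper's implementation: the embedding model, cache index, and similarity function are assumptions here (plain cosine similarity over float vectors), and `mmr_select` is a hypothetical helper showing the standard MMR trade-off between relevance to the query and redundancy with already-selected cached examples.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors (assumed representation)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr_select(query, candidates, k=3, lam=0.7):
    """Greedily pick k cached examples via Maximal Marginal Relevance:
    score = lam * sim(candidate, query) - (1 - lam) * max sim to selected.
    Higher lam favors relevance; lower lam favors structural variety."""
    selected, remaining = [], list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            rel = cosine(query, candidates[i])
            red = max((cosine(candidates[i], candidates[j]) for j in selected),
                      default=0.0)
            return lam * rel - (1 - lam) * red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam=1.0` this degenerates to pure nearest-neighbor retrieval; lowering `lam` penalizes near-duplicate cache entries, which is the mechanism the paper credits for mitigating reasoning homogeneity in the retrieved examples.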

Abstract

The integration of Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) has significantly advanced Knowledge Graph Question Answering (KGQA). However, existing LLM-driven KGQA systems act as stateless planners, generating retrieval plans in isolation without exploiting historical query patterns, analogous to a database system that optimizes every query from scratch without a plan cache. This fundamental design flaw leads to schema hallucinations and limited retrieval coverage. We propose CacheRAG, a systematic cache-augmented architecture for LLM-based KGQA that transforms stateless planners into continual learners. Unlike traditional database plan caching (which optimizes for frequency), CacheRAG introduces three novel design principles tailored for LLM contexts: (1) Schema-agnostic user interface: A two-stage semantic parsing framework via Intermediate Semantic Representation (ISR) enables non-expert users to interact purely in natural language, while a Backend Adapter grounds the LLM with local schema context to compile executable physical queries safely. (2) Diversity-optimized cache retrieval: A two-layer hierarchical index (Domain → Aspect) coupled with Maximal Marginal Relevance (MMR) maximizes structural variety in cached examples, effectively mitigating reasoning homogeneity. (3) Bounded heuristic expansion: Deterministic depth and breadth subgraph operators with strict complexity guarantees significantly enhance retrieval recall without risking unbounded API execution. Extensive experiments on multiple benchmarks demonstrate that CacheRAG significantly outperforms state-of-the-art baselines (e.g., +13.2% accuracy and +17.5% truthfulness on the CRAG dataset).
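The bounded heuristic expansion in principle (3) can be sketched as a capped breadth-first traversal. The graph representation (an adjacency dict standing in for KG API calls), the operator name `bounded_expand`, and the specific budget parameters are all assumptions for illustration; the point is that depth, per-node breadth, and total node caps give a deterministic worst-case bound on API executions.

```python
from collections import deque

def bounded_expand(graph, seeds, max_depth=2, max_breadth=3, max_nodes=50):
    """BFS from seed entities under strict complexity limits:
    - at most `max_breadth` neighbors explored per node (deterministic,
      sorted order so repeated runs expand the same subgraph),
    - at most `max_depth` hops from any seed,
    - at most `max_nodes` nodes visited in total.
    Each neighbor lookup stands in for one KG API call, so the total
    call count is bounded regardless of graph size."""
    visited = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth >= max_depth:
            continue
        for nb in sorted(graph.get(node, []))[:max_breadth]:
            if nb not in visited:
                visited.add(nb)
                if len(visited) >= max_nodes:
                    return visited  # hard cap reached; stop expanding
                frontier.append((nb, depth + 1))
    return visited
```

Because every limit is enforced before the next lookup, the worst-case number of expansions is O(max_breadth^max_depth), truncated by `max_nodes`, which is the kind of strict guarantee the abstract contrasts with unbounded LLM-driven exploration.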