Skill-RAG: Failure-State-Aware Retrieval Augmentation via Hidden-State Probing and Skill Routing

arXiv cs.CL / 4/20/2026


Key Points

  • The paper argues that many persistent RAG retrieval failures are caused by a misalignment between the query and the evidence representation space, not by a lack of relevant documents.
  • It introduces Skill-RAG, which adds a lightweight hidden-state prober and a prompt-based skill router to diagnose failure states instead of simply retrying retrieval.
  • Skill-RAG gates retrieval at two pipeline stages and, when a failure is detected, selects one of four “retrieval skills” (query rewriting, question decomposition, evidence focusing, or an exit for irreducible cases) to correct misalignment before the next generation attempt.
  • Experiments on multiple open-domain QA and complex reasoning benchmarks show notable accuracy improvements on hard cases that persist after multi-turn retrieval, with especially strong gains on out-of-distribution datasets.
  • Representation-space analyses suggest the different retrieval skills correspond to structured and separable regions of the failure-state space, indicating misalignment is a typed phenomenon.
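The failure-typing step in the points above can be sketched as a simple dispatch over the four skills. The skill names mirror the paper's taxonomy, but everything else here is a hypothetical stand-in: the paper describes a prompt-based router driven by a hidden-state prober, not the rule table below.

```python
from enum import Enum

# The four "retrieval skills" named in the paper; enum values are illustrative.
class Skill(Enum):
    QUERY_REWRITE = "query rewriting"
    DECOMPOSE = "question decomposition"
    EVIDENCE_FOCUS = "evidence focusing"
    EXIT = "exit"  # for truly irreducible cases

def route_failure(failure_features: dict) -> Skill:
    """Toy router mapping a diagnosed failure state to a skill.

    In Skill-RAG this choice is made by a prompt-based router over the
    prober's diagnosis; the feature flags here are assumed placeholders.
    """
    if failure_features.get("irreducible", False):
        return Skill.EXIT            # no skill can fix the alignment gap
    if failure_features.get("multi_hop", False):
        return Skill.DECOMPOSE       # break the question into sub-queries
    if failure_features.get("noisy_evidence", False):
        return Skill.EVIDENCE_FOCUS  # narrow attention to relevant spans
    return Skill.QUERY_REWRITE       # default: realign the query itself
```

The point of a typed router, as opposed to blind retrying, is that each failure class gets a different corrective action rather than another pass through the same retriever.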

Abstract

Retrieval-Augmented Generation (RAG) has emerged as a foundational paradigm for grounding large language models in external knowledge. While adaptive retrieval mechanisms have improved retrieval efficiency, existing approaches treat post-retrieval failure as a signal to retry rather than to diagnose -- leaving the structural causes of query-evidence misalignment unaddressed. We observe that a significant portion of persistent retrieval failures stem not from the absence of relevant evidence but from an alignment gap between the query and the evidence space. We propose Skill-RAG, a failure-aware RAG framework that couples a lightweight hidden-state prober with a prompt-based skill router. The prober gates retrieval at two pipeline stages; upon detecting a failure state, the skill router diagnoses the underlying cause and selects among four retrieval skills -- query rewriting, question decomposition, evidence focusing, and an exit skill for truly irreducible cases -- to correct misalignment before the next generation attempt. Experiments across multiple open-domain QA and complex reasoning benchmarks show that Skill-RAG substantially improves accuracy on hard cases persisting after multi-turn retrieval, with particularly strong gains on out-of-distribution datasets. Representation-space analyses further reveal that the proposed skills occupy structured, separable regions of the failure state space, supporting the view that query-evidence misalignment is a typed rather than monolithic phenomenon.
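The control flow the abstract describes (a prober gating two pipeline stages, with skill-corrected retries on failure) might look roughly like the skeleton below. Every function name and signature is an assumed placeholder; the paper's actual implementation is not shown in this summary.

```python
def skill_rag(query, retrieve, generate, probe, route, apply_skill,
              max_turns=3):
    """Hypothetical Skill-RAG control loop (names are illustrative).

    `probe(stage, ...)` returns True when the hidden-state prober
    detects a failure state; `route` picks one of the four skills;
    `apply_skill` corrects the query before the next attempt.
    """
    answer = None
    for _ in range(max_turns):
        docs = retrieve(query)
        # Stage-1 gate: failure detected right after retrieval.
        if probe("post_retrieval", query, docs):
            query = apply_skill(route(query, docs), query, docs)
            continue  # realign, then retry retrieval
        answer = generate(query, docs)
        # Stage-2 gate: no failure after generation -> accept the answer.
        if not probe("post_generation", query, docs, answer):
            return answer
        skill = route(query, docs, answer)
        if skill == "exit":  # irreducible case: stop retrying
            return answer
        query = apply_skill(skill, query, docs, answer)
    return answer
```

The key structural difference from adaptive-retrieval baselines is that a detected failure triggers a diagnosis-and-correct step rather than an identical retry.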