Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG
arXiv cs.CL / 4/17/2026
Key Points
- The paper argues that conventional RAG limits LLM agents because the agent passively consumes retrieved results without understanding the corpus structure or knowing what has not been retrieved.
- It introduces Corpus2Skill, which converts an enterprise document corpus into a hierarchical, navigable “skill directory” offline using iterative clustering and LLM-generated summaries at multiple levels.
- At serving time, an LLM agent uses the explicit hierarchy to decide where to look, drill down topic branches via progressively detailed summaries, and retrieve full documents by ID.
- The explicit tree enables better backtracking and the ability to combine evidence across different branches, improving reasoning over scattered information.
- Experiments on WixQA show Corpus2Skill outperforms dense retrieval, RAPTOR, and agentic RAG baselines across all evaluated quality metrics.
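The summary above doesn't spell out Corpus2Skill's pipeline, but the navigate-then-fetch idea can be sketched. Below is a minimal, hypothetical illustration: the tree, its hand-written summaries, the toy corpus, and the keyword-overlap scorer all stand in for what the paper builds with iterative clustering and LLM-generated summaries, and for the LLM agent's branch-selection judgment.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    summary: str                                   # LLM-generated in the paper; hand-written here
    children: list = field(default_factory=list)   # child Nodes (topic branches)
    doc_ids: list = field(default_factory=list)    # leaf-level document IDs

# Toy corpus: doc_id -> text (a real system would hold enterprise documents)
CORPUS = {
    "doc1": "How to reset your account password",
    "doc2": "Enabling two-factor authentication",
    "doc3": "Setting up a custom domain",
    "doc4": "Configuring DNS records for your domain",
}

# Offline step (stand-in): group documents into topic branches and attach
# progressively detailed summaries at each level of the hierarchy.
root = Node("Help center: account security and domains", children=[
    Node("Account security: passwords and 2FA", doc_ids=["doc1", "doc2"]),
    Node("Domains: setup and DNS", doc_ids=["doc3", "doc4"]),
])

def navigate(node, score):
    """Drill down the explicit tree: at each level, follow the branch whose
    summary scores highest for the query. In the paper an LLM agent makes
    this choice (and can backtrack); `score` stands in for that judgment."""
    while node.children:
        node = max(node.children, key=lambda c: score(c.summary))
    # Leaf reached: fetch the full documents by ID.
    return [CORPUS[d] for d in node.doc_ids]

# Usage: trivial keyword overlap as a stand-in for the agent's decision.
query = "change dns for my domain"
overlap = lambda s: len(set(s.lower().split()) & set(query.split()))
print(navigate(root, overlap))
```

Because the hierarchy is an explicit data structure rather than an opaque index, the agent can revisit the root and descend a different branch to combine evidence, which is the backtracking behavior the key points highlight.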