Controlling Authority Retrieval: A Missing Retrieval Objective for Authority-Governed Knowledge

arXiv cs.CL / 4/17/2026

Key Points

The paper introduces Controlling Authority Retrieval (CAR) for domains governed by formal authority (e.g., law, FDA rules, security advisories), where a newer document can formally supersede an earlier one even when the two are semantically distant.
It formalizes CAR as recovering the active frontier front(cl(A_k(q))) of the authority closure of the semantic anchor set, a different problem from the standard retrieval objective argmax_d s(q,d).
The authors provide a necessary-and-sufficient characterization (Theorem 4) for when a retrieved set achieves TCA(R,q)=1, based on frontier inclusion and a constraint against ignored superseders.
They prove a hard worst-case ceiling for any scope-indexed retrieval algorithm (Proposition 2): TCA@k <= phi(q) * R_anchor(q), established by an adversarial permutation argument.
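Experiments on three real-world corpora (security advisories, SCOTUS overruling pairs, FDA drug records) show two-stage retrieval raising TCA@5 well above dense retrieval (0.270 to 0.975, 0.172 to 0.926, and 0.064 to 0.774, respectively). In a GPT-4o-mini downstream test, Dense RAG produces explicit "not patched" claims for 39% of queries where a patch exists, and the two-stage approach cuts this to 16%. The authors release four benchmark datasets, domain adapters, and a single-command scorer.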
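
The two conditions named in Theorem 4 can be illustrated on a toy supersession graph. The sketch below is one plausible reading, not the paper's formal definition: it assumes "frontier inclusion" means every active-frontier document appears in R, and "no-ignored-superseder" means no document in R has a superseder that is missing from R. The function names, the `superseded_by` mapping, and the document IDs are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical representation: superseded_by[d] is the set of documents
# that formally supersede d. This is an assumption for illustration only.
SupersededBy = Dict[str, Set[str]]


def frontier_included(retrieved: Set[str], frontier: Set[str]) -> bool:
    """Assumed reading of 'frontier inclusion': the frontier is a subset of R."""
    return frontier <= retrieved


def no_ignored_superseder(retrieved: Set[str], superseded_by: SupersededBy) -> bool:
    """Assumed reading: no retrieved document has a superseder outside R."""
    for doc in retrieved:
        for superseder in superseded_by.get(doc, set()):
            if superseder not in retrieved:
                return False
    return True


def tca_is_one(retrieved: Set[str], frontier: Set[str],
               superseded_by: SupersededBy) -> bool:
    """Both assumed conditions must hold, regardless of how R was produced."""
    return (frontier_included(retrieved, frontier)
            and no_ignored_superseder(retrieved, superseded_by))


# Toy example: an advisory that was later patched.
superseded_by: SupersededBy = {"ADV-1": {"PATCH-1"}}
frontier = {"PATCH-1"}

dense_only = {"ADV-1"}            # semantically close hit, patch missed
two_stage = {"ADV-1", "PATCH-1"}  # anchor plus its superseder

print(tca_is_one(dense_only, frontier, superseded_by))
print(tca_is_one(two_stage, frontier, superseded_by))
```

In this toy setting the dense-only set fails both checks, which mirrors the failure mode the key points describe: a retriever that returns only the semantically closest advisory has no way to report that a patch exists.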

Abstract

In any domain where knowledge accumulates under formal authority -- law, drug regulation, software security -- a later document can formally void an earlier one while remaining semantically distant from it. We formalize this as Controlling Authority Retrieval (CAR): recovering the active frontier front(cl(A_k(q))) of the authority closure of the semantic anchor set -- a different mathematical problem from argmax_d s(q,d). The two central results are: Theorem 4 (CAR-Correctness Characterization) gives necessary-and-sufficient conditions on any retrieved set R for TCA(R,q)=1 -- frontier inclusion and no-ignored-superseder -- independent of how R was produced. Proposition 2 (Scope Identifiability Upper Bound) establishes phi(q) as a hard worst-case ceiling: for any scope-indexed algorithm, TCA@k <= phi(q) * R_anchor(q), proved by an adversarial permutation argument. Three independent real-world corpora validate the proved structure: security advisories (Dense TCA@5=0.270, two-stage 0.975), SCOTUS overruling pairs (Dense=0.172, two-stage 0.926), FDA drug records (Dense=0.064, two-stage 0.774). A GPT-4o-mini experiment shows the downstream cost: Dense RAG produces explicit "not patched" claims for 39% of queries where a patch exists; Two-Stage cuts this to 16%. Four benchmark datasets, domain adapters, and a single-command scorer are released at https://github.com/andremir/car-retrieval.
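
The objective front(cl(A_k(q))) can be read as a two-step graph computation applied after semantic retrieval. The following is a minimal sketch under stated assumptions, not the released implementation: it assumes supersession is available as explicit edges, takes the closure to be everything reachable from the anchors by following superseder edges, and takes the frontier to be the closure members that nothing supersedes. The names `authority_closure`, `active_frontier`, `superseded_by`, and the document IDs are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set


def authority_closure(anchors: Iterable[str],
                      superseded_by: Dict[str, Set[str]]) -> Set[str]:
    """Assumed cl(A): anchors plus everything that transitively supersedes them."""
    closure = set(anchors)
    queue = deque(closure)
    while queue:
        doc = queue.popleft()
        for superseder in superseded_by.get(doc, ()):
            if superseder not in closure:
                closure.add(superseder)
                queue.append(superseder)
    return closure


def active_frontier(closure: Set[str],
                    superseded_by: Dict[str, Set[str]]) -> Set[str]:
    """Assumed front(.): closure members with no recorded superseder."""
    return {doc for doc in closure if not superseded_by.get(doc)}


# Toy chain: the 2019 rule is replaced in 2022, which is replaced in 2025.
superseded_by = {
    "RULE-2019": {"RULE-2022"},
    "RULE-2022": {"RULE-2025"},
}

# Stage 1 (stubbed): semantic retrieval returns only the oldest rule.
anchors = ["RULE-2019"]

# Stage 2: expand along authority edges, then keep the active frontier.
closure = authority_closure(anchors, superseded_by)
frontier = active_frontier(closure, superseded_by)

print(sorted(closure))
print(sorted(frontier))
```

The design point is that stage 2 never consults the similarity score s(q,d): the controlling document is reached through the authority edges, so it can be found even when it is semantically distant from the query.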