Budget-Aware Routing for Long Clinical Text
arXiv cs.AI / 5/4/2026
Key Points
- The paper targets the high token cost and latency of deploying large language models on long clinical inputs. It addresses them by selecting a subset of the input that fits under a strict token limit.
- It formulates budgeted context selection as a knapsack-constrained subset selection problem and studies how unitization (how a document is split into selectable units) and unit selection affect performance.
- The authors propose RCD, a monotone submodular objective that jointly optimizes relevance, coverage, and diversity for the selected context.
- They compare multiple unitization strategies (sentence, section, window, cluster) and add a routing heuristic that adapts to different budget regimes.
- Experiments on MIMIC discharge notes, Cochrane abstracts, and L-Eval show that the best strategy varies by task and evaluation setting, and that metrics such as ROUGE and BERTScore capture quality differently. The authors release accompanying code.
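
The knapsack-constrained selection problem described in the key points can be written as follows. The notation is ours rather than the paper's: $V$ is the set of units produced by unitization, $c(u)$ is the token cost of unit $u$, $B$ is the token budget, and $F$ is a set objective such as RCD.

$$
\max_{S \subseteq V} F(S) \quad \text{s.t.} \quad \sum_{u \in S} c(u) \le B
$$

Calling $F$ monotone submodular means that adding units never lowers the score and that a unit's marginal gain can only shrink as the selected set grows:

$$
F(A) \le F(A') \quad \text{and} \quad F(A \cup \{u\}) - F(A) \ge F(A' \cup \{u\}) - F(A') \quad \text{for all } A \subseteq A' \subseteq V,\ u \in V \setminus A'
$$

These properties are what give simple greedy procedures (for example, cost-benefit greedy combined with the best single feasible unit) constant-factor approximation guarantees under a knapsack constraint.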

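The unitization strategies listed above determine what the selector can choose from. The following is a minimal sketch under stated assumptions: whitespace token counts stand in for a real tokenizer, sentence splitting is naive punctuation splitting, and the `HEADER` regex is a hypothetical pattern for section headers such as "Brief Hospital Course:". Cluster units are omitted because they need an embedding model. All names here are hypothetical, not the authors' code.

```python
import re
from dataclasses import dataclass

# Hypothetical header pattern for lines such as "Brief Hospital Course:".
HEADER = re.compile(r"^[A-Z][A-Za-z /]+:", re.MULTILINE)


@dataclass
class Unit:
    text: str
    position: int  # ordinal position in the document, used to restore reading order

    @property
    def cost(self) -> int:
        # Whitespace token count as a stand-in for a real tokenizer (assumption).
        return len(self.text.split())


def split_sentences(doc: str) -> list[str]:
    # Naive splitter: break after ".", "!" or "?" followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]


def sentence_units(doc: str) -> list[Unit]:
    return [Unit(s, i) for i, s in enumerate(split_sentences(doc))]


def section_units(doc: str) -> list[Unit]:
    # Cut the document at each header; text before the first header becomes its own unit.
    cuts = [0] + [m.start() for m in HEADER.finditer(doc)] + [len(doc)]
    chunks = [doc[a:b].strip() for a, b in zip(cuts, cuts[1:])]
    nonempty = [c for c in chunks if c]
    return [Unit(c, i) for i, c in enumerate(nonempty)]


def window_units(doc: str, size: int = 3, stride: int = 2) -> list[Unit]:
    # Overlapping windows of `size` sentences that advance by `stride` sentences.
    sents = split_sentences(doc)
    if not sents:
        return []
    starts = list(range(0, max(len(sents) - size, 0) + 1, stride))
    if starts[-1] + size < len(sents):
        starts.append(len(sents) - size)  # make sure the tail is covered
    return [Unit(" ".join(sents[i:i + size]), i) for i in starts]
```

Window `size` and `stride` trade context against granularity. Overlapping windows duplicate some text, which is one reason a redundancy-aware selection objective is useful.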

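To make the RCD idea concrete, here is a hedged sketch of monotone submodular selection under a token budget. It is not the paper's RCD objective, whose exact form is not given in this summary. Instead it combines three standard monotone submodular terms (a modular relevance sum, facility-location coverage, and a square-root-over-groups diversity reward) and maximizes them with cost-benefit greedy plus a best-singleton check. Word-overlap (Jaccard) similarity, the query string, contiguous position blocks standing in for clusters, the weights, and all function names are assumptions.

```python
import math


def jaccard(a: str, b: str) -> float:
    # Word-set overlap in [0, 1]; a stand-in for embedding similarity (assumption).
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def token_cost(text: str) -> int:
    # Whitespace tokens, floored at 1 so cost-benefit ratios are always defined.
    return max(1, len(text.split()))


def make_objective(units, query, n_groups=3, w_rel=1.0, w_cov=1.0, w_div=1.0):
    """Hypothetical relevance + coverage + diversity objective (not the paper's RCD)."""
    n = len(units)
    rel = [jaccard(u, query) for u in units]
    sim = [[jaccard(a, b) for b in units] for a in units]
    group = [i * n_groups // n for i in range(n)]  # contiguous blocks as stand-in clusters

    def F(selected) -> float:
        if not selected:
            return 0.0
        relevance = sum(rel[i] for i in selected)
        # Facility location: how well each unit is represented by its closest selected unit.
        coverage = sum(max(sim[v][i] for i in selected) for v in range(n))
        # Square root of per-group relevance rewards spreading picks across groups.
        per_group = {}
        for i in selected:
            per_group[group[i]] = per_group.get(group[i], 0.0) + rel[i]
        diversity = sum(math.sqrt(x) for x in per_group.values())
        return w_rel * relevance + w_cov * coverage + w_div * diversity

    return F


def budgeted_greedy(units, costs, budget, F):
    """Cost-benefit greedy under a knapsack constraint, then compare with the best single unit."""
    selected, spent = [], 0
    remaining = set(range(len(units)))
    while True:
        base = F(selected)
        best, best_ratio = None, 0.0
        for i in sorted(remaining):
            if spent + costs[i] > budget:
                continue  # unit would exceed the token budget
            ratio = (F(selected + [i]) - base) / costs[i]
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            break  # nothing feasible with positive gain
        selected.append(best)
        spent += costs[best]
        remaining.discard(best)
    # The best feasible singleton guards against greedy being led astray by cheap units.
    singles = [i for i in range(len(units)) if costs[i] <= budget]
    if singles:
        top = max(singles, key=lambda i: F([i]))
        if F([top]) > F(selected):
            selected = [top]
    return sorted(selected)  # restore document order
```

Each term is monotone submodular when weights and similarities are non-negative, so their weighted sum is too, which keeps this combination inside the setting the knapsack formulation assumes. Returning indices in document order keeps the selected units readable when they are concatenated into a prompt.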

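The routing heuristic is described only as adapting to budget regimes, so the following is a purely hypothetical illustration of the idea: it maps the ratio of budget to document length onto one of the unitization strategies. The mapping (coarser units for generous budgets, finer units for tight ones), the thresholds `loose` and `tight`, and the names `Strategy` and `route` are assumptions, not the paper's rule.

```python
from enum import Enum


class Strategy(Enum):
    FULL = "full"          # whole input fits; no selection needed
    SECTION = "section"
    WINDOW = "window"
    SENTENCE = "sentence"


def route(doc_tokens: int, budget: int, loose: float = 0.5, tight: float = 0.15) -> Strategy:
    """Hypothetical router: pick a unitization from the budget-to-length ratio.

    `loose` and `tight` are illustrative placeholders, not values from the paper.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    if doc_tokens <= budget:
        return Strategy.FULL
    ratio = budget / doc_tokens
    if ratio >= loose:
        return Strategy.SECTION   # generous budget: keep larger coherent blocks
    if ratio >= tight:
        return Strategy.WINDOW    # moderate budget: mid-sized overlapping spans
    return Strategy.SENTENCE      # tight budget: finest granularity
```

For a 4,000-token note and a 1,000-token budget the ratio is 0.25, so this sketch would choose `Strategy.WINDOW`. In practice, thresholds like these would need tuning per task and metric.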

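ROUGE and BERTScore can disagree because they measure different things: ROUGE counts n-gram overlap with a reference, while BERTScore matches tokens by contextual-embedding similarity, so a faithful paraphrase can score low on ROUGE yet well on BERTScore. The simplified ROUGE-1 F1 below (clipped unigram overlap on lowercased whitespace tokens) illustrates the lexical side only. Official ROUGE packages add their own tokenization and optional stemming, and the function name `rouge1_f1` is hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap on lowercased whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum count per token
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Against the reference "patient was discharged home stable", the candidate "patient discharged home" has precision 1.0 and recall 0.6, for an F1 of 0.75, while the paraphrase "patient went home" shares only two unigrams and drops to an F1 of 0.5.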