Budget-Constrained Online Retrieval-Augmented Generation: The Chunk-as-a-Service Model
arXiv cs.LG / 5/1/2026
Key Points
- The paper argues that while Retrieval-Augmented Generation (RAG) improves LLM reliability, existing RAG-as-a-Service (RaaS) pricing and access models can be opaque and inefficient because they charge based on prompts rather than the relevance/quality of retrieved chunks.
- It proposes “Chunk-as-a-Service” (CaaS) as a more transparent and cost-effective alternative, offering two variants: Open-Budget CaaS (OB-CaaS) and Limited-Budget CaaS (LB-CaaS).
- For LB-CaaS and OB-CaaS, the authors introduce the Utility-Cost Online Selection Algorithm (UCOSA), which selectively enriches a subset of prompts online while respecting budget constraints and utility–cost tradeoffs.
- Experiments show UCOSA improves over offline and relevance-greedy baselines using a metric combining the number of enriched prompts and average relevance, and it substantially outperforms random selection.
- The results also indicate better budget efficiency versus RaaS, with CaaS variants achieving higher performance-to-budget ratios, demonstrating improved cost-effectiveness and accessibility for retrieval-enhanced generation.
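
The online selection idea in the points above can be illustrated with a small sketch. This is not the paper's UCOSA, whose details are not given in this summary; it is a generic threshold rule for budgeted online selection, written under stated assumptions. The names `Prompt`, `select_online`, `score`, and the `min_ratio` threshold are hypothetical, and the combined metric (enriched count times average relevance) is only one possible reading of "a metric combining the number of enriched prompts and average relevance".

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Prompt:
    prompt_id: str
    relevance: float  # hypothetical utility of the retrieved chunk, in [0, 1]
    cost: float       # hypothetical price charged for that chunk


def select_online(prompts: Iterable[Prompt], budget: float,
                  min_ratio: float) -> List[Prompt]:
    """Decide per prompt, in arrival order, whether to buy its chunk.

    A prompt is enriched only if its relevance-per-cost ratio reaches
    `min_ratio` and its cost still fits in the remaining budget.
    Decisions are never revisited, which is what makes the rule online.
    """
    remaining = budget
    chosen: List[Prompt] = []
    for p in prompts:
        if p.cost <= 0:
            continue  # skip malformed entries instead of dividing by zero
        if p.cost <= remaining and p.relevance / p.cost >= min_ratio:
            chosen.append(p)
            remaining -= p.cost
    return chosen


def score(chosen: List[Prompt]) -> float:
    """Assumed combined metric: number enriched times average relevance."""
    if not chosen:
        return 0.0
    average_relevance = sum(p.relevance for p in chosen) / len(chosen)
    return len(chosen) * average_relevance
```

With a budget of 4.0 and `min_ratio=0.3`, a stream of prompts with (relevance, cost) pairs (0.9, 2.0), (0.2, 2.0), (0.8, 1.0), (0.7, 3.0) would enrich only the first and third prompts: the second falls below the ratio threshold, and the fourth both misses the threshold and exceeds the remaining budget. A fixed threshold is the simplest design choice here; an adaptive threshold that tightens as the budget drains is a common refinement, but whether the paper uses one is not stated in this summary.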