ParetoBandit: Budget-Paced Adaptive Routing for Non-Stationary LLM Serving

Reddit r/MachineLearning / 4/7/2026

💬 Opinion | Developer Stack & Infrastructure | Ideas & Deep Analysis | Models & Research

Key Points

The paper proposes ParetoBandit, a budget-paced adaptive routing method aimed at improving LLM serving under non-stationary request patterns.
It uses a bandit-style decision process to route traffic dynamically while accounting for latency/cost budgets during inference.
The approach is designed to remain effective when the underlying demand distribution changes over time, addressing a key real-world deployment challenge.
The work frames routing as an online optimization problem, balancing service quality with constrained compute or spend.
The post presents a research submission, not a product release; it highlights the method and its positioning for managing non-stationary LLM traffic.
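
The key points above describe the idea only at a high level, so the following is a minimal sketch of what budget-paced bandit routing could look like, not the paper's actual algorithm. It assumes a discounted UCB score (so old observations fade when traffic shifts) combined with a pacing "price" that rises when spend exceeds a per-request budget. Every name here (BudgetPacedRouter, simulate, the cost and quality numbers) is hypothetical and chosen for illustration.

```python
import math
import random


class BudgetPacedRouter:
    """Illustrative router: discounted UCB plus a cost-pacing price.

    Assumption: this is NOT the method from the paper, only a generic
    sketch of bandit routing under a spend budget.
    """

    def __init__(self, costs, budget_per_request,
                 discount=0.99, explore=0.5, pace_lr=0.05):
        self.costs = list(costs)          # per-request cost of each model
        self.budget = budget_per_request  # target average cost per request
        self.discount = discount          # < 1 forgets old data (non-stationarity)
        self.explore = explore            # weight of the exploration bonus
        self.pace_lr = pace_lr            # step size of the pacing price
        self.counts = [0.0] * len(self.costs)
        self.reward_sums = [0.0] * len(self.costs)
        self.price = 0.0                  # penalty per unit of cost, kept >= 0

    def select(self):
        total = sum(self.counts)
        best_arm, best_score = 0, -math.inf
        for arm, cost in enumerate(self.costs):
            if self.counts[arm] < 1e-9:
                return arm  # try every model once before scoring
            mean = self.reward_sums[arm] / self.counts[arm]
            bonus = self.explore * math.sqrt(math.log(total + 1.0) / self.counts[arm])
            score = mean + bonus - self.price * cost
            if score > best_score:
                best_arm, best_score = arm, score
        return best_arm

    def update(self, arm, quality):
        # Discount all statistics so recent requests dominate the estimates.
        for i in range(len(self.costs)):
            self.counts[i] *= self.discount
            self.reward_sums[i] *= self.discount
        self.counts[arm] += 1.0
        self.reward_sums[arm] += quality
        # Pacing: raise the price after overspending, lower it otherwise.
        overspend = self.costs[arm] - self.budget
        self.price = max(0.0, self.price + self.pace_lr * overspend)


def simulate(router, qualities, steps, seed=0, noise=0.05):
    """Route `steps` synthetic requests; return (average cost, picks per model)."""
    rng = random.Random(seed)
    total_cost = 0.0
    picks = [0] * len(qualities)
    for _ in range(steps):
        arm = router.select()
        quality = min(1.0, max(0.0, rng.gauss(qualities[arm], noise)))
        router.update(arm, quality)
        total_cost += router.costs[arm]
        picks[arm] += 1
    return total_cost / steps, picks


if __name__ == "__main__":
    # Hypothetical setup: a cheap model and a better but 10x pricier one.
    router = BudgetPacedRouter(costs=[1.0, 10.0], budget_per_request=3.0)
    avg_cost, picks = simulate(router, qualities=[0.5, 0.9], steps=2000)
    print(f"average cost per request: {avg_cost:.2f}, picks: {picks}")
```

In this sketch the pacing price acts like a Lagrange multiplier: the stronger model is used only as often as the budget allows, and the discount factor lets the router re-learn model quality if the request mix changes.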