ParetoBandit: Budget-Paced Adaptive Routing for Non-Stationary LLM Serving

Reddit r/MachineLearning / 4/7/2026


Key Points

  • The paper proposes ParetoBandit, a budget-paced adaptive routing method aimed at improving LLM serving under non-stationary request patterns.
  • It uses a bandit-style decision process to route traffic dynamically while accounting for latency/cost budgets during inference.
  • The approach is designed to remain effective when the underlying demand distribution changes over time, addressing a key real-world deployment challenge.
  • The work frames routing as an online optimization problem, balancing service quality with constrained compute or spend.
  • The post presents a research contribution focused on the method and its application to non-stationary LLM traffic management; it is not an immediate product release.
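The key points frame routing as an online, budget-constrained bandit problem but do not give the algorithm. The Python sketch below shows one common way to combine those ingredients, and it should not be read as ParetoBandit itself. It pairs a discounted-UCB-style index, where old rewards fade so the router can follow drifting traffic, with a pacing multiplier `lam` that charges for cost and rises whenever a request costs more than a per-request budget. The arm names, costs, quality probabilities, and all parameters (`gamma`, `c`, `pacing_lr`, `budget_per_request`) are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Arm:
    """One candidate LLM endpoint (hypothetical). `cost` is an assumed, known per-request price."""
    name: str
    cost: float
    reward_sum: float = 0.0  # discounted sum of observed quality rewards
    count: float = 0.0       # discounted number of times this arm was chosen


class BudgetPacedRouter:
    """Illustrative router: discounted UCB for quality plus Lagrangian pacing for spend."""

    def __init__(self, arms, budget_per_request, gamma=0.99, c=0.5, pacing_lr=0.05):
        self.arms = arms
        self.budget = budget_per_request  # target average spend per request
        self.gamma = gamma                # < 1 forgets old rewards, tracking drift
        self.c = c                        # exploration strength
        self.pacing_lr = pacing_lr        # step size for the pacing multiplier
        self.lam = 0.0                    # current "price" charged per unit of cost

    def select(self):
        # Route to every arm once before trusting the statistics.
        for arm in self.arms:
            if arm.count == 0.0:
                return arm
        total = sum(a.count for a in self.arms)

        def score(a):
            mean = a.reward_sum / a.count
            bonus = self.c * math.sqrt(math.log(total + 1.0) / a.count)
            return mean + bonus - self.lam * a.cost  # quality minus priced cost

        return max(self.arms, key=score)

    def update(self, chosen, reward):
        # Decay every arm's statistics so that recent traffic dominates.
        for a in self.arms:
            a.reward_sum *= self.gamma
            a.count *= self.gamma
        chosen.reward_sum += reward
        chosen.count += 1.0
        # Projected dual ascent: raise the price after overspending,
        # lower it (never below 0) after underspending.
        self.lam = max(0.0, self.lam + self.pacing_lr * (chosen.cost - self.budget))


def simulate(steps=5000, seed=0):
    """Toy run with made-up arms: quality is Bernoulli with a hidden per-arm mean."""
    rng = random.Random(seed)
    arms = [Arm("small-model", cost=1.0), Arm("large-model", cost=10.0)]
    true_quality = {"small-model": 0.6, "large-model": 0.9}  # illustrative only
    router = BudgetPacedRouter(arms, budget_per_request=3.0)
    spend, picks = 0.0, {a.name: 0 for a in arms}
    for _ in range(steps):
        arm = router.select()
        reward = 1.0 if rng.random() < true_quality[arm.name] else 0.0
        router.update(arm, reward)
        spend += arm.cost
        picks[arm.name] += 1
    return spend / steps, picks


if __name__ == "__main__":
    avg_spend, picks = simulate()
    print(f"average spend per request: {avg_spend:.3f}", picks)
```

Because `lam` starts at 0 and can only be pushed up by requests that cost more than the budget, the long-run average spend can exceed `budget_per_request` by at most the final `lam` divided by `pacing_lr * steps`. Separately, `gamma` controls how quickly the quality estimates forget stale traffic, which is the knob this sketch uses for non-stationarity.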