When Do We Need LLMs? A Diagnostic for Language-Driven Bandits

arXiv cs.AI / 4/8/2026


Key Points

  • The paper studies contextual multi-armed bandit problems whose context mixes textual and numerical information (e.g., recommendation, dynamic portfolio adjustment, and offer selection in finance) — settings where LLMs are increasingly used for step-by-step reasoning but are costly to run and hard to calibrate for uncertainty.
  • It proposes LLMP-UCB, a bandit algorithm that extracts uncertainty from LLMs by running repeated inference to enable UCB-style exploration.
  • Experiments show that lightweight numerical bandits using text embeddings (including dense and Matryoshka embeddings) can match or outperform LLM-based approaches while dramatically reducing computation cost.
  • The authors introduce embedding dimensionality as a controllable lever to tune the exploration–exploitation tradeoff, enabling practical cost–performance tradeoffs without complex prompting.
  • They provide a geometric diagnostic using the arms’ embeddings to help practitioners decide when LLM-driven reasoning is truly warranted versus relying on lightweight bandits, aiming for uncertainty-aware, cost-effective deployment.
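The paper does not spell out LLMP-UCB's mechanics in this summary, but the core idea — turning the variability of repeated LLM inferences into an uncertainty term for UCB-style selection — can be sketched as follows. Here `llm_score` is a hypothetical stand-in for a stochastic LLM call; the sampling count `n_samples` and exploration weight `c` are illustrative parameters, not values from the paper.

```python
import random
import statistics

def llm_score(prompt: str) -> float:
    """Hypothetical stand-in for a stochastic LLM query that returns a
    numeric reward estimate for one arm. A real system would call a model
    with non-zero temperature so repeated calls disagree slightly."""
    return random.gauss(0.5, 0.1)

def llmp_ucb_select(arm_prompts: dict, n_samples: int = 5, c: float = 1.0):
    """Pick the arm maximizing (sample mean + c * sample std) over
    repeated LLM inferences.

    Repeated sampling converts the LLM's output variability into a crude
    per-arm uncertainty estimate, which plays the role of the confidence
    bonus in a UCB rule."""
    best_arm, best_ucb = None, float("-inf")
    for arm, prompt in arm_prompts.items():
        samples = [llm_score(prompt) for _ in range(n_samples)]
        mean = statistics.mean(samples)
        std = statistics.stdev(samples) if n_samples > 1 else 0.0
        ucb = mean + c * std
        if ucb > best_ucb:
            best_arm, best_ucb = arm, ucb
    return best_arm
```

The cost concern raised in the key points is visible directly here: every decision step issues `n_samples` LLM calls per arm, which is what the lightweight embedding-based bandits avoid.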

Abstract

We study Contextual Multi-Armed Bandits (CMABs) for non-episodic sequential decision-making problems where the context includes both textual and numerical information (e.g., recommendation systems, dynamic portfolio adjustments, offer selection; all frequent problems in finance). While Large Language Models (LLMs) are increasingly applied to these settings, utilizing LLMs for reasoning at every decision step is computationally expensive and uncertainty estimates are difficult to obtain. To address this, we introduce LLMP-UCB, a bandit algorithm that derives uncertainty estimates from LLMs via repeated inference. However, our experiments demonstrate that lightweight numerical bandits operating on text embeddings (dense or Matryoshka) match or exceed the accuracy of LLM-based solutions at a fraction of their cost. We further show that embedding dimensionality is a practical lever on the exploration–exploitation balance, enabling cost–performance tradeoffs without prompt complexity. Finally, to guide practitioners, we propose a geometric diagnostic based on the arms' embeddings to decide when to use LLM-driven reasoning versus a lightweight numerical bandit. Our results provide a principled deployment framework for cost-effective, uncertainty-aware decision systems with broad applicability across AI use cases in financial services.
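The abstract's "dimensionality as a lever" claim can be made concrete with a minimal sketch: Matryoshka-style embeddings allow truncation to the first `d` coordinates, and that `d` is then the context width of a standard linear bandit such as LinUCB. The paper's exact bandit and truncation scheme are not given in this summary, so the pairing below is an illustrative assumption; `LinUCB` here is the textbook ridge-regression variant.

```python
import numpy as np

class LinUCB:
    """Textbook LinUCB on a d-dimensional context. The width d — here the
    (possibly truncated) embedding dimension — controls how aggressively
    the confidence bonus drives exploration."""

    def __init__(self, d: int, alpha: float = 1.0):
        self.A = np.eye(d)       # ridge-regularized Gram matrix
        self.b = np.zeros(d)     # reward-weighted sum of contexts
        self.alpha = alpha       # exploration weight

    def ucb(self, x: np.ndarray) -> float:
        """Upper confidence bound: point estimate plus confidence width."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        return float(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))

    def update(self, x: np.ndarray, reward: float) -> None:
        """Rank-one update after observing the chosen arm's reward."""
        self.A += np.outer(x, x)
        self.b += reward * x

def truncate(embedding, d: int) -> np.ndarray:
    """Matryoshka-style truncation: keep only the first d coordinates,
    trading representation fidelity for a cheaper, lower-variance bandit."""
    return np.asarray(embedding, dtype=float)[:d]
```

Sweeping `d` (e.g., 64, 256, 1024) while holding the bandit fixed is one way to realize the cost–performance tradeoff the abstract describes, with no prompt engineering involved.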