Jump Start or False Start? A Theoretical and Empirical Evaluation of LLM-initialized Bandits

arXiv cs.AI / 4/6/2026


Key Points

  • The paper evaluates how LLM-initialized contextual bandits (CBLI) behave when the synthetic preference data used for warm-starting is corrupted by random noise or label-flipping noise.
  • In aligned domains, warm-starting remains beneficial up to roughly 30% corruption, loses its advantage around 40% corruption, and actively degrades performance past 50%.
  • In systematically misaligned domains, the study finds that LLM-generated priors can increase regret compared with a cold-start bandit even without added noise.
  • The authors provide a theoretical framework that decomposes regret changes due to random label noise versus systematic misalignment, and they derive a sufficient condition for when LLM warm starts provably outperform cold starts.
  • Experiments across multiple conjoint datasets and multiple LLMs show that an estimated alignment signal predicts whether warm-starting will improve or worsen recommendation quality.
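
The 40–50% thresholds above have a simple illustrative explanation under a standard label-flip model (a sketch of the intuition, not the paper's exact decomposition): flipping a ±1 label with probability p shrinks its correlation with the true signal by a factor of (1 − 2p).

```latex
\tilde{y} =
\begin{cases}
  y  & \text{with probability } 1-p \\
 -y  & \text{with probability } p
\end{cases}
\quad\Rightarrow\quad
\mathbb{E}[\tilde{y}\,x] = (1-2p)\,\mathbb{E}[y\,x].
```

At p = 0.5 the synthetic labels carry no information, so a warm start can do no better than a cold start, and for p > 0.5 the sign reverses and the prior points the wrong way. Systematic misalignment instead contributes a bias term that more noise-free data cannot remove, which is consistent with misaligned priors hurting even at p = 0.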

Abstract

Recent advances in Large Language Models (LLMs) offer new opportunities to generate user preference data for warm-starting bandits. Recent studies on contextual bandits with LLM initialization (CBLI) have shown that these synthetic priors can significantly lower early regret. However, these findings assume that LLM-generated choices are reasonably aligned with actual user preferences. In this paper, we systematically examine how LLM-generated preferences perform when random and label-flipping noise is injected into the synthetic training data. In aligned domains, we find that warm-starting remains effective up to 30% corruption, loses its advantage around 40%, and degrades performance beyond 50%. Under systematic misalignment, even without added noise, LLM-generated priors can lead to higher regret than a cold-start bandit. To explain these behaviors, we develop a theoretical analysis that decomposes the effects of random label noise and systematic misalignment on the prior error driving the bandit's regret, and we derive a sufficient condition under which LLM-based warm starts are provably better than a cold-start bandit. We validate these results across multiple conjoint datasets and LLMs, showing that estimated alignment reliably tracks when warm-starting improves or degrades recommendation quality.
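A minimal simulation conveys the qualitative effect the abstract describes. This is an illustrative sketch with made-up parameters, not the paper's CBLI setup: a greedy ridge-regression bandit is warm-started with synthetic ±1 pseudo-labels derived from the true preference vector, corrupted by label flips at varying rates, and its cumulative regret is compared against a cold start.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, n_arms, n_synth = 8, 500, 5, 200

# Unknown "true" user preference vector the bandit must learn.
w_true = rng.normal(size=d)
w_true /= np.linalg.norm(w_true)

def make_synthetic(flip_rate):
    """Synthetic preference data standing in for LLM-generated choices:
    binary labels from the true model, with a fraction flipped."""
    X = rng.normal(size=(n_synth, d))
    y = (X @ w_true > 0).astype(float)
    flips = rng.random(n_synth) < flip_rate
    y[flips] = 1.0 - y[flips]
    return X, 2.0 * y - 1.0  # map {0,1} -> {-1,+1} pseudo-rewards

def run_bandit(X_warm=None, y_warm=None, lam=1.0):
    """Greedy ridge-regression bandit; warm data pre-loads the
    sufficient statistics (A, b) before any real interaction."""
    A = lam * np.eye(d)
    b = np.zeros(d)
    if X_warm is not None:
        A += X_warm.T @ X_warm
        b += X_warm.T @ y_warm
    regret = 0.0
    for _ in range(T):
        arms = rng.normal(size=(n_arms, d))   # fresh candidate items
        w_hat = np.linalg.solve(A, b)         # ridge estimate of w_true
        choice = int(np.argmax(arms @ w_hat))
        best = int(np.argmax(arms @ w_true))
        regret += arms[best] @ w_true - arms[choice] @ w_true
        x = arms[choice]
        r = x @ w_true + 0.1 * rng.normal()   # noisy observed reward
        A += np.outer(x, x)
        b += r * x
    return regret

cold = run_bandit()
results = {}
for p in (0.0, 0.3, 0.5):
    Xw, yw = make_synthetic(p)
    results[p] = run_bandit(Xw, yw)
    print(f"flip {p:.0%}: warm regret {results[p]:7.2f}  vs cold {cold:7.2f}")
```

The warm start helps because its prior estimate already points toward w_true; as the flip rate approaches 50%, the pre-loaded statistics add inertia without signal, so learning is slower than starting from scratch.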
