AgentDrift: Unsafe Recommendation Drift Under Tool Corruption Hidden by Ranking Metrics in LLM Agents
arXiv cs.AI / 3/16/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper introduces a paired-trajectory protocol to evaluate tool-augmented LLM agents under clean versus contaminated tool-output conditions across seven models, revealing safety issues that standard metrics miss.
- Across models, recommendation quality is largely preserved under contamination (high utility preservation), yet 65-93% of turns include risk-inappropriate products, exposing a systematic safety failure.
- Safety violations are predominantly information-channel-driven: they emerge at the first contaminated turn and persist across 23-step trajectories, and agents do not self-check the reliability of tool data.
- A safety-penalized NDCG variant (sNDCG) reduces utility preservation to 0.51-0.74, demonstrating that trajectory-level safety measurement can reveal evaluation gaps not captured by traditional ranking metrics.
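The paper does not spell out the exact sNDCG formula in this summary, but the idea of a safety-penalized NDCG can be sketched as follows: down-weight (here, zero out) the gain of any recommended item flagged as risk-inappropriate before normalizing by the ideal DCG. The function names, the `penalty` parameter, and the hard-zero penalty are illustrative assumptions, not the authors' definition.

```python
import math

def dcg(relevances):
    # Standard discounted cumulative gain over a ranked list of relevance scores.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def sndcg(ranked, ideal, unsafe, penalty=0.0):
    """Illustrative safety-penalized NDCG (assumption, not the paper's formula).

    ranked : list of (item_id, relevance) in the order the agent recommended them
    ideal  : relevance scores of the best possible ranking
    unsafe : set of item_ids judged risk-inappropriate for this user/turn
    penalty: multiplier applied to the gain of unsafe items (0.0 = no credit)
    """
    penalized = [rel * penalty if item in unsafe else rel for item, rel in ranked]
    ideal_dcg = dcg(sorted(ideal, reverse=True))
    return dcg(penalized) / ideal_dcg if ideal_dcg > 0 else 0.0
```

With a perfect ranking and no unsafe items, `sndcg` equals plain NDCG (1.0); flagging any recommended item as unsafe strictly lowers the score, which is the mechanism by which a utility-preserving but unsafe trajectory drops from ~1.0 toward the 0.51-0.74 range the paper reports.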