Targeted Bit-Flip Attacks on LLM-Based Agents

arXiv cs.AI / 3/12/2026

Key Points

  • The authors present Flip-Agent, a targeted bit-flip attack framework specifically designed for LLM-based agents in multi-stage pipelines.
  • Flip-Agent can manipulate not only the final outputs but also the sequence of tool invocations that an agent performs.
  • Experimental results show Flip-Agent significantly outperforms existing targeted bit-flip attacks (BFAs) on real-world agent tasks, suggesting that LLM-based agents expose a larger attack surface than previously recognized.
  • The work exposes a critical security vulnerability in LLM-based agent systems and calls for improved fault-tolerance and defense strategies.

Abstract

Targeted bit-flip attacks (BFAs) exploit hardware faults to manipulate model parameters, posing a significant security threat. While prior work targets single-step inference models (e.g., image classifiers), LLM-based agents with multi-stage pipelines and external tools present new attack surfaces, which remain unexplored. This work introduces Flip-Agent, the first targeted BFA framework for LLM-based agents, manipulating both final outputs and tool invocations. Our experiments show that Flip-Agent significantly outperforms existing targeted BFAs on real-world agent tasks, revealing a critical vulnerability in LLM-based agent systems.
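
To make concrete why a single hardware-induced bit flip in a stored parameter can matter, the following minimal sketch flips individual bits in the IEEE 754 encoding of one weight. It is an illustration only and is not Flip-Agent's method: the helper name `flip_bit`, the example weight of 0.5, and the choice of float32 storage are all assumptions made here, and deployed models may store weights in other formats (for example, quantized integers or 16-bit floats).

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value`, stored as float32, with one bit of its encoding flipped.

    Hypothetical helper for illustration; bit 0 is the least significant
    mantissa bit and bit 31 is the sign bit.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range 0..31")
    # Reinterpret the float32 as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # Toggle the chosen bit and reinterpret the result as a float32 again.
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


weight = 0.5  # assumed example parameter value

sign_flipped = flip_bit(weight, 31)      # sign bit: 0.5 becomes -0.5
exponent_flipped = flip_bit(weight, 30)  # top exponent bit: 0.5 becomes 2**127
mantissa_flipped = flip_bit(weight, 0)   # lowest mantissa bit: tiny change

print(sign_flipped, exponent_flipped, mantissa_flipped)
```

The effect depends heavily on which bit is hit: the lowest mantissa bit changes the weight by about 6e-8, while the top exponent bit changes it by dozens of orders of magnitude. This sensitivity to bit position is why it matters which bits an attacker flips, not just how many.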