ToolFlood: Beyond Selection -- Hiding Valid Tools from LLM Agents via Semantic Covering

arXiv cs.CL / 3/17/2026

Key Points

ToolFlood is a retrieval-layer attack on tool-augmented LLM agents that overwhelms top-k retrieval by injecting attacker-controlled tools whose metadata is strategically placed in embedding space.
It employs a two-phase strategy: first generating diverse attacker tool names and descriptions with an LLM, then greedily selecting tools to maximize coverage of target queries under a cosine-distance threshold.
The study reports up to a 95% attack success rate with a low injection rate (1%) on ToolBench, highlighting a significant vulnerability in the retrieval stage of tool-augmented LLMs.
The authors indicate that the code will be publicly released, enabling replication and further research on defenses against semantic-covering attacks.
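
To make the retrieval-saturation idea concrete, here is a minimal toy sketch, not the authors' code. It ranks tools by cosine similarity to a query and checks whether injected tools take every top-k slot. The 2-D vectors, the tool names, and the `attack_succeeds` criterion (every retrieved slot held by an attacker tool) are illustrative assumptions; the paper's exact success metric may differ.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, tools, k):
    """Return the names of the k tools most similar to the query."""
    ranked = sorted(
        tools,
        key=lambda name: cosine_similarity(query, tools[name]),
        reverse=True,
    )
    return ranked[:k]

def attack_succeeds(retrieved, attacker_names):
    """Assumed criterion: every retrieved slot is an attacker tool."""
    return all(name in attacker_names for name in retrieved)

# Hypothetical toy embeddings (real systems use high-dimensional vectors).
query = [1.0, 0.2]
benign = {"weather_api": [1.0, 0.5], "calendar_api": [0.2, 1.0]}
attacker = {"atk_1": [1.0, 0.2], "atk_2": [1.0, 0.25]}

before = top_k(query, benign, k=2)
after = top_k(query, {**benign, **attacker}, k=2)

print(before, attack_succeeds(before, set(attacker)))
print(after, attack_succeeds(after, set(attacker)))
```

In this toy setup the two injected tools sit closer to the query than any benign tool, so the benign tools never reach the agent's context.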

Abstract

Large Language Model (LLM) agents increasingly use external tools for complex tasks and rely on embedding-based retrieval to select a small top-k subset for reasoning. As these systems scale, the robustness of this retrieval stage is underexplored, even though prior work has examined attacks on tool selection. This paper introduces ToolFlood, a retrieval-layer attack on tool-augmented LLM agents. Rather than altering which tool is chosen after retrieval, ToolFlood overwhelms retrieval itself by injecting a few attacker-controlled tools whose metadata is carefully placed by exploiting the geometry of embedding space. These tools semantically span many user queries, dominate the top-k results, and push all benign tools out of the agent's context. ToolFlood uses a two-phase adversarial tool generation strategy. It first samples subsets of target queries and uses an LLM to iteratively generate diverse tool names and descriptions. It then runs an iterative greedy selection that chooses tools maximizing coverage of remaining queries in embedding space under a cosine-distance threshold, stopping when all queries are covered or a budget is reached. We provide theoretical analysis of retrieval saturation and show on standard benchmarks that ToolFlood achieves up to a 95% attack success rate with a low injection rate (1% in ToolBench). The code will be made publicly available at the following link: https://github.com/as1-prog/ToolFlood
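
The second phase described above resembles a greedy set-cover procedure, which can be sketched as follows. This is a simplified illustration under stated assumptions, not the released implementation. A query counts as covered once a single selected tool lies within the cosine-distance threshold, whereas dominating a full top-k list would likely need several tools per query. The function name `greedy_cover`, the toy vectors, and the threshold value are hypothetical.

```python
import math

def cosine_distance(a, b):
    """Cosine distance, defined here as 1 minus cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def greedy_cover(queries, candidates, threshold, budget):
    """Greedily pick candidate tools that cover the most uncovered queries.

    Stops when every query is covered, the budget is spent, or no
    remaining candidate covers any uncovered query.
    """
    # For each candidate, precompute the set of queries within the threshold.
    covers = {
        name: {q for q, qv in queries.items()
               if cosine_distance(qv, cv) <= threshold}
        for name, cv in candidates.items()
    }
    uncovered = set(queries)
    selected = []
    while uncovered and len(selected) < budget:
        best = max(covers, key=lambda name: len(covers[name] & uncovered))
        gain = covers[best] & uncovered
        if not gain:
            break  # no candidate helps with the remaining queries
        selected.append(best)
        uncovered -= gain
    return selected, uncovered

# Hypothetical toy embeddings for target queries and LLM-generated candidates.
queries = {"q1": [1.0, 0.0], "q2": [0.9, 0.1], "q3": [0.0, 1.0]}
candidates = {"c_a": [1.0, 0.05], "c_b": [0.0, 1.0], "c_c": [0.5, 0.5]}

selected, uncovered = greedy_cover(queries, candidates, threshold=0.05, budget=3)
limited, left_over = greedy_cover(queries, candidates, threshold=0.05, budget=1)
print(selected, uncovered)
print(limited, left_over)
```

With a budget of one, the sketch keeps only the candidate that covers the most queries and leaves the rest uncovered, which mirrors the budget-limited stopping rule mentioned in the abstract.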
