SNEAK: Evaluating Strategic Communication and Information Leakage in Large Language Models

arXiv cs.CL / 4/1/2026


Key Points

  • The paper introduces SNEAK, a new benchmark designed to evaluate strategic communication in LLMs where agents must share information with allies while minimizing leakage to adversaries.
  • SNEAK tests selective information sharing: given a semantic category, a candidate word set, and a secret word, the model must generate a message that signals knowledge of the secret without revealing it too clearly.
  • It uses two simulated agents—an ally (who knows the secret) to assess communication utility and a chameleon (who lacks the secret) to assess adversarial leakage—producing complementary utility and leakage metrics.
  • The authors analyze informativeness–secrecy trade-offs in modern language models and conclude that strategic communication under asymmetric information is still difficult for current systems.
  • Human participants substantially outperform the evaluated models, reaching up to four times higher scores, highlighting a gap between model behavior and effective secret-aware communication.

Abstract

Large language models (LLMs) are increasingly deployed in multi-agent settings where communication must balance informativeness and secrecy. In such settings, an agent may need to signal information to collaborators while preventing an adversary from inferring sensitive details. However, existing LLM benchmarks primarily evaluate capabilities such as reasoning, factual knowledge, or instruction following, and do not directly measure strategic communication under asymmetric information. We introduce SNEAK (Secret-aware Natural language Evaluation for Adversarial Knowledge), a benchmark for evaluating selective information sharing in language models. In SNEAK, a model is given a semantic category, a candidate set of words, and a secret word, and must generate a message that indicates knowledge of the secret without revealing it too clearly. We evaluate generated messages using two simulated agents with different information states: an ally, who knows the secret and must identify the intended message, and a chameleon, who does not know the secret and attempts to infer it from the message. This yields two complementary metrics: utility, measuring how well the message communicates to collaborators, and leakage, measuring how much information it reveals to an adversary. Using this framework, we analyze the trade-off between informativeness and secrecy in modern language models and show that strategic communication under asymmetric information remains a challenging capability for current systems. Notably, human participants outperform all evaluated models by a large margin, achieving up to four times higher scores.
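
The utility/leakage scoring described above can be sketched as a small harness. This is a minimal illustration, not the authors' code: `score_sneak`, the episode format, and the stand-in agents are all hypothetical. In the actual benchmark the ally additionally knows the secret; here a keyword-matching lambda stands in for a perfect ally decoder, and a naive first-guess lambda stands in for the chameleon, where a real run would query two LLMs.

```python
from typing import Callable, Sequence

# A guesser maps (message, candidate words) -> its guess for the secret word.
Guesser = Callable[[str, Sequence[str]], str]

def score_sneak(
    episodes: Sequence[tuple[str, str, Sequence[str]]],  # (message, secret, candidates)
    ally: Guesser,
    chameleon: Guesser,
) -> tuple[float, float]:
    """Return (utility, leakage): the fraction of episodes in which the ally,
    respectively the chameleon, recovers the secret from the message."""
    n = len(episodes)
    utility = sum(ally(m, c) == s for m, s, c in episodes) / n
    leakage = sum(chameleon(m, c) == s for m, s, c in episodes) / n
    return utility, leakage

# Demo with hard-coded stand-in agents (hypothetical data, not from the paper).
episodes = [
    ("it calls out after dark", "owl", ["owl", "eagle"]),
    ("think of bold patterns", "tiger", ["lion", "tiger"]),
]
ally = lambda m, c: "owl" if "dark" in m else "tiger"  # stand-in: decodes both hints
chameleon = lambda m, c: c[0]                          # naive adversary: first candidate

utility, leakage = score_sneak(episodes, ally, chameleon)
# utility == 1.0 (ally decodes both messages), leakage == 0.5 (chameleon guesses one)
```

A message generator scores well when utility is high and leakage is low; evaluating both against agents with different information states is what makes the trade-off measurable.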