CI-Work: Benchmarking Contextual Integrity in Enterprise LLM Agents
arXiv cs.CL / 4/24/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper introduces CI-Work, a Contextual Integrity (CI)-grounded benchmark designed to test whether enterprise LLM agents can use internal context safely across multiple information-flow directions.
- In dense retrieval settings, evaluations of frontier models show frequent privacy failures, with violation rates between 15.8% and 50.9% and leakage reaching as high as 26.7%.
- The study finds a counterintuitive deployment trade-off: models that deliver higher task utility tend to cause more privacy violations.
- The authors argue that simply scaling model size or adding more reasoning does not solve the leakage problem, especially given the large volume of enterprise data and realistic user behavior.
- They conclude that protecting enterprise workflows likely requires a shift from model-centric scaling to context-centric architectures.
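The Contextual Integrity framing behind the benchmark can be sketched concretely: a flow is judged not by whether data is "sensitive" in isolation, but by whether its sender, recipient, and attribute match the norms of the context. Below is a minimal, hypothetical illustration of such a flow check over an agent's output; all names (`Flow`, `ALLOWED`, `check_agent_output`) and the example policy are assumptions for illustration, not the paper's actual method.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    """One information flow: who sends what about whom to whom."""
    sender: str
    recipient: str
    subject: str
    attribute: str

# Hypothetical enterprise policy: (sender, recipient, attribute) triples
# that the deployment context permits. Real CI norms are richer (they also
# include transmission principles), but this captures the core idea.
ALLOWED = {
    ("hr_system", "hr_manager", "salary"),
    ("hr_system", "employee_self", "salary"),
    ("crm", "sales_rep", "contact_email"),
}

def flow_permitted(flow: Flow) -> bool:
    return (flow.sender, flow.recipient, flow.attribute) in ALLOWED

def check_agent_output(output_text: str, retrieved_records: list, recipient: str) -> list:
    """Flag any retrieved attribute value that surfaces in the agent's
    output when the implied flow is not permitted for this recipient."""
    violations = []
    for rec in retrieved_records:
        for attr, value in rec["attributes"].items():
            if value in output_text:  # naive leakage check by substring match
                flow = Flow(rec["source"], recipient, rec["subject"], attr)
                if not flow_permitted(flow):
                    violations.append(flow)
    return violations
```

For example, an agent answering a sales rep with an employee's salary pulled from an HR record would trip the check, while the same fact sent to an HR manager would not. The paper's "context-centric architecture" argument is, roughly, that such checks belong in the retrieval and output layers rather than being left to the model's judgment.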