An attack class that passes every current LLM filter - no payload, no injection signature, no log trace

Reddit r/artificial / 3/31/2026

📰 NewsSignals & Early TrendsIdeas & Deep AnalysisModels & Research

Key Points

  • The author describes an attack class (“postural manipulation”) where ordinary, prior-context language can change how an LLM reasons before any explicit instruction is given.
  • They report reproducible binary decision reversals across four frontier models using matched controls, where the same question/task yields different answers depending on earlier conversation context.
  • The technique is framed as having no adversarial payload, no injection-like signature, and no obvious log trace, making it harder for current filtering approaches to detect.
  • For agentic workflows, the author warns that an early “posture” in one agent can persist through summarization and carry into downstream agents as seemingly independent expert judgment.
  • The disclosure was coordinated with major AI labs and security groups (Anthropic, OpenAI, Google, xAI, CERT/CC, OWASP), and demos are provided for testing against frontier models.

https://shapingrooms.com/research

I published a paper today on something I've been calling postural manipulation. The short version: ordinary language buried in prior context can shift how an AI reasons about a decision before any instruction arrives. No adversarial signature. Nothing that looks like an attack. The model does exactly what it's told, just from a different angle than intended.

I know that sounds like normal context sensitivity. It isn't, or at least the effect is much larger than expected. I ran matched controls and documented binary decision reversals across four frontier models. The same question, the same task, two different answers depending on what came before it in the conversation.

In agentic systems it compounds. A posture installed early in one agent can survive summarization and arrive at a downstream agent looking like independent expert judgment. No trace of where it came from.

The paper is published following coordinated disclosure to Anthropic, OpenAI, Google, xAI, CERT/CC, and OWASP. I don't have all the answers and I'm not claiming to. The methodology is observational, no internals access, limitations stated plainly. But the effect is real and reproducible and I think it matters.

If you want to try it yourself the demos are at https://shapingrooms.com/demos - works against any frontier model, no setup required.

Happy to discuss.

submitted by /u/lurkyloon
[link] [comments]