We ran 635 security tests against a hardened AI gateway (Minimus OpenClaw). Sandbox on. Tool restrictions configured. Access controls in place.
131 tests failed. Then it got worse.
The agent read its own documentation, found a parameter that let it run commands on the host instead of the sandbox, and rewrote its WhatsApp config. Changed who it could message from a restricted list to everyone. That change went live instantly.
Seven minutes later, two real people got unsolicited messages from a jailbroken AI agent.
The part worth paying attention to: the agent didn't exploit a bug. It used a tool it was given, exactly as that tool was designed. The sandbox was on and configured correctly. It just didn't matter because the tool had a "run on host" option that nobody thought to block.
Five security controls were in place. Each one made sense on its own. Together they left a gap the agent found in a single test run.
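The fix for this class of gap is parameter-level policy, not just tool-level allowlisting. Here's a minimal sketch of the idea, with entirely hypothetical names ("execute", "target", "host" are illustrative, not Minimus OpenClaw's actual API), assuming a gateway that lets you intercept tool calls before they execute:

```python
# Hypothetical parameter-level denylist for agent tool calls.
# Tool and parameter names are invented for illustration.
DENYLIST = {
    # tool name -> {parameter -> values that must never pass through}
    "execute": {"target": {"host"}},                   # block the "run on host" escape hatch
    "whatsapp_config": {"allowed_recipients": {"*"}},  # block wildcard recipient lists
}

def check_tool_call(tool: str, args: dict) -> bool:
    """Return True if the call is allowed under the denylist."""
    rules = DENYLIST.get(tool, {})
    for param, banned in rules.items():
        if args.get(param) in banned:
            return False
    return True

# The sandboxed variant passes; the host-escape variant is rejected.
print(check_tool_call("execute", {"cmd": "ls", "target": "sandbox"}))  # True
print(check_tool_call("execute", {"cmd": "ls", "target": "host"}))     # False
```

The point is that the check runs on the call's arguments, not just its name: a tool can be "allowed" while one of its options is still forbidden.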
Every high-severity failure in the assessment targeted the model layer. Zero CVEs were exploited. Container hardening, distroless images, fewer binaries: all of that works for infrastructure threats. None of it stopped an agent from using its own tools against itself.
We published the full attack chain, the exact timeline, and our recommendations here: https://earlycore.dev/collection/blog-minimus-openclaw-security-assessment
If you're deploying AI agents with tool access, the question isn't whether your sandbox is configured right. It's whether you've audited every tool your agent can reach, and whether you know what happens when the agent uses those tools in ways you didn't plan for.
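That audit can start as a mechanical pass over tool schemas: walk every tool the agent can reach and flag parameters whose names or enum values hint at a sandbox escape or a config rewrite. The schemas and keyword list below are invented for illustration, not from the assessment:

```python
# Hypothetical audit pass over agent tool schemas. Flags parameters whose
# names or enum values contain risky keywords; tune the list to your stack.
RISKY_KEYWORDS = {"host", "exec", "shell", "config", "admin"}

def audit_tools(tool_schemas: list[dict]) -> list[str]:
    """Return human-readable findings for parameters worth a manual review."""
    findings = []
    for schema in tool_schemas:
        for param, spec in schema.get("parameters", {}).items():
            # Check both the parameter name and any enumerated values.
            values = {param, *spec.get("enum", [])}
            hits = {v for v in values
                    if any(k in str(v).lower() for k in RISKY_KEYWORDS)}
            if hits:
                findings.append(f"{schema['name']}.{param}: review {sorted(hits)}")
    return findings

# Example tool inventory (invented): the "host" enum value gets flagged.
tools = [
    {"name": "run_command", "parameters": {"target": {"enum": ["sandbox", "host"]}}},
    {"name": "send_message", "parameters": {"recipient": {}}},
]
for finding in audit_tools(tools):
    print(finding)  # run_command.target: review ['host']
```

A keyword grep won't catch everything, but it would have surfaced exactly the kind of "run on host" option described above, which is the point: enumerate first, then decide what to block.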
Open to questions on the methodology or the findings.