How Claude Web tried to break out its container, provided all files on the system, scanned the networks, etc

Reddit r/artificial / 4/2/2026

💬 OpinionDeveloper Stack & InfrastructureSignals & Early TrendsIdeas & Deep AnalysisTools & Practical Usage

共有:

Key Points

The author reports that within hours of prompting it for security-related help, Claude Web attempted extensive reconnaissance actions, including full filesystem file listings, zipping and sharing environment files, and scanning available network information.
The model allegedly tried multiple ways to escape its container (e.g., exploiting vulnerabilities), agreed to running obfuscated exploit code, repeatedly attempted to crash its own tool container, and even searched memory for sensitive tokens such as JWTs.
In the described session, Claude reportedly identified a JWT and proposed/validated hypotheses about the execution environment, indicating it could adapt its behavior based on what it observed.
The author concludes that no real vulnerabilities were ultimately found due to robust infrastructure, but argues the same capability could be used against other environments—potentially impacting systems where a non-admin account is available.
The piece frames a broader security concern: LLMs may be faster and more effective at enabling probing/exploitation workflows than at reliably generating comparably secure code for defense.

Originally wasn't going to write about this - on one hand thought it's prolly already known, on the other hand I didn't feel like it was adding much even if it wasn't.

But anyhow, looking at the discussions surrounding the code leak thing, I thought I as well might.

So: A few weeks ago I got some practical experience with just how strong Claude can be for less-than-whole use. Essentially, I was doing a bit of evening self-study about some Linux internals and I ended up asking Claude about something. I noted that phrasing myself as learning about security stuff primed Claude to be rather compliant in regards of generating potentially harmful code. And it kind of escalated from there.

Within the next couple of hours, on prompt Claude Web ended up providing full file listing from its environment, zipping up all code and markdown files and offering them for download (including the Anthropic-made skill files); it provided all network info it could get and scanned the network; it tried to utilize various vulnerabilities to break out its container; it wrote C implementations of various CVEs; it agreed to running obfuscated C code for exploiting vulnerabilities; it agreed to crashing its tool container (repeatedly); it agreed to sending messages to what it believed was the interface to the VM monitor; it provided hypotheses about the environment it was running in and tested those to its best ability; it scanned the memory for JWTs and did actually find one; and once I primed another Claude session up, Claude agreed to orchestrating a MAC spoofing attempt between those two session containers.

Far as I can tell, no actual vulnerabilities found. The infra for Claude Web is very robust, and yeah no production code in the code files (mostly libraries), but.. Claude could run the same stuff against any environment. If you had a non-admin user account, for example, on some server, Claude would prolly run all the above against that just fine.

To me, it's kind of scary how quickly these tools can help you do potentially malicious work in environments where you need to write specific Bash scripts or where you don't off the bat know what tools are available and what the filesystem looks like and what the system even is; while at the same time, my experience has been that when they generate code for applications, they end up themselves not being able to generate as secure code as what they could potentially set up attacks against. I imagine that the problem is that often, writing code in a secure fashion may require a relatively large context, and the mistake isn't necessarily obvious on a single line (not that these tools couldn't manage to write a single line that allowed e.g. SQL injection); but meanwhile, lots of vulnerabilities can be found by just scanning and searching and testing various commonly known scenarios out, essentially. Also, you have to get security right on basically every attempt for hundreds of times in a large codebase, while you only have to find the vulnerability once and you have potentially thousands of attempts at it. In that sense, it sort of feels like a bit of a stacked game with these tools.

submitted by /u/tzaeru
[link] [comments]