Most people think validating shell commands is simple. Scan for rm, block eval, done. It's not even close.
This is a clean-room technical reconstruction of how an AI-assisted system can evaluate the safety of bash commands before execution. Everything here is based on externally observable behavior, publicly available technical patterns, and general shell semantics. No proprietary source code or internal materials were accessed.
All mechanisms described are conceptual reconstructions — how such a system can be designed, not documentation of any specific implementation.
The core problem
At first glance, validating shell commands appears simple. Scan for dangerous patterns — rm, eval, ;, pipes, redirects — and block them.
This approach fails immediately.
Consider:
echo "safe" ; rm -rf /
Versus:
echo "safe ; rm -rf /"
A superficial parser cannot distinguish whether ; is a command separator or part of a quoted string.
Shell syntax includes quoting rules, variable expansion, command substitution, arithmetic evaluation, brace expansion, and process substitution. Any of these can transform harmless-looking text into dangerous execution.
A reliable system must understand the structure of commands, not just their text.
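The quoting problem above can be made concrete. This is a minimal Python sketch (not any specific implementation) contrasting a naive substring check with tokenization that respects shell quoting rules, using the standard-library `shlex` module as a stand-in for a real parser:

```python
import shlex

def naive_is_dangerous(cmd: str) -> bool:
    # Naive approach: flag any command containing "rm" or ";".
    return "rm" in cmd or ";" in cmd

def has_unquoted_separator(cmd: str) -> bool:
    # Tokenize with shell quoting rules; a ";" that survives as its
    # own token was outside quotes and acts as a command separator.
    return ";" in shlex.split(cmd)

compound = 'echo "safe" ; rm -rf /'   # two commands
literal  = 'echo "safe ; rm -rf /"'   # one command, quoted string

# The naive check cannot tell these apart:
assert naive_is_dangerous(compound) and naive_is_dangerous(literal)
# Quote-aware tokenization can:
assert has_unquoted_separator(compound) is True
assert has_unquoted_separator(literal) is False
```

Note that `shlex` is only a tokenizer, not a full shell grammar — it would miss `safe;rm` written without spaces — which is exactly why the layers below go further.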
Design principle: fail closed
A robust analyzer follows a strict rule: if a command cannot be fully understood, it must not be automatically approved.
This leads to an allowlist-based design. Only known-safe constructs are accepted. Everything else is treated as "too complex" and requires user confirmation.
This avoids the fundamental weakness of blocklists — where every new attack vector is an automatic bypass. With an allowlist, every new construct triggers a prompt instead.
The multi-layer pipeline
A well-designed system runs commands through a pipeline of defensive layers, each addressing a different class of failure:
- Pre-parse validation
- Structured parsing (AST)
- Allowlist-based traversal
- Variable scope tracking
- Controlled placeholder system
- Semantic validation
- Path and filesystem checks
- Policy enforcement
Let's walk through each one.
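Before the individual layers, here is how they compose. A toy sketch (the layer functions are placeholders, not real checks): the layers run in sequence, and the first one that cannot fully understand the command short-circuits to "ask".

```python
def analyze(cmd: str, layers) -> str:
    # Run each layer in order; any non-"ok" verdict downgrades the
    # command to "ask" — the fail-closed default.
    for layer in layers:
        if layer(cmd) != "ok":
            return "ask"
    return "allow"

# Placeholder layers for illustration only:
layers = [
    lambda c: "ok" if c.isprintable() else "suspicious",  # pre-parse
    lambda c: "ok" if ";" not in c else "too_complex",    # toy structure check
]

assert analyze("ls -la", layers) == "allow"
assert analyze("echo hi ; rm -rf /", layers) == "ask"
```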
Layer 1: pre-parse validation
Before parsing, the raw command string is inspected for patterns that create ambiguity between what a parser sees and what the shell executes.
Control characters. Hidden bytes can alter how text is interpreted. A null byte, a backspace sequence, or an ANSI escape code can make a command look different in a terminal than it does to a parser.
Invisible Unicode. Characters like zero-width space can visually disguise commands. What looks like ls might actually contain invisible characters that change execution.
Backslash line continuation. This is subtle:
tr\
aceroute
It appears as two fragments, but backslash-newline is a line continuation, so the shell joins them and executes traceroute. A parser that doesn't handle continuation will see something different from the shell.
Shell-specific extensions. Features from zsh may not match bash parsing rules. If the analyzer assumes bash semantics, zsh-specific syntax creates ambiguity.
Brace obfuscation. Complex quoting inside {} can mislead simple parsers.
The goal of this layer: eliminate inputs where different interpreters would disagree on what the command means.
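A minimal sketch of such pre-parse screening (the allowed set and helper names are my assumptions, not any specific implementation). It rejects control and invisible format characters by Unicode category, and normalizes backslash-newline continuations so the parser sees what the shell sees:

```python
import unicodedata

ALLOWED_CONTROL = {"\t", "\n"}

def preparse_ok(cmd: str) -> bool:
    for ch in cmd:
        # Category "Cc": control characters (null byte, backspace, ESC...).
        if unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROL:
            return False
        # Category "Cf": invisible format chars (zero-width space, BiDi marks...).
        if unicodedata.category(ch) == "Cf":
            return False
    return True

def normalize_continuations(cmd: str) -> str:
    # Join backslash-newline continuations before parsing.
    return cmd.replace("\\\n", "")

assert preparse_ok("ls -la") is True
assert preparse_ok("ls\x00 -la") is False      # hidden null byte
assert preparse_ok("l\u200bs") is False        # zero-width space inside "ls"
assert normalize_continuations("tr\\\naceroute") == "traceroute"
```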
Layer 2: structured parsing
Instead of regex, the system builds a syntax tree — an AST.
This lets it separate commands, identify arguments, and track structure — all statically. One hard requirement: the parser must never execute the command it's analyzing.
Resource limits matter here too. Maximum input size, maximum parse complexity, strict time limits. If any are exceeded, the command is marked as too complex. This prevents adversarial inputs designed to hang the parser.
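The limit-and-fail-closed pattern can be sketched like this (limits and names are illustrative; `shlex` again stands in for a real AST parser):

```python
import shlex

MAX_INPUT_BYTES = 10_000
MAX_TOKENS = 256

def parse_with_limits(cmd: str):
    # Fail closed: any limit violation or parse error -> "too_complex".
    if len(cmd.encode()) > MAX_INPUT_BYTES:
        return ("too_complex", None)
    try:
        tokens = shlex.split(cmd)       # stand-in for a real AST parser
    except ValueError:                  # e.g. unterminated quote
        return ("too_complex", None)
    if len(tokens) > MAX_TOKENS:
        return ("too_complex", None)
    return ("parsed", tokens)

assert parse_with_limits("echo hi")[0] == "parsed"
assert parse_with_limits('echo "unterminated')[0] == "too_complex"
assert parse_with_limits("x " * 20_000)[0] == "too_complex"
```

A real system would also enforce a wall-clock limit around the parse call, which this sketch omits.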
Layer 3: allowlist AST traversal
After parsing, the system walks the syntax tree. The critical rule: only explicitly supported node types are allowed.
Supported constructs include simple commands, pipelines, conditionals, and variable assignments. Anything the walker doesn't recognize — any unknown node type — is immediately classified as too complex.
This is the most important design decision in the entire pipeline. It means the system doesn't need to enumerate every dangerous pattern. It only needs to enumerate safe ones.
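The allowlist walk reduces to a few lines. A toy sketch over dict-shaped AST nodes (the node kinds are invented for illustration) — the design point is the default branch: any kind the walker doesn't recognize is too complex:

```python
ALLOWED_KINDS = {"command", "pipeline", "and_or", "assignment", "word"}

def walk(node: dict) -> str:
    # Fail closed on anything unrecognized — no enumeration of danger needed.
    if node.get("kind") not in ALLOWED_KINDS:
        return "too_complex"
    for child in node.get("children", []):
        if walk(child) == "too_complex":
            return "too_complex"
    return "ok"

simple = {"kind": "command", "children": [{"kind": "word"}, {"kind": "word"}]}
exotic = {"kind": "command", "children": [{"kind": "process_substitution"}]}

assert walk(simple) == "ok"
assert walk(exotic) == "too_complex"
```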
Layer 4: variable scope tracking
Shell behavior depends heavily on execution order, and this is where naive analyzers break.
true || FLAG=--safe && cmd $FLAG
A naive system might assume $FLAG is always set. In reality, FLAG may never be assigned depending on how || and && chain.
The analyzer models branching (&&, ||), subshells (( ... )), and pipelines. Variables are tracked with correct execution semantics — if a variable might not be defined in all branches, it's treated as unknown.
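A deliberately simplified model of that conservatism: represent the chain as (operator, assigned-variable) segments, where only the first segment is guaranteed to run, so any later assignment is merely "maybe set". This is a sketch of the idea, not a real data-flow analysis:

```python
def track(segments):
    # segments: list of (op, assigned_var); op is None for the first
    # segment, "&&" or "||" afterwards; assigned_var is None if the
    # segment assigns nothing.
    definite, maybe = set(), set()
    for i, (op, var) in enumerate(segments):
        if var is None:
            continue
        if i == 0:
            definite.add(var)   # unconditional: always executes
        else:
            maybe.add(var)      # depends on how the chain short-circuits
    return definite, maybe

# true || FLAG=--safe && cmd $FLAG
segs = [(None, None), ("||", "FLAG"), ("&&", None)]
definite, maybe = track(segs)
assert "FLAG" not in definite   # FLAG may never be assigned
assert "FLAG" in maybe
```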
Layer 5: the placeholder system
Some constructs can't be resolved statically.
echo "commit $(git rev-parse HEAD)"
Instead of rejecting the entire command, the outer command is preserved and the inner command is extracted and analyzed separately.
Placeholders like __CMDSUB_OUTPUT__ (for command substitutions) and __TRACKED_VAR__ (for unknown variables) let the analyzer reason about the structure of a command without needing to know the runtime values.
But there's a critical constraint — bare variable risk:
VAR="-rf /"
rm $VAR
Shell expands this into rm -rf /. To prevent this, variables containing whitespace or glob patterns are rejected unless quoted. The quoting distinction between $VAR and "$VAR" is load-bearing.
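Both halves of this layer — placeholder extraction and the bare-variable check — can be sketched briefly. The placeholder names come from the article; the regex is naive (it ignores nested substitutions) and is for illustration only:

```python
import re

CMDSUB = re.compile(r"\$\([^()]*\)")   # naive: no nested $(...)

def extract_substitutions(cmd: str):
    # Replace each $(...) with a placeholder; return the inner
    # commands so they can be analyzed separately.
    inner = [m[2:-1] for m in CMDSUB.findall(cmd)]
    outer = CMDSUB.sub("__CMDSUB_OUTPUT__", cmd)
    return outer, inner

def bare_var_risky(value: str, quoted: bool) -> bool:
    # Unquoted $VAR undergoes word splitting and globbing; reject
    # values with whitespace or glob characters unless "$VAR" is used.
    return not quoted and bool(re.search(r'[\s*?\[\]]', value))

outer, inner = extract_substitutions('echo "commit $(git rev-parse HEAD)"')
assert outer == 'echo "commit __CMDSUB_OUTPUT__"'
assert inner == ["git rev-parse HEAD"]

assert bare_var_risky("-rf /", quoted=False) is True   # rm $VAR -> rm -rf /
assert bare_var_risky("-rf /", quoted=True) is False   # rm "$VAR" -> one argument
```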
Layer 6: semantic validation
Even syntactically valid, structurally understood commands can be dangerous.
Eval-like behavior. Commands that execute strings as code — eval, source, exec — are inherently unsafe because their behavior depends on runtime values the analyzer can't see.
Indirect execution. Traps, dynamic loading, and subshell triggers can execute code as a side effect of seemingly safe operations.
Embedded execution in tools. This is the sneaky one:
jq 'system("rm -rf /")'
The outer command is jq. The inner payload is arbitrary shell execution. Any tool with its own expression language — awk, perl, jq, find -exec — can be a vector.
Subscript evaluation. Some shell expressions trigger execution during evaluation:
test -v 'a[$(cmd)]'
The array subscript gets evaluated, which runs cmd. This is a real bash behavior that most people don't know about.
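The eval-like and embedded-execution checks reduce to matching the command name and scanning its arguments for the tool's execution hooks. The command lists here are illustrative and far from exhaustive — the real point is that each tool needs its own knowledge:

```python
import re

EVAL_LIKE = {"eval", "source", ".", "exec"}
EMBEDDED_EXEC = {
    "jq":   re.compile(r"system\s*\("),   # jq's system() runs a shell
    "awk":  re.compile(r"\bsystem\s*\("),
    "find": re.compile(r"-exec\b"),
}

def semantic_verdict(argv: list[str]) -> str:
    cmd = argv[0]
    if cmd in EVAL_LIKE:
        return "ask"                      # executes strings as code
    pattern = EMBEDDED_EXEC.get(cmd)
    if pattern and any(pattern.search(a) for a in argv[1:]):
        return "ask"                      # tool's expression language can exec
    return "ok"

assert semantic_verdict(["eval", "$PAYLOAD"]) == "ask"
assert semantic_verdict(["jq", 'system("rm -rf /")']) == "ask"
assert semantic_verdict(["jq", ".name"]) == "ok"
assert semantic_verdict(["find", ".", "-exec", "rm", "{}", ";"]) == "ask"
```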
Layer 7: filesystem and path safety
Commands are categorized by their filesystem impact: read, write, or destructive.
Certain paths are always sensitive — /etc, /usr, /bin, /proc — and require explicit approval regardless of the command.
Special cases matter here. The -- delimiter (rm -- -file) must not be misinterpreted as flags. Process substitution (>(command)) can hide side effects. Directory changes followed by writes (cd dir && write file) create ambiguity without execution tracking.
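The `--` handling and sensitive-path check look roughly like this (prefix list taken from the article; the target-extraction logic is a simplified sketch):

```python
SENSITIVE_PREFIXES = ("/etc", "/usr", "/bin", "/proc")

def rm_targets(argv: list[str]) -> list[str]:
    # Everything after "--" is a path, even if it starts with "-".
    targets, opts_done = [], False
    for a in argv[1:]:
        if not opts_done and a == "--":
            opts_done = True
        elif opts_done or not a.startswith("-"):
            targets.append(a)
    return targets

def touches_sensitive(argv: list[str]) -> bool:
    return any(t.startswith(SENSITIVE_PREFIXES) for t in rm_targets(argv))

assert rm_targets(["rm", "--", "-file"]) == ["-file"]   # "-file" is a path, not a flag
assert rm_targets(["rm", "-rf", "/etc/hosts"]) == ["/etc/hosts"]
assert touches_sensitive(["rm", "-rf", "/etc/hosts"]) is True
assert touches_sensitive(["rm", "notes.txt"]) is False
```

What this sketch cannot do is resolve `cd dir && rm file` — relative paths after a directory change need the cross-segment tracking mentioned above.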
Layer 8: policy enforcement
After all analysis, commands are evaluated against configurable rules with three possible outcomes:
- Allow — safe and fully understood
- Ask — unclear, complex, or borderline
- Deny — explicitly forbidden
Rules can match exactly, by prefix, or by pattern. The system also strips wrappers like timeout and env to analyze the underlying command, detects compound commands, and performs cross-segment analysis — because even if cd dir and git status are individually safe, their combination may not be.
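A toy version of the rule engine, with wrapper stripping bolted on. The rules, wrapper list, and the whitespace-based splitting are all simplifications for illustration — a real system would match on the parsed AST, not on a naive `split()`:

```python
WRAPPERS = {"timeout", "env", "nice"}
RULES = [
    ("exact",  "git status", "allow"),
    ("prefix", "git push",   "ask"),
    ("prefix", "rm -rf /",   "deny"),
]

def strip_wrappers(argv: list[str]) -> list[str]:
    # Peel off wrappers and their leading arguments to reach the
    # underlying command, e.g. `timeout 5 git status` -> `git status`.
    while argv and argv[0] in WRAPPERS:
        argv = argv[1:]
        while argv and (argv[0].isdigit() or "=" in argv[0] or argv[0].startswith("-")):
            argv = argv[1:]
    return argv

def decide(cmdline: str) -> str:
    cmd = " ".join(strip_wrappers(cmdline.split()))
    for kind, pat, verdict in RULES:
        if (kind == "exact" and cmd == pat) or (kind == "prefix" and cmd.startswith(pat)):
            return verdict
    return "ask"                          # fail closed: no matching rule -> ask

assert decide("timeout 5 git status") == "allow"   # wrapper stripped first
assert decide("git push origin main") == "ask"
assert decide("ls -la") == "ask"                   # unmatched -> ask, not allow
```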
The key insight
This system doesn't attempt to prove commands are safe. It answers a narrower question:
"Can we fully understand this command with high confidence?"
If the answer is no, it asks the user.
| Layer | Purpose |
|---|---|
| Pre-checks | Remove ambiguity |
| Parsing | Understand structure |
| AST allowlist | Reject unknown constructs |
| Scope tracking | Preserve execution semantics |
| Placeholders | Handle dynamic behavior |
| Semantics | Detect dangerous intent |
| Path validation | Protect filesystem |
| Rules | Enforce policy |
Eight layers. Each one catches things the others miss. The design isn't clever — it's thorough. And the default answer to uncertainty is always the same: don't execute. Ask.
That's a broader principle worth remembering. Uncertainty should never default to execution.
Follow me on X — I post as @oldeucryptoboi.




