Why I Reimplemented 22 Unix Tools in Go for AI Agents

Dev.to / 4/6/2026


Key Points

  • The author reimplemented 22 core Unix tools in Go to address a practical issue: AI coding agents struggle to reliably parse human-oriented command output.
  • They argue that agent workflows waste tokens and encounter subtle bugs when tools like `ls`, `grep`, and `find` emit column-based, locale-dependent, and ambiguous text that models must guess.
  • The proposed solution is to return structured, labeled, machine-readable data (e.g., XML/fields for filename, size, timestamps, type, and binary status) instead of pretty terminal formatting.
  • The article frames this as an interface change for AI agents: the goal is accurate data extraction with low fragility rather than human readability.

I spent three weeks rebuilding ls, grep, cat, find, stat, diff, and 16 other Unix coreutils in Go. Not because the originals are broken — they're masterpieces of systems programming that have survived decades of use. I rebuilt them because AI coding agents are terrible at reading their output.

The Problem Nobody Talked About

Every time an AI agent runs ls src/, it receives something like this:

-rw-r--r--  1 user  staff  2048 Apr  6 12:00 main.go
drwxr-xr-x  3 user  staff    96 Apr  6 11:00 internal
lrwxr-xr-x  1 user  staff    12 Apr  6 10:00 link -> main.go

The agent has to figure out which column is the filename. Which is the size. Whether that d at the start means directory. Whether Apr 6 means this year or last year. It guesses. Sometimes it guesses wrong. And every wrong guess costs tokens, introduces errors, and degrades the quality of the code it writes.

Now multiply that by every grep, every cat, every find the agent runs in a single session. The token waste is staggering. The parsing fragility is a constant source of subtle bugs.
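To make the fragility concrete, here is a hypothetical sketch (in Go, not code from aict) of the column-guessing heuristic an agent effectively improvises for every `ls -l` line. The `guessName` helper is illustrative only:

```go
package main

import (
	"fmt"
	"strings"
)

// guessName is the kind of fragile parser an agent must improvise:
// split an `ls -l` line on whitespace and hope the columns line up.
func guessName(line string) string {
	fields := strings.Fields(line)
	if len(fields) < 9 {
		return ""
	}
	// Columns 0-7 are mode, links, user, group, size, month, day, time.
	// Everything after SHOULD be the name -- unless it's a symlink, the
	// name contains spaces, or the locale changed the date format.
	return strings.Join(fields[8:], " ")
}

func main() {
	fmt.Println(guessName("-rw-r--r--  1 user  staff  2048 Apr  6 12:00 main.go"))
	// A symlink silently produces the wrong answer: "link -> main.go"
	// instead of "link".
	fmt.Println(guessName("lrwxr-xr-x  1 user  staff    12 Apr  6 10:00 link -> main.go"))
}
```

The second call is the failure mode: the heuristic returns the arrow and target as part of the name, and nothing signals that it went wrong.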

The Insight

AI agents don't need pretty terminal output. They need structured data. They need to know that main.go is 2048 bytes, was modified 3600 seconds ago, is written in Go, has MIME type text/x-go, and is not a binary file. They need this information labeled, unambiguous, and machine-readable.

So I asked: what if ls returned XML?

<ls timestamp="1712404800" total_entries="3">
  <file name="main.go" path="src/main.go" absolute="/project/src/main.go"
        size_bytes="2048" size_human="2.0 KiB"
        modified="1712404800" modified_ago_s="3600"
        language="go" mime="text/x-go" binary="false"/>
  <directory name="internal" path="src/internal"/>
  <symlink name="link" target="main.go" broken="false"/>
</ls>

Zero ambiguity. Zero parsing. The agent reads the attributes and knows exactly what it's looking at. No regex to extract filenames from column-aligned text. No heuristic to determine if something is a directory. No guesswork.

Why XML and Not JSON?

Good question. JSON is the lingua franca of APIs. But XML has a structural advantage that matters for AI context windows: attributes.

Compare these two representations of the same file:

<file size_bytes="2048" language="go" mime="text/x-go"/>
{"size_bytes": 2048, "language": "go", "mime": "text/x-go"}

The XML version is 56 characters; the JSON version is 59. The gap on one file is modest, but attribute syntax drops the quotes around every key and the colon and comma separators, saving a few characters per field. Multiply that across every field of a 1,000-file listing and the savings run to thousands of characters. AI context windows are expensive and limited. Every character counts.

That said, aict supports --json for every tool. The schema is identical. Use whatever your pipeline prefers.

Why Go?

Three reasons:

Single binary. Go compiles to a static binary with zero runtime dependencies. aict is one file you drop on a system and it works. No pip install, no npm install, no shared libraries to manage. For a tool that's supposed to replace coreutils, this is non-negotiable.

Standard library only. Every feature — regex matching, MIME detection, filesystem walking, XML encoding — uses Go's standard library. Zero external dependencies means zero supply chain risk, zero version conflicts, and the ability to audit the entire codebase in an afternoon.
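The article doesn't show aict's detection code, but a stdlib-only sketch of MIME sniffing and binary detection is straightforward: net/http's DetectContentType implements the WHATWG sniffing algorithm over at most the first 512 bytes. The `sniff` helper and its binary heuristic here are illustrative, not aict's actual logic:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// sniff guesses a MIME type and a binary flag from a file's leading
// bytes, using only the standard library.
func sniff(data []byte) (mime string, binary bool) {
	if len(data) > 512 {
		data = data[:512] // DetectContentType considers at most 512 bytes
	}
	mime = http.DetectContentType(data)
	// Crude binary check: the sniffer's fallback type, or a NUL byte,
	// is a strong signal that this is not text.
	binary = mime == "application/octet-stream" || bytes.IndexByte(data, 0) >= 0
	return mime, binary
}

func main() {
	m, b := sniff([]byte("package main\n\nfunc main() {}\n"))
	fmt.Println(m, b) // text/plain; charset=utf-8 false
	m, b = sniff([]byte{0x7f, 'E', 'L', 'F', 0x00, 0x01})
	fmt.Println(m, b) // application/octet-stream true
}
```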

Performance is good enough. Yes, aict grep is slower than ripgrep. Yes, aict ls is slower than eza. But we're talking 15ms vs 2ms for listing 1,000 files. The overhead comes from language detection, MIME sniffing, and structured output — features that are the entire point of the project. For normal codebases, the difference is imperceptible.

What I Built

Twenty-two tools across five categories:

  • File inspection: cat, head, tail, file, stat, wc
  • Directory & search: ls, find, grep, diff
  • Path utilities: realpath, basename, dirname, pwd
  • Text processing: sort, uniq, cut, tr
  • System & environment: env, system, ps, df, du, checksums

Plus a git subcommand suite (status, diff, log, ls-files, blame) and an MCP server that exposes every tool as a callable function to AI assistants like Claude and Cursor.

Every tool supports three output modes: XML, JSON, and plain text. Every error is emitted as structured data on stdout, never as loose text on stderr. Every path is absolute. Every timestamp is a Unix epoch integer.

The MCP Server

This is where it gets interesting. aict ships with an MCP (Model Context Protocol) server binary called aict-mcp. You configure it in Claude Desktop or Cursor, and suddenly every tool becomes a typed, callable function.
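A minimal registration sketch, following Claude Desktop's standard mcpServers config shape (the binary name aict-mcp comes from the article; where you install it and whether extra arguments are needed is up to your setup):

```json
{
  "mcpServers": {
    "aict": {
      "command": "aict-mcp"
    }
  }
}
```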

The AI agent doesn't shell out to run aict ls src/. It calls the ls function with {path: "src/"} and receives structured JSON. No shell spawning. No output parsing. No ambiguity.
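On the wire, that call is a standard MCP JSON-RPC tools/call request. The shape below follows the MCP specification; the tool name and argument are the ones from the article:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "ls",
    "arguments": { "path": "src/" }
  }
}
```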

This is the future of how AI agents interact with filesystems. Not by typing commands into a terminal and reading the output like a human would. By calling typed functions and receiving typed responses.

What I Didn't Build

I intentionally excluded write operations: cp, mv, rm, mkdir, chmod, chown. These are dangerous when called by AI agents without human confirmation. aict is a read-only tool. It observes, it doesn't modify.

I also didn't try to match GNU coreutils flag-for-flag. Where a flag made sense for the AI use case, I added it. Where it didn't, I skipped it. The goal is not compatibility — it's utility for AI agents.

The Honest Benchmark

I benchmarked aict against GNU coreutils. Here are the results:

Tool                 GNU     aict     Ratio
ls (1,000 files)     ~2ms    ~15ms    ~7×
grep (100k lines)    ~1ms    ~100ms   ~100×
find (deep tree)     ~2ms    ~9ms     ~4.5×
cat (100k lines)     ~1ms    ~23ms    ~17×
diff (1,000 lines)   ~1ms    ~10ms    ~10×

grep and cat are slow because every file is MIME-typed and language-detected. Use --plain to skip enrichment when you only need content. The trade-off is intentional: a few extra milliseconds per call in exchange for semantic information the agent would otherwise burn tokens reconstructing.

Is This Actually Useful?

I've been using aict with Claude and Cursor for two months. The difference is noticeable. The agent makes fewer mistakes about file types. It doesn't confuse directories with files. It correctly identifies binary files before trying to read them. It understands the structure of a codebase faster.

The token savings are real. A directory listing that used to cost 2,000 tokens in plain text now costs 800 in XML with three times the information density. Over a typical coding session with dozens of tool calls, that adds up.

Open Source

The project is MIT licensed and on GitHub. It's written in Go with zero external dependencies. You can audit the entire codebase in an afternoon. I'd love contributions — new tools, performance improvements, bug fixes.

If you build AI agents that interact with codebases, give it a try. Your agent will thank you. And if it doesn't work for your use case, that's fine too. GNU coreutils aren't going anywhere.

The repo is at github.com/synseqack/aict.
