Stop Paying OpenAI to Read Garbage: The Two-Stage Agent Pipeline

Dev.to / 4/22/2026

💬 Opinion · Developer Stack & Infrastructure · Ideas & Deep Analysis · Tools & Practical Usage

Key Points

  • The article argues that using an LLM to parse and extract IDs directly from large, messy raw log JSON wastes money and compute because the model is forced to act like a fragile parser.
  • It describes a failure mode where malformed escaping and HTML in activity logs caused the agent to hallucinate structure and loop recursively, consuming about 50,000 tokens to extract a single value.
  • The author introduces the idea of a “Hype Tax,” highlighting telemetry-driven measurement showing how cleaning/extracting into structured JSON drastically reduces tokens, cost per call, and latency while improving success rates.
  • The proposed principle is a two-stage agent pipeline: do deterministic extraction locally (e.g., regex/JSON formatting) and reserve the LLM for higher-level reasoning over cleaned inputs, not for noisy data parsing.
  • Overall, the piece frames proper AI integration as secure-by-design engineering that eliminates token overhead and non-determinism by keeping raw data processing deterministic.

Hey DEV community, CallmeMiho here. I spent my Monday morning watching a junior dev ship a Rube Goldberg machine powered by a credit card. Let's talk about why your AI agents are bankrupting you.

The task was simple: extract a specific itemUuid and a scimId from a massive dump of raw Activity Log data. Instead of engineering a solution, the dev just piped the raw, unformatted JSON slop—full of broken quotes, escaped characters, and mixed HTML tags—directly into a prompt and told the model to "find the ID."

The result was a textbook case of probabilistic vibration. Because the log was a disaster of escaped quotes (\") and malformed HTML, the agent hit an escaped character, hallucinated an end-of-file bracket that didn't exist, and entered an infinite recursion loop trying to re-parse the "rest" of the string.

It wasn't "reasoning"; it was a neural network tripping over its own feet because it was forced to be a parser. By the time I killed the process, the agent had burned through 50,000 tokens of high-tier compute just to find a single scimId.

That is not "AI Engineering"—it is technical debt with a monthly subscription.

The Hard Data: Measuring the Waste

If you aren't looking at the telemetry of your prompts, you aren't an architect; you’re a philanthropist for cloud providers. We call the delta between these two rows the Hype Tax.

| Metric | Raw Activity Log Data | Cleaned/Extracted JSON |
| --- | --- | --- |
| Input Volume | 45,000 tokens | 150 tokens |
| Cost Per Call | $0.68 | $0.002 |
| Latency | 15s | 2s |
| Success Rate | 70% | 99.9% |

Paying a model to navigate 45,000 tokens of "garbage" formatting is a total failure of basic engineering discipline. When you feed an LLM unextracted noise, you aren't just wasting money—you are intentionally introducing non-determinism into a process that should be binary.
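The Hype Tax is easy to quantify from the table's own numbers. A minimal TypeScript sketch (the `CallProfile` shape is invented for illustration; the measurements are the ones above):

```typescript
// Quantify the "Hype Tax": the overhead of sending raw logs vs. cleaned JSON.
interface CallProfile {
  inputTokens: number;
  costPerCallUsd: number;
  latencySeconds: number;
}

const rawLog: CallProfile = { inputTokens: 45_000, costPerCallUsd: 0.68, latencySeconds: 15 };
const cleaned: CallProfile = { inputTokens: 150, costPerCallUsd: 0.002, latencySeconds: 2 };

function hypeTax(raw: CallProfile, clean: CallProfile) {
  return {
    tokenOverhead: raw.inputTokens / clean.inputTokens,       // 300x more tokens
    costOverhead: raw.costPerCallUsd / clean.costPerCallUsd,  // ~340x more spend
    latencyOverhead: raw.latencySeconds / clean.latencySeconds, // 7.5x slower
  };
}
```

A 300x token multiplier on every call is not a model problem; it is a pipeline problem.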

The Philosophy: LLMs Are Not Parsers

If a local Regex script or a JSON formatter can extract the signal for free, paying a model to do it is architectural waste.

High-performance engineering is grounded in a Secure by Design approach. Just as modern zero-knowledge architectures perform cryptographic operations locally to eliminate server-side risk, a professional AI integration must perform data extraction locally to eliminate token overhead. You should not trust a cloud LLM with raw data extraction; that belongs in the deterministic layer.

In a professional stack, we distinguish between Deterministic Logic (Regex, Zod) and Probabilistic Logic (LLMs). You don't use a billion-dollar neural network to find a DeviceKey in a text string; you use a parser. The LLM should only see the specific, non-sensitive identifiers required for the task after the local environment has done the heavy lifting.
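The deterministic half of that split fits in a few lines. A sketch of the regex layer, assuming an `itemUuid` buried in escaped, HTML-polluted log text (the field name comes from the incident above; the log shape here is invented):

```typescript
// Deterministic layer: extract an identifier with a parser, not a prompt.
// Tolerates the escaped quotes (\") that sent the agent into its loop.
const ITEM_UUID_RE =
  /\\?"itemUuid\\?"\s*:\s*\\?"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\\?"/i;

function extractItemUuid(rawLog: string): string | null {
  const match = rawLog.match(ITEM_UUID_RE);
  return match ? match[1] : null;
}

// Works on the exact kind of garbage that cost 50,000 tokens:
const noisy = '<div>\\"itemUuid\\":\\"123e4567-e89b-12d3-a456-426614174000\\"</div>';
extractItemUuid(noisy); // "123e4567-e89b-12d3-a456-426614174000"
```

Zero tokens, zero latency, zero hallucinated brackets. The LLM never needs to know the log existed.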

The Fix: The Stage 1 Deterministic Pipeline

The only professional way to build AI integrations is a Two-Stage Agent Pipeline. Stage 1 is a local, deterministic cleanup phase. Before a single token is sent to a cloud provider, the data must pass through a local pipeline that strips the noise.

If you don't have a local pipeline, use offline utilities to build one:

  • Regex Tester: Your first line of defense. Use a local script to pull the itemUuid from a URL before the LLM ever sees the log entry. If the model is reading `<div>` tags to find a UUID, you have already failed. Build your regex patterns here.
  • JSON Formatter: Standardize and minify. This prevents the model from "vibrating" on broken quotes or escaped characters. A minified schema ensures the model focuses on the values rather than the syntax of the log. Format and minify your JSON here.
  • Zod Schema Generator: Use this to enforce strict data contracts. Based on the Least Privilege principle, your Zod schema should programmatically strip fields before the prompt is assembled. If the model doesn't need to see it, Zod shouldn't pass it. Generate your Zod schemas here.

Conclusion: Engineering Discipline vs. Hype

High-performance engineering isn't about how much AI you use, but how little you need to use to get the job done. If you are sending unformatted log slop to a cloud model, you aren't building an agent; you are building a technical debt generator.

Stop subsidizing Sam Altman’s compute with your company’s technical debt. Clean your data or get out of the kitchen.

P.S. If you want to audit your token payloads before they bankrupt you, I built a suite of 100% offline, zero-server-log developer tools at FmtDev.dev.