The Billion Dollar Tax on AI Agents

Dev.to / 3/30/2026

💬 Opinion · Developer Stack & Infrastructure · Ideas & Deep Analysis · Tools & Practical Usage

Key Points

  • The article estimates that AI agents collectively spend roughly $1–$5 billion per year processing web-page HTML/JavaScript tokens that add little or no useful semantic value for the agents.
  • It argues the waste is driven not by model inefficiency but by the web’s human-focused presentation layer (CSS/layout scaffolding, tracking scripts, ads, cookie dialogs, SVG assets) that agents still tokenize and read.
  • Using a 50-site benchmark and the HTTP Archive’s 2025 Web Almanac as context, the author reports about 33,000 tokens per rendered page for LLM tokenizers versus about 8,300 tokens when converted to a semantic representation (SOM), implying ~75% is non-semantic overhead.
  • The core takeaway is a call for more attention to reducing “HTML markup tax” costs via better content formats, agent-aware web delivery, or semantic extraction approaches.
  • The article frames the total cost as a scaling problem: multiplying per-page token waste by the (estimated) number of daily agent page fetches, which major AI companies do not fully disclose.

I want to walk you through a number that I keep coming back to, one that I think deserves more attention than it gets. The number is somewhere between one billion and five billion dollars per year. That is our estimate of how much the AI industry spends, collectively, processing web page markup that no agent will ever use.

Not because the agents are inefficient. Not because the models are wasteful. Because the web serves content in a format designed for human eyes, and agents are paying the full cost of that visual presentation layer every time they read a page.

Let me show you where this number comes from.

Start with a single web page

Pick any web page. Go to your favorite news site, an e-commerce product page, a documentation site, a government portal. Right-click, view source. What you see is HTML: a mix of content (the text, the links, the headings) and presentation (CSS classes, inline styles, layout containers, tracking scripts, ad markup, SVG icons, data attributes).

The HTTP Archive's 2025 Web Almanac reports that the median home page now weighs 2.86 MB on desktop and 2.56 MB on mobile. But what matters for AI agents is not the total page weight (which includes images, fonts, and videos that agents do not load). What matters is the HTML document itself and the JavaScript that populates it.

Across our 50-site benchmark, the average rendered web page contains about 33,000 tokens when fed to a language model tokenizer. That is the number the agent actually processes: 33,000 tokens of HTML markup shoved into the model's context window.

How much of that is actual content? We measured this by comparing raw HTML to SOM (a structured representation that preserves semantic content while stripping presentation). The SOM version of the same pages averages about 8,300 tokens.
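As a sanity check on what a comparison like that involves, here is a minimal sketch that contrasts markup-heavy HTML with its extracted text. It substitutes a crude regex tag-stripper for a real semantic extractor and the common ~4-characters-per-token heuristic for a real tokenizer; for serious numbers you would use the target model's actual tokenizer.

```python
import re

def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English-ish text.
    # Real measurements should use the target model's tokenizer.
    return max(1, len(text) // 4)

def strip_markup(html: str) -> str:
    # Drop script/style bodies, then all tags, then collapse whitespace.
    # A rough stand-in for semantic extraction, not a real HTML parser.
    html = re.sub(r"<(script|style)[^>]*>.*?</\1>", " ", html,
                  flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip()

page = """<html><head><style>.px-4{padding:1rem}</style></head>
<body><div class="flex items-center justify-between px-4 py-2">
<h1>Quarterly results</h1><p>Revenue grew 12% year over year.</p>
</div><script>trackPageView("q3");</script></body></html>"""

raw = estimate_tokens(page)
content = estimate_tokens(strip_markup(page))
print(f"raw ≈ {raw} tokens, content ≈ {content} tokens "
      f"({100 * (raw - content) // raw}% overhead)")
```

Even on this toy page, most of the estimated tokens are scaffolding rather than content, which is the same shape as the benchmark result, just at miniature scale.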

That means roughly 24,900 tokens per page, about 75%, encode nothing that the agent needs. CSS class names like `flex items-center justify-between px-4 py-2 bg-white dark:bg-gray-900`. Tracking scripts. Layout dividers. Ad containers. Cookie consent dialogs. SVG path data for icons. The visual scaffolding that makes a page look right in a browser but contributes nothing to an agent's understanding of what the page says or what you can do on it.

The agent processes all of it. Every token. And somebody pays for every token.

Scale it up

Now take that per-page waste and multiply it by the number of pages AI agents browse every day. This is where the math gets interesting and, I will be honest, where it requires some estimation. The exact number of daily agent page fetches is not publicly disclosed by any major AI company. But we can build a reasonable model from what is public.

Cloudflare's 2025 Year in Review reports that approximately 30% of all web traffic is bot traffic. AI-specific crawlers (GPTBot, ClaudeBot, Meta-ExternalAgent, Amazonbot, and others) account for about 4.2% of all HTML request traffic, separate from Googlebot's 4.5%. And this is growing fast: from May 2024 to May 2025, AI crawler traffic grew 18% overall, with GPTBot specifically growing 305% in that period.

But raw crawler traffic (which is mostly training data collection) is different from what agents do when they browse on behalf of users. We need to separate the two.

When you ask ChatGPT to look something up and it browses the web, that is a user-action page fetch. When GPTBot crawls Wikipedia at 3 AM to build a training dataset, that is training crawl traffic. The training traffic is much larger in volume, but the user-action traffic is what incurs per-request LLM inference costs, because the fetched page goes directly into a model's context window.

Using public user count data (OpenAI has reported over 300 million weekly active users for ChatGPT, Perplexity has disclosed 20 million monthly users, and Anthropic has not disclosed figures but is estimated to have several million Claude users), we built a bottom-up model of daily page fetches. The estimate: roughly 400 million user-action page fetches per day, across all major AI agents combined.

400 million pages. 24,900 wasted tokens each. At a weighted average API price of $0.75 per million input tokens (blending GPT-4o at $2.50, GPT-4o Mini at $0.15, Claude Sonnet at $3.00, Gemini at $1.25).

The math: 400M pages/day x 24,900 waste tokens x $0.75/M tokens x 365 days = approximately $2.7 billion per year.
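The arithmetic is simple enough to reproduce directly. Every constant below is an estimate from this article, not a measured value:

```python
# Reproducing the article's back-of-envelope cost estimate.
PAGES_PER_DAY = 400e6           # estimated user-action page fetches per day
WASTE_TOKENS_PER_PAGE = 24_900  # raw HTML tokens minus semantic tokens
PRICE_PER_MTOK = 0.75           # blended input price, $ per million tokens
DAYS = 365

daily_waste_tokens = PAGES_PER_DAY * WASTE_TOKENS_PER_PAGE
annual_cost = daily_waste_tokens * (PRICE_PER_MTOK / 1e6) * DAYS
print(f"{daily_waste_tokens:.2e} wasted tokens/day, "
      f"${annual_cost / 1e9:.2f}B/year")
```

Which lands at roughly $2.7 billion per year, and the sensitivity is linear in each input: halve the blended price and the figure halves with it.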

A separate top-down model calibrated against Cloudflare's total traffic volume produces a higher estimate. Combining the two approaches, we bracket the annual industry-wide token waste at $1 billion to $5 billion per year.



That is a lot of money. Is it real?

It is worth being explicit about the uncertainties here. The exact number depends heavily on three variables: how many pages agents fetch per day (our biggest source of uncertainty), the effective LLM price per token (which is falling and varies by model), and how much preprocessing agents already do (some strip HTML to markdown, some truncate, some cache).

We account for all of these. Our model assumes that 45% of agents already convert to markdown (which reduces waste by about 70%), 30% truncate HTML (reducing waste by about 30%), and 15% of fetches are cache hits. After these adjustments, the effective waste per page drops from 24,900 tokens to about 13,300 tokens. The billion-dollar figure already includes these reductions.
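The article does not spell out exactly how these three adjustments compose, so the sketch below is one plausible combination using the stated shares. Note that this particular composition lands near, but not exactly at, the ~13,300-token figure:

```python
# One plausible composition of the preprocessing adjustments; the exact
# weighting is an assumption, so treat this as illustrative only.
RAW_WASTE = 24_900  # tokens of non-semantic markup per page

# Fleet-wide preprocessing mix: (share of agents, waste reduction)
segments = [
    (0.45, 0.70),  # 45% convert HTML to markdown, cutting waste ~70%
    (0.30, 0.30),  # 30% truncate HTML, cutting waste ~30%
    (0.25, 0.00),  # remainder pass raw HTML through unchanged
]
CACHE_HIT_RATE = 0.15  # 15% of fetches never reach the model at all

blended = sum(share * RAW_WASTE * (1 - reduction)
              for share, reduction in segments)
effective = blended * (1 - CACHE_HIT_RATE)
print(f"effective waste ≈ {effective:,.0f} tokens/page")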

You could argue the number is lower if agent usage grows slower than projected, or if LLM prices continue to drop. Both are plausible. But agent usage is growing much faster than prices are falling. GPTBot's traffic grew 305% in a single year, a roughly 4x increase in volume; offsetting that would require input prices to fall by about 75% in the same period, and they have not. The total cost is going up, not down.

You could also argue the number is higher, because our model only counts user-action fetches (where someone explicitly asks an agent to browse). It does not count autonomous agent workloads: monitoring services, price comparison engines, research pipelines, and other machine-to-machine browsing that runs continuously without human prompting. Those workloads are growing rapidly and process far more pages per instance than a human user would.

The honest answer: we believe $1B to $5B is a reasonable bracket. The central estimate of $2.7B probably understates autonomous workloads and overstates the effectiveness of current preprocessing.

Where does the money actually go?

This is the part that I find most frustrating. The wasted tokens do not simply vanish. They consume real resources at every stage of the inference pipeline.

GPU compute. Every input token passes through the model's attention layers. The self-attention mechanism has quadratic complexity with respect to input length: doubling the input more than doubles the compute. When 75% of the input is presentation noise, the model spends the majority of its attention budget on tokens that carry no useful information. This is not just a billing abstraction. It is actual electricity consumed by actual GPUs running actual matrix multiplications on CSS class names.

Context window displacement. Language models have finite context windows. A 128K-token window sounds generous until you realize that a single HTML page consumes 33K of it. An agent that needs to analyze five pages in a single pass can only fit one or two in HTML, but could fit all five in a structured format. The wasted tokens directly limit what the agent can reason about in a single inference call.
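The displacement effect is easy to quantify from the benchmark averages. The 32K reserved budget below is a hypothetical allowance for the system prompt, conversation history, and output, not a number from the article:

```python
# How many pages fit in one inference call, using the benchmark averages.
WINDOW = 128_000        # model context window, tokens
HTML_PER_PAGE = 33_000  # average raw HTML page (benchmark average)
SOM_PER_PAGE = 8_300    # average semantic representation (benchmark average)
RESERVED = 32_000       # assumed budget for prompt, history, and output

budget = WINDOW - RESERVED
print(f"HTML: {budget // HTML_PER_PAGE} pages fit; "
      f"SOM: {budget // SOM_PER_PAGE} pages fit")
```

Under those assumptions, a five-page comparison task needs three inference calls with raw HTML and one with a structured representation, which multiplies both cost and latency before any reasoning happens.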

Latency. More input tokens mean longer time-to-first-token. In our WebTaskBench evaluation, we measured the latency impact directly. On Claude Sonnet 4, the average task took 16.2 seconds with raw HTML input versus 8.5 seconds with SOM input. The agent was nearly twice as fast, simply because it had less noise to process. GPT-4o showed a similar pattern: 2.74 seconds with HTML versus 1.44 seconds with SOM.

That latency difference is not just about user experience (though users do notice when their AI assistant takes 16 seconds instead of 8). It is about throughput. A serving cluster that can handle N requests per second with HTML input can handle roughly 2N requests per second with structured input. The infrastructure savings compound on top of the token cost savings.

What nobody talks about: the crawl-to-click gap

There is a dimension to this problem that goes beyond token costs, and I think it is the more important one for the long-term health of the web.

Cloudflare published a remarkable dataset in August 2025 titled "The crawl-to-click gap." The core finding: AI crawlers consume vastly more content than they send back as referral traffic. The ratios are staggering.

In July 2025, Anthropic's crawlers fetched approximately 38,000 pages for every single page visit they referred back to a publisher. That is a 38,000:1 crawl-to-refer ratio. Earlier in the year it was 286,000:1. Perplexity's ratio actually got worse over 2025, with more crawling but fewer referrals, reaching 194:1 by July.

Compare this to Google. For all the complaints about Google hoarding traffic (and the data does show Google referrals to news sites declining since February 2025, coinciding with the expansion of AI Overviews), Google's crawl-to-refer ratio is in the single digits. It crawls pages and sends users back.

AI companies crawl pages and keep the value.

This is the economic context that makes the token waste problem more than an efficiency issue. Publishers are paying to serve content to agents that extract the value and return nothing. The infrastructure costs of serving those requests (bandwidth, compute, CDN, origin rendering) come out of the publisher's budget. And the content served in those requests is 75% visual presentation that the agent throws away.

The publisher pays to generate it. The agent pays to process it. And neither party gets value from it.

What the ten biggest agent frameworks actually do

Part of our research involved surveying the default web content handling in 10 major agent frameworks. The results were, frankly, depressing.

LangChain, LlamaIndex, and CrewAI, the three most popular agent orchestration frameworks, all default to BeautifulSoup's get_text() method. This is the most aggressive possible extraction: it strips every HTML tag and returns flat, unstructured text. The result is small (good for tokens) but has lost all structural information: element types, interactive affordances, page regions, everything that distinguishes a button from a heading from a link.
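You can see the structural loss with nothing more than the standard library. The parser below is a rough stdlib stand-in for `get_text()`-style extraction, not BeautifulSoup's actual implementation: a link and a button flatten to identical strings, so the agent can no longer tell what is clickable.

```python
from html.parser import HTMLParser

class FlatText(HTMLParser):
    """Rough approximation of flat-text extraction: keeps character
    data, discards every tag, attribute, and interactive affordance."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data.strip())

    def text(self):
        return " ".join(p for p in self.parts if p)

def flatten(html: str) -> str:
    parser = FlatText()
    parser.feed(html)
    return parser.text()

# A navigation link and a payment button with identical labels...
link = '<a href="/checkout">Checkout</a>'
button = '<button onclick="pay()">Checkout</button>'

# ...flatten to indistinguishable strings: the affordance is gone.
print(flatten(link), "==", flatten(button))
```

Token-efficient, yes, but an agent reading the flattened output cannot reconstruct whether "Checkout" navigates, submits, or does nothing at all.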

Dedicated scraping tools like Crawl4AI, Firecrawl, and Jina Reader use markdown extraction, which is more sophisticated. Markdown preserves headings, links, and basic formatting. But it still discards element types (a button looks the same as a link), interactive affordances (you cannot tell what you can click), and page regions (main content is indistinguishable from sidebar content).

Browser Use and Stagehand use accessibility tree extraction, which is the closest to a structured representation. But accessibility trees are designed for screen readers, not AI agents. They include every element on the page (including the 200 decorative ARIA landmarks in a typical site footer) and produce output that is often as verbose as the original HTML.

None of the ten frameworks we surveyed use a structured semantic representation by default. Zero out of ten.

The entire ecosystem is either stripping web pages down to bare text (losing structure) or passing through raw HTML (paying for noise). There is no middle ground in production use today.


  • LangChain, LlamaIndex, CrewAI: BeautifulSoup get_text(). Strips all HTML, returns flat text. Minimal tokens, zero structure.
  • Crawl4AI: Custom HTML-to-Markdown. Preserves headings and links, loses element types and affordances.
  • Firecrawl: Readability + Markdown. Good for article extraction, blind to interactive elements.
  • Jina Reader: Custom extraction to Markdown. Similar tradeoffs to Firecrawl.
  • AutoGPT: Delegates to Jina/Firecrawl. Inherits their limitations.
  • Browser Use: Accessibility tree + DOM. Closest to structured, but designed for screen readers, not agents.
  • Stagehand: Accessibility tree. Same verbose output issue as Browser Use.

The number that keeps me up at night

Here is the calculation that I keep returning to. Take the WebTaskBench data: SOM uses 8,301 tokens per page on average. Raw HTML uses 33,181. The difference is 24,880 tokens.

Multiply by 400 million pages per day. That is 9.95 trillion wasted tokens per day. Over a year, approximately 3.6 quadrillion tokens. At $0.75 per million tokens, that is $2.7 billion.

But here is what keeps me up: that 400 million daily page fetch number is from our conservative bottom-up model, which only counts explicit user-triggered browsing. The top-down model calibrated against Cloudflare's total AI bot traffic suggests the real number could be 3 to 4x higher when you include autonomous agent workloads.

And agent traffic is growing at 18 to 30% year over year, while LLM prices are dropping maybe 30 to 50% per generation (roughly every 6 to 12 months). The volume growth is outpacing the price decline. The total cost curve is going up.

If current trends hold, by 2027 the annual waste could exceed $10 billion. Not because anyone is being negligent, but because the fundamental mismatch between the format the web serves (visual HTML) and the format agents need (structured semantic content) will become more expensive with every page added to the web and every new agent deployed.

This is a solvable problem

I want to be clear about something: this is not a doom-and-gloom piece. The waste is real, the numbers are large, but the problem is entirely solvable with existing technology.

The web has solved this exact problem before, for other consumer classes. When search engines emerged as a new web consumer in the late 1990s, they struggled with HTML too. The web responded by inventing sitemaps, robots.txt, and structured data (Schema.org, JSON-LD, OpenGraph). These machine-readable layers sit alongside the human-readable HTML and provide crawlers with the structured information they need without requiring them to parse visual markup.

When applications emerged as a web consumer in the mid-2000s, the web responded again: REST APIs, GraphQL, webhooks. Purpose-built interfaces for programmatic consumption.

AI agents are the fourth consumer of the web. They need the same thing: a purpose-built representation designed for their consumption model. Not raw HTML (too noisy), not plain text (too lossy), but something in between that preserves what agents need and discards what they do not.

That is what we are building with Plasmate and the Semantic Object Model. But honestly, the specific technology matters less than the recognition that the problem exists and is getting more expensive every day. If someone builds a better solution than SOM, great. The industry still saves billions.

The full analysis, including the complete estimation methodology, sensitivity analysis across pricing scenarios, and the framework survey data, is published as a research paper on this site. If you work on agent infrastructure, pricing models, or web content delivery, I think you will find the data useful.

What I really want this piece to leave you with is simpler than the math: every page an agent reads carries a markup tax, paid in tokens that nobody needed.

That tax adds up to billions. It does not have to.

David Hurley is the founder of Plasmate Labs. Previously, he founded Mautic, the world's first open source marketing automation platform. He writes at dbhurley.com/blog and publishes research at dbhurley.com/papers.