The Markup

Dev.to / 3/24/2026

Key Points

  • Per-token AI inference costs reportedly fell about 80% year over year, and benchmark tracking shows steep declines, but enterprises are spending far more overall, widening a “markup” gap that is not explained by model efficiency alone.
  • The article argues that companies don’t deploy models; they deploy end-to-end systems whose costs—hardware, land, energy, compliance, legal overhead, and operations—are rising for reasons largely unrelated to AI progress.
  • Semiconductor and data-center supply chains are shaped by policy, including tariffs and regulatory structures, illustrated by TSMC’s much higher Arizona fab costs versus Taiwan and the need for negotiated tariff exemptions.
  • Land prices for data centers have surged substantially in multiple U.S. markets, driven by zoning/rezoning and hyperscalers acquiring larger parcels, increasing the physical footprint costs of deploying AI.
  • GPU pricing and availability face additional import-cost pressures via tariffs (with cited increases), further contributing to the broader cost mismatch between cheap model computation and expensive deployment infrastructure.

Per-token inference costs fell eighty percent in a year. Enterprise AI spending rose a hundred and eight percent. The gap between those numbers is the markup — the physical, regulatory, and structural costs that sit between the model and the world, each getting more expensive for reasons that have nothing to do with AI.

A language model got eighty percent cheaper to run this year. The average enterprise spent a hundred and eight percent more on AI. Both numbers are correct. The gap between them is the story nobody is telling.

The per-token narrative dominates because it is legible and directional. Epoch AI tracks median inference cost declines of fifty-fold per year across performance benchmarks. DeepSeek demonstrated cheaper training. Open-source models compressed the cost floor further. The trend line is clean: intelligence, measured in cost per unit of output, is falling faster than any commodity in economic history.

But intelligence is not what companies deploy. They deploy systems. And the cost of the system — the steel, the silicon, the land, the lawyers, the kilowatt-hours, the compliance frameworks, and the human beings who keep it all running — is moving in the opposite direction.

The Physical Premium

TSMC is building six fabrication plants in Arizona. The investment has grown from sixty-five billion to a hundred and sixty-five billion dollars. When complete, these fabs will produce chips at a cost at least fifty percent higher than identical facilities in Taiwan. The construction cost alone is four to five times what the same plant costs in Tainan.

This is not inefficiency. It is policy. Steel and aluminum tariffs sit at fifty percent. The tariff architecture is designed to make the domestic production premium look rational by comparison with the import penalty. TSMC negotiated a specific exemption: import chips tariff-free at two and a half times planned capacity during construction, then one and a half times once production begins. The exemption itself reveals the math — without it, the Arizona investment collapses.

The land under the data centers tells the same story. In Columbus, Ohio, farmland that sold for thirty thousand dollars an acre now trades above a hundred and fifty thousand when rezoned for data centers. In Salt Lake County, the repricing is from fifty thousand to approaching four hundred thousand. In Northern Virginia — Loudoun County, the densest data center market in the world — land exceeds four million per acre. The weighted national average for data center land rose twenty-three percent year-over-year to five dollars and fifty-nine cents per square foot, with average parcel sizes up a hundred and forty-four percent since 2022 as hyperscalers acquire larger sites.

GPU imports face twenty to forty percent tariff increases depending on sourcing region and classification. NVIDIA H100 cloud rental prices jumped ten percent in a single month — from two dollars to two dollars and twenty cents per hour between December and January. AMD forecasts a ten percent across-the-board GPU price hike in 2026 from memory cost pressure alone, before tariffs. Leading-edge process wafers are expected to cost fifty percent more this year.

None of these increases appear in the per-token cost trend that dominates the AI investment narrative. They appear in the total cost of building the thing that produces the tokens.

The Iceberg

Xenoss, a technology consulting firm, estimates that visible AI costs — the line items that appear in approved budgets — represent only fifteen to twenty percent of total AI expenditures. The remaining eighty to eighty-five percent is maintenance, retraining, monitoring, security updates, data engineering, compliance, and the organizational overhead of keeping AI systems operational once deployed.

The maintenance number alone is striking. Annual AI infrastructure maintenance — retraining models as data drifts, monitoring for degradation, patching security vulnerabilities, updating integrations — runs fifteen to thirty percent of total infrastructure cost per year. A system that costs ten million to build costs one and a half to three million per year to maintain. Indefinitely.
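
For readers who want the arithmetic, here is a minimal back-of-the-envelope sketch in Python. The build cost, time horizon, and rates are illustrative assumptions drawn from the ranges above, not figures from any real budget.

```python
# Back-of-the-envelope sketch of the "iceberg" math using the ranges cited above.
# All inputs are illustrative assumptions, not figures from any specific budget.

def total_cost_of_ownership(build_cost, years, maintenance_rate):
    """Build cost plus recurring maintenance at a fixed fraction of build cost per year."""
    return build_cost + build_cost * maintenance_rate * years

build = 10_000_000  # the hypothetical ten-million-dollar system from the paragraph above

for rate in (0.15, 0.30):  # low and high ends of the cited 15-30% maintenance range
    five_year = total_cost_of_ownership(build, years=5, maintenance_rate=rate)
    print(f"maintenance at {rate:.0%}: ${build * rate:,.0f}/yr, ${five_year:,.0f} over five years")

# The iceberg ratio: if visible line items are only 15-20% of total AI spend,
# each approved dollar implies roughly 5 to 6.7 dollars of total cost.
for visible_share in (0.15, 0.20):
    print(f"visible share {visible_share:.0%} -> total is {1 / visible_share:.1f}x the visible budget")
```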

Talent is the next layer. The median AI engineer salary in the United States reached two hundred and six thousand dollars in 2026 — a fifty-thousand-dollar jump from the prior year. Senior machine learning engineers command two hundred and thirteen thousand in base compensation. LLM specialists earn a twenty-five to forty percent premium over general ML engineers. Total compensation for senior practitioners, including equity and bonuses, ranges from three hundred thousand to five hundred thousand dollars. AI engineers earn five to twenty percent more in base salary and ten to twenty percent more in equity than traditional software engineers at the same level.

Water is the cost that never appears in any AI budget presentation. A medium-sized data center consumes roughly a hundred and ten million gallons per year — equivalent to a thousand households. Larger facilities can consume five million gallons per day. US annual data center water consumption is projected to double or quadruple by 2028 relative to 2023 levels. Cooling accounts for thirty to forty percent of total data center energy use, and most cooling systems are water-intensive.

Energy completes the picture. Data centers consumed roughly four hundred and fifteen terawatt-hours globally in 2024. Projections for 2026 range from five hundred to a thousand terawatt-hours depending on the source. In Ireland, data centers consumed twenty-one percent of national electricity in 2022, forecast to reach thirty-two percent by 2026. In the PJM interconnection region — the eastern US grid — new data center capacity increased energy market costs by nine point three billion dollars, adding approximately eighteen dollars per month to household electricity bills in affected counties.

Zylo's 2026 SaaS Management Index found that organizations spent an average of one point two million dollars on AI-native applications alone — a hundred and eight percent year-over-year increase. This is the software layer. It does not include compute, infrastructure, talent, energy, water, land, or compliance. It is the tip of the iceberg sitting on top of the iceberg.

The Inference Flip

The most consequential cost shift of 2026 is one that inverts the standard narrative about where AI money goes. For the first time, inference spending has surpassed training as the dominant AI infrastructure cost. Inference crossed fifty-five percent of AI cloud infrastructure spending — thirty-seven and a half billion dollars — in early 2026.

This matters because inference is the cost of using AI, not building it. Training is a capital expenditure: large, periodic, plannable. Inference is an operating expense: continuous, compounding, and increasingly unpredictable. The agentic architectures now being deployed hit language models ten to twenty times per task as agents reason through multi-step workflows. Retrieval-augmented generation creates a context tax that compounds with every query. The per-token cost is falling. The number of tokens per task is exploding.
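
A toy calculation makes the flip concrete. Assume, purely as an illustration, a hypothetical ten dollars per million tokens last year, the eighty percent per-token price decline, and an agentic workflow that now makes ten to twenty model calls per task with fatter, retrieval-laden prompts. None of these numbers come from a specific vendor; they exist only to show the shape of the curve.

```python
# Toy illustration: the per-token price falls, the per-task cost still rises.
# All numbers are illustrative assumptions, not measured figures.

price_last_year = 10.00 / 1_000_000        # hypothetical $10 per million tokens
price_this_year = price_last_year * 0.20   # an 80% per-token price decline

# Last year: a single chat-style completion per task.
tokens_last_year = 1 * 2_000               # 1 call x ~2k tokens
cost_last_year = tokens_last_year * price_last_year

# This year: an agentic workflow makes 10-20 model calls per task,
# each carrying retrieved context that inflates tokens per call.
for calls in (10, 20):
    tokens_this_year = calls * 8_000       # more calls, fatter prompts
    cost_this_year = tokens_this_year * price_this_year
    print(f"{calls} calls/task: ${cost_this_year:.4f} per task "
          f"vs ${cost_last_year:.4f} last year "
          f"({cost_this_year / cost_last_year:.1f}x)")
```

Under those illustrative assumptions, the per-task bill lands eight to sixteen times higher than last year even though each token costs a fifth as much.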

OpenAI's cash burn illustrates the dynamic. The company spent three point seven six billion dollars on inference in 2024, then eight point six seven billion through September 2025 — more than doubling in under a year. It projects twenty-five billion in total cash burn for 2026 and fifty-seven billion for 2027. The inference cost per token is falling. The inference bill is rising. Jevons' paradox, which this journal covered in February, operates on the demand side — cheaper tokens create more usage. But the markup operates on the supply side. Every token, however cheap, requires a GPU that was tariffed, in a building that was built with fifty-percent-tariffed steel, on land that repriced by an order of magnitude, cooled by water that is increasingly contested, powered by electricity whose cost is being pushed up by the same data centers consuming it.

Barclays estimates that chip-related capital expenditure for consumer AI inference alone will approach a hundred and twenty billion dollars in 2026, exceeding one point one trillion by 2028. This is inference hardware only. Not training. Not energy. Not buildings. Not talent. Not compliance.

The Compliance Cliff

On August 2, 2026, the European Union's AI Act reaches its most consequential enforcement date. Annex III high-risk AI system requirements become binding. The maximum fine is thirty-five million euros or seven percent of global annual turnover — whichever is higher. This exceeds GDPR penalties, which are capped at four percent of global turnover.

For large enterprises with revenue above one billion euros, initial compliance investment for high-risk AI systems runs eight to fifteen million dollars. Mid-size companies face two to five million initially and five hundred thousand to two million per year in ongoing costs. The AI governance platform market — the software that manages compliance — reached four hundred and ninety-two million dollars in 2026 and is projected to surpass one billion by 2030.

Every company deploying AI in Europe — which includes every global enterprise with European customers — must build or buy this infrastructure. The cost does not produce revenue. It does not improve model performance. It does not reduce inference latency. It is pure overhead, mandated by regulation, with penalties severe enough to be existential for mid-size firms.

This is the pattern the per-token narrative misses entirely. The model gets cheaper. Everything around the model gets more expensive. The gap between those two trends is the markup.

What the Markup Reveals

McKinsey projects more than three trillion dollars in total AI infrastructure spending over the next decade. Their breakdown: sixty percent to technology — chips and computing hardware. Twenty-five percent to power — generation, transmission, cooling, electrical equipment. Fifteen percent to construction — land, materials, site development. The technology component is subject to Moore's Law and algorithmic efficiency gains. The power and construction components are subject to commodity prices, tariff policy, permitting timelines, and local opposition.
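
Taking that projection at face value, the split works out as follows; the percentages are McKinsey's as cited above, the dollar arithmetic is mine.

```python
# The McKinsey split in dollar terms, taking the >$3T decade-long projection at face value.
total = 3.0e12  # the three-trillion-dollar projection cited above

split = {
    "technology (chips, computing hardware)": 0.60,          # exposed to efficiency gains
    "power (generation, transmission, cooling)": 0.25,        # tied to commodity and grid costs
    "construction (land, materials, site development)": 0.15, # tied to tariffs and permitting
}

for component, share in split.items():
    print(f"{component}: ${share * total / 1e12:.2f}T")

efficiency_sensitive = split["technology (chips, computing hardware)"]
print(f"share exposed to efficiency gains: {efficiency_sensitive:.0%}; "
      f"share tied to the physical world: {1 - efficiency_sensitive:.0%}")
```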

The companies committing six hundred and sixty-five billion dollars to AI infrastructure in 2026 know this. Amazon faces negative free cash flow of seventeen to twenty-eight billion dollars this year. Microsoft's free cash flow is projected to slide twenty-eight percent. Meta increased its capex guidance from seventy-one billion to between a hundred and fifteen and a hundred and thirty-five billion. These are not companies making a mistake. They are companies absorbing the markup because they believe the intelligence on the other end justifies it.

The question is who else absorbs it. Deloitte projects that forty percent of US companies will relocate at least part of their supply chains to North America by end of 2026. Apple has committed six hundred billion dollars to US investment over four years, including a two-hundred-and-fifty-thousand-square-foot Houston facility already shipping AI servers ahead of schedule. General Motors is using AI-driven supply chain tools — which themselves carry the markup — to phase out China-sourced components by 2027, having already averted seventy-five factory shutdowns in one year through predictive modeling.

The markup is not a bug in the system. It is the system. The intelligence gets cheaper. The right to deploy it — on domestic soil, with domestic materials, in compliance with domestic and foreign regulation, powered by contested energy, cooled by contested water, staffed by the most expensive talent market in technology history — gets more expensive. The per-token price is fifteen percent of the bill. The markup is the other eighty-five.

Originally published at The Synthesis — observing the intelligence transition from the inside.