This startup is betting tokenmaxxing will create the next compute giant

TechCrunch / 4/15/2026


Key Points

  • Parasail, a cloud provider for running generative AI models at inference time, says it processes about 500 billion tokens per day and is leaning into “tokenmaxxing” demand.
  • After coming out of stealth, the company raised a $32M Series A to scale a global on-demand GPU fleet across 40 data centers in 15 countries, aiming to keep inference fast and cheap.
  • Parasail does not fully commit to owning chips; instead it rents processing time from a mix of its own GPUs and market-sourced capacity, orchestrating allocation to avoid demand peaks and reduce costs.
  • The article argues Parasail’s growth depends on the spread of open-source models and agentic systems beyond frontier labs, driven by perceived cost and friction from major closed-model providers.
  • A broader hybrid inference ecosystem is emerging, highlighted via Elicit’s work serving pharmaceutical customers that analyze tens of thousands of papers using LLM-based research workflows.

“Give me tokens. Just give me tokens. I want them fast. I want them cheap. I want them now.”

That’s the mantra for developers building software on generative AI models, or at least what Parasail CEO Mike Henry hears. Parasail provides a cloud computing service to companies running AI models for inference, and Henry told TechCrunch it generates 500 billion tokens a day. How’s that for tokenmaxxing?

Henry was an executive at Groq, the LLM-focused chipmaker, where he built the company’s cloud offering, an early recognition that developers building software on AI models would want cloud processing specialized to their needs. Now, after coming out of stealth a year ago, Parasail has raised a $32 million Series A to do that at scale.

Henry has a background in physical chip design, but Parasail isn’t committed to owning its own chips. While some of its GPUs are its own, the company mainly rents processing time in 40 data centers across 15 countries and buys more capacity on liquidity markets, orchestrating it all behind the scenes to drive down the cost of inference requests.

By allocating workloads cleverly and avoiding demand peaks, the company aims to compete with firms that own their own silicon and might be constrained by existing customer commitments and workloads.
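The allocation strategy described above, in spirit, is a cost-minimization problem: route inference demand to the cheapest capacity currently available, wherever it sits. Parasail has not published its scheduler, so the following is only an illustrative sketch with made-up provider names and prices:

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    price_per_m_tokens: float  # USD per million tokens (hypothetical)
    free_capacity: int         # tokens/sec currently uncommitted

def allocate(providers, demand):
    """Greedily route demand (tokens/sec) to the cheapest free capacity.

    A real broker would also weigh latency, reliability, and demand
    forecasts; this sketch only captures the price-driven routing idea.
    """
    plan = []
    for p in sorted(providers, key=lambda p: p.price_per_m_tokens):
        if demand <= 0:
            break
        take = min(demand, p.free_capacity)
        if take > 0:
            plan.append((p.name, take))
            demand -= take
    if demand > 0:
        raise RuntimeError("insufficient capacity across all providers")
    return plan

# Hypothetical fleet: owned GPUs plus cheaper spot capacity.
fleet = [Provider("owned-gpus", 2.0, 500), Provider("spot-market", 1.2, 300)]
print(allocate(fleet, 600))  # spot capacity fills first, then owned GPUs
```

Off-peak spot capacity fills first because it is cheaper, which is how a broker without its own silicon can undercut providers locked into fixed hardware and customer commitments.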

The company’s potential relies on the continued proliferation of open-source models and agents outside of frontier labs. Parasail’s executives and investors say this is driven by the growing cost and friction of using offerings from companies like Anthropic and OpenAI.

Instead, a hybrid architecture is emerging, according to Andreas Stuhlmüller, the CEO of Elicit, a startup that has raised a $22 million Series A to develop a research assistant for scientific literature. His customers at top pharmaceutical companies use the LLM-based tool to review and analyze data from tens of thousands of scientific papers.


“We’ve moved more towards open models because it’s pretty rough sending 100,000s of requests to an API endpoint,” Stuhlmüller told TechCrunch. That matters even more now that the company relies on agents to improve its offering, splitting tasks up and working strategically over longer time horizons. Open models handle the initial screening to drive down the cost of the work, before a more capable frontier model provides the final answer.
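The hybrid pattern Stuhlmüller describes, cheap open models for screening and a frontier model for the final answer, can be sketched roughly as below. The model callables and the relevance threshold are placeholders, not Elicit's actual implementation:

```python
def screen_then_answer(papers, question, open_model, frontier_model,
                       threshold=0.5):
    """Two-tier inference pipeline (illustrative sketch).

    open_model(question, paper) -> relevance score in [0, 1]; called once
    per paper, so it should be cheap. frontier_model(question, papers) is
    called once on the survivors to synthesize a final answer.
    """
    relevant = [p for p in papers if open_model(question, p) >= threshold]
    return frontier_model(question, relevant)

# Toy stand-ins for the two model tiers:
cheap = lambda q, p: 1.0 if q in p else 0.0
frontier = lambda q, ps: f"{len(ps)} relevant papers on {q!r}"
print(screen_then_answer(["dog study", "cat study", "cat diet"],
                         "cat", cheap, frontier))
```

The economics come from call counts: the per-paper cost lands on the open model, while the expensive frontier model sees only the filtered subset, which is why tens of thousands of papers per query stay affordable.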

The proliferation of model queries, as agents become an increasingly common part of software development, is driving the investment in companies like Parasail that provide the infrastructure for cheap inference. Samir Kumar, a partner at Touring Capital who co-led this round, told TechCrunch he expects inference to be at least 20% of the cost of building software in the future.

How much of that market could be Parasail’s? In the crowded cloud compute space, Henry argues that his firm’s focus on inference (no training allowed) and its willingness to take on startup customers without long-term commitments set it apart from larger cloud-computing companies focused on enterprise business, and even from better-funded competitors in the cloud inference space, like Fireworks AI and Baseten.

Of course, there’s a different kind of risk when all of your customers are seed and Series B startups in the unpredictable AI sector.

Steve Jang, a partner at Kindred Ventures, which co-led the round, says the economics of deploying models will demand the kind of compute brokerage Parasail provides. And that’s before widespread use of models for content generation and robotics.

“Everyone thought there was an AI bubble. There’s no AI bubble,” he told TechCrunch. “Inference demand is far outstripping supply.”