OpenAI · Multi-Model Pro
GPT-5.6 Pro
quietly split into three.
OpenAI has always sold "one top-tier model" as its story. A newly public in-house paper reveals that GPT-5.6 Pro now ships as three parallel models tuned to different jobs. The single-flagship era is bending.
The Old Bet
One Pro,
and one only
GPT-4o, GPT-5, GPT-5.5 — for years OpenAI has held to "one top model, whatever the task." A cleaner story: multimodal, reasoning, everything under one banner. The developer never has to choose.
But that story comes with a bill. When everything runs under one roof, inference cost and latency get pulled by every workload at once. Ordinary lookups end up firing the same heavyweight machinery that answers deep reasoning questions, and the inefficiency shows up on the internal dashboards.
Three Pros
Three Pro variants,
side by side
The paper reveals a Pro tier that runs as three parallel models, not one.
The OpenAI paper reveals that GPT-5.6 Pro now runs as a three-model lineup — Sol, Nova, and Dev. A decisive turn away from the single-flagship playbook, though users rarely see the switch — an internal router picks among the three based on the request.
Sol handles research and experiments with deep reasoning; Nova is the everyday general-purpose model; Dev is engineer-facing, tuned for code and agent context. Splitting like this makes per-use-case optimization tractable — inference cost and latency can each be shaped to the task.
Aligning With Rivals
Anthropic's two tiers
get an OpenAI answer
Anthropic has been running its own tiered lineup for a while — heavy Opus, light Sonnet — sorted by intent. OpenAI now answers with three tiers instead of two. Google Gemini, meanwhile, still bets on one general-purpose flagship, so the top two are visibly walking away from Google's approach.
Splitting into three is more than mimicking Anthropic. It's also a declaration that the router behind ChatGPT is getting smarter. The way models get chosen is becoming a competitive axis of its own, invisible to the end user but very visible in the numbers.
Who Feels It
The API tier
notices most
Direct API developers
Explicit selection means real cost/quality design levers. Sol/Nova/Dev should map to distinct per-token prices.
Benchmark operators
Any single-model bench design starts to leak signal. Time to rebuild the comparison tables per use case.
ChatGPT casual users
Practical difference is close to zero. The router picks for you and the UI barely moves.
The Frontier
Is the single flagship
going out of style?
A single flagship is easy to sell: one brand, one story. But as model-to-model quality gaps close and inference cost gets impossible to ignore, splitting by use case ends up stronger overall. When Anthropic and OpenAI move the same direction, that's the industry converging on that view.
From a user-experience angle, "trust one model" is still simpler. The next fight is about the naturalness of running a tiered lineup underneath — building the routing so cleanly that the end user never has to think about which of the three is answering.