OpenAI moves inference in-house with custom Jalapeño chip
OpenAI's inference has run on NVIDIA GPUs, which made cost cuts dependent on buying more hardware. Every major lab has been working around the same constraint.
OpenAI and Broadcom launched 'Jalapeño,' a custom LLM inference ASIC, moving OpenAI's serving stack beyond pure GPU dependence (OpenAI blog).
No API price cut yet, but the structural pressure on prices is real. Expect gradual reductions over 6–12 months; at personal-scale usage, the savings are still mostly noise.