New ways to balance cost and reliability in the Gemini API

Google Blog / 4/3/2026

📰 NewsDeveloper Stack & InfrastructureTools & Practical Usage

Key Points

  • Google announced two new Gemini API inference tiers—Flex and Priority—designed to let developers trade off cost against latency.
  • The Flex tier targets lower costs by allowing more flexibility in response timing, while the Priority tier emphasizes faster, more consistent responses.
  • These tier options aim to help teams choose the right performance level for different workloads, improving reliability and cost efficiency.
  • The update focuses on practical API usage patterns for balancing operational expenses with user experience in Gemini-powered applications.
Google is introducing two new inference tiers to the Gemini API, Flex and Priority, to balance cost and latency.