Gemini API Cheatsheet 2026 — Free Tier Limits, Models, and Endpoints in One Place

Dev.to / 5/3/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage

Key Points

  • The article provides a 2026 “Gemini API Cheatsheet” consolidating commonly needed details for building with Google’s Gemini models, including model names and their recommended use cases.
  • It lists free-tier limits on Google AI Studio by model, showing RPM, TPM, and RPD so developers can estimate request and token throughput constraints.
  • It includes a basic REST request example using curl, demonstrating how to call the Gemini generateContent endpoint with an API key and a JSON prompt payload.
  • It further shows how to add a system instruction (system prompt) in the request body to steer the assistant’s behavior.
  • Overall, the cheat sheet is designed to reduce the time spent searching for model/endpoint and quota details while integrating Gemini into applications.

If this is useful, a ❤️ helps others find it.

Everything I keep looking up when building with Gemini — in one place.

Models (2026)

| Model | Context | Best for |
| --- | --- | --- |
| gemini-2.5-flash-preview | 1M tokens | General use, thinking, fast |
| gemini-2.5-pro-preview | 1M tokens | Complex reasoning, best quality |
| gemini-1.5-flash | 1M tokens | Stable, production-ready |
| gemini-1.5-pro | 2M tokens | Longest context |
| gemini-2.0-flash-lite | 1M tokens | Lowest latency, highest volume |

For most use cases: gemini-2.5-flash-preview

Free Tier Limits (Google AI Studio)

| Model | RPM | TPM | RPD |
| --- | --- | --- | --- |
| Gemini 2.5 Flash Preview | 10 | 250,000 | 500 |
| Gemini 1.5 Flash | 15 | 1,000,000 | 1,500 |
| Gemini 1.5 Pro | 2 | 32,000 | 50 |
| Gemini 2.0 Flash Lite | 30 | 1,000,000 | 1,500 |

RPM = requests per minute, TPM = tokens per minute, RPD = requests per day
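To stay under a per-minute cap client-side, a sliding-window counter is enough. The sketch below is my own (the type name and the plain seconds-based clock are assumptions, not from the article): it admits a request only if fewer than `limit` were sent in the last 60 seconds.

```rust
use std::collections::VecDeque;

/// Minimal sliding-window counter for client-side RPM throttling.
/// Timestamps are plain seconds (assumed monotonic) so the logic is
/// easy to test without a real clock.
struct RpmWindow {
    limit: usize,
    sent: VecDeque<u64>, // send times (seconds) within the last 60s
}

impl RpmWindow {
    fn new(limit: usize) -> Self {
        Self { limit, sent: VecDeque::new() }
    }

    /// Returns true if a request may be sent at time `now`, and records it.
    fn try_send(&mut self, now: u64) -> bool {
        // Drop entries older than the 60-second window.
        while let Some(&t) = self.sent.front() {
            if now - t >= 60 { self.sent.pop_front(); } else { break; }
        }
        if self.sent.len() < self.limit {
            self.sent.push_back(now);
            true
        } else {
            false
        }
    }
}
```

With `RpmWindow::new(10)` (the 2.5 Flash Preview limit), the 11th call inside a minute is refused, and capacity frees up as old entries age out.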

Basic Request (REST)

curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-preview:generateContent \
  -H "Content-Type: application/json" \
  -H "x-goog-api-key: YOUR_API_KEY" \
  -d '{
    "contents": [{"parts": [{"text": "Your prompt here"}]}]
  }'

With System Prompt

{
  "system_instruction": {
    "parts": [{"text": "You are a helpful assistant."}]
  },
  "contents": [
    {"role": "user", "parts": [{"text": "Your prompt here"}]}
  ]
}
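If you'd rather not pull in a JSON library for a one-off call, this body can be assembled by hand, though escaping is easy to get wrong. A deliberately minimal sketch (helper names are mine; serde_json, as in the Rust example below, is the safer choice in real code):

```rust
/// Escape the characters that most often break hand-built JSON strings.
/// Deliberately incomplete; a real client should use a JSON library.
fn json_escape(s: &str) -> String {
    s.chars().flat_map(|c| match c {
        '"' => vec!['\\', '"'],
        '\\' => vec!['\\', '\\'],
        '\n' => vec!['\\', 'n'],
        c => vec![c],
    }).collect()
}

/// Build a generateContent body with an optional system instruction,
/// matching the JSON shape shown above.
fn build_body(system: Option<&str>, prompt: &str) -> String {
    let user = format!(
        r#"{{"role": "user", "parts": [{{"text": "{}"}}]}}"#,
        json_escape(prompt)
    );
    match system {
        Some(sys) => format!(
            r#"{{"system_instruction": {{"parts": [{{"text": "{}"}}]}}, "contents": [{}]}}"#,
            json_escape(sys), user
        ),
        None => format!(r#"{{"contents": [{}]}}"#, user),
    }
}
```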

Streaming

curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-preview:streamGenerateContent \
  -H "Content-Type: application/json" \
  -H "x-goog-api-key: YOUR_API_KEY" \
  -d '{"contents": [{"parts": [{"text": "Tell me a story"}]}]}'
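By default, streamGenerateContent returns the chunks as one streamed JSON array; appending `?alt=sse` to the URL switches to server-sent events, where each chunk arrives on its own `data: {...}` line. A std-only extractor for that SSE format (the helper name is mine, and this assumes the request was made with `alt=sse`):

```rust
/// Extract JSON payloads from an SSE-style response body, assuming the
/// endpoint was called with `?alt=sse` so each chunk arrives as a
/// `data: {...}` line. Blank keep-alive lines are skipped.
fn sse_payloads(body: &str) -> Vec<&str> {
    body.lines()
        .filter_map(|line| line.strip_prefix("data: "))
        .filter(|p| !p.is_empty())
        .collect()
}
```

Each returned payload is one chunk of the response, ready to hand to a JSON parser.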

In Rust (reqwest)

use reqwest::Client;
use serde_json::json;

/// Calls generateContent and returns the first candidate's text.
pub async fn call_gemini(prompt: &str, api_key: &str) -> Result<String, reqwest::Error> {
    let client = Client::new();
    let url = format!(
        "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-preview:generateContent?key={}",
        api_key
    );

    let body = json!({
        "contents": [{"parts": [{"text": prompt}]}]
    });

    let res = client.post(&url).json(&body).send().await?;
    let data: serde_json::Value = res.json().await?;

    let text = data["candidates"][0]["content"]["parts"][0]["text"]
        .as_str()
        .unwrap_or("")
        .to_string();

    Ok(text)
}

Error Codes

| Code | Meaning | Fix |
| --- | --- | --- |
| 400 | Bad request / token limit | Shorten prompt |
| 403 | Invalid API key | Check key |
| 429 | Rate limit hit | Wait and retry |
| 500 | Internal error | Retry |
| 503 | Overloaded | Wait 2s, retry once |
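The usual way to act on this table is exponential backoff for the transient codes only. A small sketch (helper names and the 30-second cap are my own choices):

```rust
/// Backoff delay in milliseconds for 0-based attempt `n`:
/// 1s, 2s, 4s, 8s, ... capped at 30s.
fn backoff_ms(attempt: u32) -> u64 {
    (1000u64 << attempt.min(5)).min(30_000)
}

/// Retry only the transient statuses from the table above;
/// 400 and 403 won't succeed on retry.
fn is_retryable(status: u16) -> bool {
    matches!(status, 429 | 500 | 503)
}
```

Sleep `backoff_ms(n)` between attempts, and give up after a fixed attempt count rather than looping forever.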

Token Counting (rough guide)

  • 1 token ≈ 4 characters in English
  • 1 token ≈ 2–3 characters in Japanese
  • 100 lines of logcat ≈ 3,000–5,000 tokens
  • 1 page of PDF text ≈ 500–800 tokens
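These ratios are easy to turn into a pre-flight estimate before a request risks a 400. The function below is my own rough approximation (4 chars/token for ASCII text, ~2.5 otherwise, rounding up); the API's countTokens endpoint gives exact numbers.

```rust
/// Rough token estimate from the heuristics above: ~4 chars/token for
/// ASCII text, ~2.5 chars/token otherwise. An estimate only.
fn estimate_tokens(text: &str) -> usize {
    let chars = text.chars().count();
    if text.is_ascii() {
        (chars + 3) / 4        // ceil(chars / 4)
    } else {
        (chars * 2 + 4) / 5    // ceil(chars / 2.5)
    }
}
```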

Get a Free API Key

  1. Go to aistudio.google.com
  2. Sign in with Google
  3. Click "Get API Key"
  4. Done — no credit card required

Hiyoko PDF Vault → https://hiyokoko.gumroad.com/l/HiyokoPDFVault
X → @hiyoyok