Most AI agent tutorials show the happy path. Your agent calls an LLM, gets a response, does the thing. Ship it.
Then production happens. Rate limits. Timeouts. Malformed responses. Context window overflows. Your agent goes from "demo-ready" to "incident-generating" in about 48 hours.
I run a small operation — 5 agents max, solo founder. Every failure that wakes me up at 3am is one I should have handled in code. Here are the patterns that actually work.
Classify Your Errors First
Not all errors deserve the same treatment. The first thing I do in any agent system is classify failures into two buckets:
- Transient errors: Rate limits (429), timeouts, temporary network blips, model overload. These will probably work if you try again.
- Permanent errors: Invalid API keys, malformed prompts, context window exceeded, model doesn't exist. Retrying won't help.
```python
class ErrorClassifier:
    TRANSIENT_CODES = {429, 500, 502, 503, 504}

    @staticmethod
    def classify(error):
        if hasattr(error, 'status_code'):
            if error.status_code in ErrorClassifier.TRANSIENT_CODES:
                return "transient"
        if "timeout" in str(error).lower():
            return "transient"
        return "permanent"
```
This classification drives everything downstream. Transient errors get retries. Permanent errors get logged, reported, and gracefully degraded. When you're thinking about agent security patterns, error classification also matters — permanent auth errors need different alerting than transient network hiccups.
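As a quick sanity check, here's the same logic condensed into a standalone demo. The `RateLimitError` class is a hypothetical stand-in for a provider error carrying an HTTP status code:

```python
TRANSIENT_CODES = {429, 500, 502, 503, 504}

class RateLimitError(Exception):
    """Hypothetical provider error carrying an HTTP status code."""
    def __init__(self, status_code, message=""):
        super().__init__(message)
        self.status_code = status_code

def classify(error):
    # Same logic as ErrorClassifier.classify above, condensed for the demo
    if getattr(error, "status_code", None) in TRANSIENT_CODES:
        return "transient"
    if "timeout" in str(error).lower():
        return "transient"
    return "permanent"

print(classify(RateLimitError(429)))            # transient: worth retrying
print(classify(ValueError("invalid API key")))  # permanent: fail fast
```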
Retry Strategies That Don't Make Things Worse
The naive approach — retry immediately, retry forever — is how you turn a rate limit into a ban. Exponential backoff with jitter is the baseline:
```python
import random
import time

def retry_with_backoff(fn, max_retries=3, base_delay=1.0):
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as e:
            if ErrorClassifier.classify(e) == "permanent":
                raise  # Don't retry permanent errors
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt)
            jitter = random.uniform(0, delay * 0.5)
            time.sleep(delay + jitter)
```
Key details: jitter prevents thundering herd when multiple agents hit the same limit. And always cap your retries — 3 is usually enough. If it hasn't worked in 3 tries, it's not going to work in 30.
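One nice consequence of capping at 3: the total time spent backing off is bounded. A quick sketch of the schedule, mirroring the function above (there's no sleep after the final attempt, since that one re-raises instead of backing off):

```python
max_retries, base_delay = 3, 1.0

worst_case = 0.0
for attempt in range(max_retries - 1):
    delay = base_delay * (2 ** attempt)   # 1s, then 2s
    max_jitter = delay * 0.5              # up to 50% extra
    worst_case += delay + max_jitter
    print(f"after attempt {attempt}: sleep {delay:.1f}s to {delay + max_jitter:.1f}s")

print(f"worst-case total backoff: {worst_case:.1f}s")  # 4.5s
```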
Circuit Breakers for LLM Calls
Retries handle individual failures. Circuit breakers handle systemic ones. If your LLM provider is having a bad day, you don't want every request queuing up and timing out.
```python
import time

class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls are being blocked."""

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_time=60):
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.recovery_time = recovery_time
        self.last_failure_time = None
        self.state = "closed"  # closed = normal, open = blocking

    def call(self, fn):
        if self.state == "open":
            if time.time() - self.last_failure_time > self.recovery_time:
                self.state = "half-open"
            else:
                raise CircuitOpenError("Circuit breaker is open")
        try:
            result = fn()
            if self.state == "half-open":
                self.state = "closed"
            self.failure_count = 0  # any success resets the failure count
            return result
        except Exception:
            self.failure_count += 1
            self.last_failure_time = time.time()
            if self.failure_count >= self.failure_threshold:
                self.state = "open"
            raise
```
I wrap every external LLM call in a circuit breaker. When the circuit opens, agents fall back to cached responses or simpler logic instead of piling up failures. If you're taking an observability-first approach, you'll want to track circuit state transitions — they're one of the best early warning signals.
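Tracking those transitions doesn't require a metrics stack. A minimal in-process sketch (the class and names are mine, not part of the breaker above):

```python
import time

class CircuitStateLog:
    """Record circuit state transitions for later inspection (sketch)."""
    def __init__(self):
        self.transitions = []  # (timestamp, old_state, new_state)

    def record(self, old_state, new_state):
        # Only genuine transitions are interesting signals
        if old_state != new_state:
            self.transitions.append((time.time(), old_state, new_state))

log = CircuitStateLog()
log.record("closed", "open")       # breaker tripped: early warning
log.record("open", "open")         # ignored: not a transition
log.record("open", "half-open")    # probing for recovery
log.record("half-open", "closed")  # recovered
print(len(log.transitions))        # 3
```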
Fallback Chains: Your Safety Net
When your primary model fails, having a fallback chain prevents total outage:
```python
class AllProvidersFailedError(Exception):
    pass

FALLBACK_CHAIN = [
    {"provider": "anthropic", "model": "claude-sonnet-4-20250514"},
    {"provider": "openai", "model": "gpt-4o-mini"},
    {"provider": "local", "model": "cached_response"},
]

def call_with_fallback(prompt, chain=FALLBACK_CHAIN):
    errors = []
    for option in chain:
        try:
            return call_model(option["provider"], option["model"], prompt)
        except Exception as e:
            errors.append(f"{option['provider']}: {e}")
            continue
    raise AllProvidersFailedError(
        f"All {len(chain)} providers failed: {'; '.join(errors)}"
    )
```
The chain degrades gracefully: premium model → cheaper model → cached/static response. Your users get something even when everything is on fire.
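To see the degradation in action, here's a self-contained simulation where both remote providers are forced to fail. The stub `call_model` is illustrative, not a real client:

```python
class AllProvidersFailedError(Exception):
    pass

FALLBACK_CHAIN = [
    {"provider": "anthropic", "model": "claude-sonnet-4-20250514"},
    {"provider": "openai", "model": "gpt-4o-mini"},
    {"provider": "local", "model": "cached_response"},
]

def call_model(provider, model, prompt):
    # Stub: simulate both remote providers being down
    if provider in ("anthropic", "openai"):
        raise ConnectionError(f"{provider} unavailable")
    return f"[cached] {prompt}"

def call_with_fallback(prompt, chain=FALLBACK_CHAIN):
    errors = []
    for option in chain:
        try:
            return call_model(option["provider"], option["model"], prompt)
        except Exception as e:
            errors.append(f"{option['provider']}: {e}")
    raise AllProvidersFailedError("; ".join(errors))

print(call_with_fallback("summarize the incident"))
# falls through to the local cached response
```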
Timeout Handling
LLM calls are slow. An agent waiting 120 seconds for a response that's never coming is wasting resources and blocking downstream work.
```python
import asyncio

async def call_with_timeout(coro, timeout_seconds=30):
    try:
        return await asyncio.wait_for(coro, timeout=timeout_seconds)
    except asyncio.TimeoutError:
        raise TimeoutError(f"LLM call exceeded {timeout_seconds}s limit")
```
Set aggressive timeouts. For most agent tasks, if you haven't gotten a response in 30 seconds, something is wrong. I default to 30s for completions and 10s for embeddings.
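A self-contained demo of the pattern, with the timeout shortened so the hang is caught almost immediately (the slow coroutine is a stand-in for a hung provider call):

```python
import asyncio

async def hung_provider_call():
    await asyncio.sleep(60)  # simulate a response that never comes
    return "never reached"

async def call_with_timeout(coro, timeout_seconds=30):
    try:
        return await asyncio.wait_for(coro, timeout=timeout_seconds)
    except asyncio.TimeoutError:
        raise TimeoutError(f"LLM call exceeded {timeout_seconds}s limit")

async def main():
    try:
        await call_with_timeout(hung_provider_call(), timeout_seconds=0.1)
    except TimeoutError as e:
        print(e)  # LLM call exceeded 0.1s limit

asyncio.run(main())
```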
Putting It All Together
Here's how these patterns compose in a real agent:
```python
async def agent_execute(task):
    breaker = get_circuit_breaker("llm_calls")
    try:
        result = breaker.call(
            lambda: retry_with_backoff(
                lambda: call_with_fallback(task.prompt),
                max_retries=3
            )
        )
        return AgentResult(status="success", data=result)
    except CircuitOpenError:
        return AgentResult(
            status="degraded",
            data=get_cached_response(task),
            note="Using cached response - LLM circuit open"
        )
    except AllProvidersFailedError:
        return AgentResult(
            status="failed",
            data=None,
            note="All providers unavailable"
        )
```
The key insight: every layer has a defined failure mode. Timeouts prevent hangs. Retries handle blips. Circuit breakers prevent cascading failures. Fallbacks provide degraded-but-functional responses.
What I Track
Error handling is only useful if you know it's working. For my small setup, I track:
- Error classification distribution — am I seeing more transient or permanent errors?
- Circuit breaker state changes — how often are circuits opening?
- Fallback chain depth — how far down the chain are requests going?
- Retry success rate — are retries actually recovering errors?
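At this scale, none of that needs a full metrics stack. A sketch of plain in-process counters (the names are mine, not from any particular metrics library):

```python
from collections import Counter

class AgentMetrics:
    """Minimal counters for the four signals above (sketch)."""
    def __init__(self):
        self.counts = Counter()

    def incr(self, name, n=1):
        self.counts[name] += n

    def retry_success_rate(self):
        attempts = self.counts["retry.attempt"]
        return self.counts["retry.success"] / attempts if attempts else 0.0

m = AgentMetrics()
m.incr("error.transient")
m.incr("circuit.open")        # a breaker tripped
m.incr("fallback.depth.2")    # a request fell through to the 2nd provider
m.incr("retry.attempt", 4)
m.incr("retry.success", 3)
print(m.retry_success_rate())  # 0.75
```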
Having real-time error monitoring changed how I build agents. Instead of finding out about failures from users, I catch patterns before they become outages.
The Boring Truth
None of these patterns are novel. Circuit breakers come from distributed systems. Retry with backoff is older than most of us. Fallback chains are just failover by another name.
But applying them specifically to AI agents — where failures are probabilistic, responses are non-deterministic, and costs compound with every retry — that's where the craft is. Start with error classification, layer on retries, add circuit breakers, and build fallback chains. Your 3am self will thank you.


