Building an AI agent that works in a demo is easy. Building one that works reliably in production is a completely different engineering challenge.
Production systems must handle real users, real data, and real consequences when things fail.
This is the production agent architecture I use across Brainfy AI and Navlyt, along with real code patterns and failure modes I design around.
What Makes Production Agents Different From Demo Agents
Demo agents optimize for the happy path.
Production agents must handle:
Real data variance
Production inputs are messy, ambiguous, and full of edge cases.
Concurrent executions
Multiple agent instances running simultaneously with shared state.
Long-running tasks
Agents that may take minutes or hours and therefore need durable execution state.
Cost management
A confused agent making unnecessary tool calls can become expensive quickly.
Observability
You must understand exactly what the agent decided and why.
The Core Architecture: Durable Agent State
The most important production decision:
Keep agent state in a database — not in memory.
In-memory state:
- Dies with the server
- Cannot scale horizontally
- Cannot be audited
Database state:
- Survives restarts
- Enables horizontal scaling
- Provides observability
- Enables debugging
Example schema:
-- Agent execution state table
CREATE TABLE agent_executions (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  user_id UUID REFERENCES auth.users NOT NULL,
  agent_type TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'pending',
  CONSTRAINT valid_status CHECK (
    status IN (
      'pending',
      'running',
      'completed',
      'failed',
      'cancelled',
      'awaiting_review'
    )
  ),
  input_data JSONB NOT NULL,
  state JSONB DEFAULT '{}',
  result JSONB,
  error TEXT,
  step_count INTEGER DEFAULT 0,
  token_count INTEGER DEFAULT 0,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW(),
  completed_at TIMESTAMPTZ
);

-- Tool call log for observability
CREATE TABLE agent_tool_calls (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  execution_id UUID REFERENCES agent_executions NOT NULL,
  step_number INTEGER NOT NULL,
  tool_name TEXT NOT NULL,
  tool_input JSONB NOT NULL,
  tool_output JSONB,
  status TEXT NOT NULL DEFAULT 'pending',
  latency_ms INTEGER,
  error TEXT,
  called_at TIMESTAMPTZ DEFAULT NOW()
);
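The CHECK constraint guards which status values can be stored, but not which transitions between them are legal. A small state machine in application code closes that gap; a minimal sketch (the transition map itself is my assumption, layered on top of the schema rather than enforced by it):

```typescript
// Status values mirror the valid_status constraint on agent_executions.
type ExecutionStatus =
  | 'pending'
  | 'running'
  | 'completed'
  | 'failed'
  | 'cancelled'
  | 'awaiting_review'

// Which transitions each status allows. This map is an assumption:
// the schema only constrains values, not the paths between them.
const TRANSITIONS: Record<ExecutionStatus, ExecutionStatus[]> = {
  pending: ['running', 'cancelled'],
  running: ['completed', 'failed', 'cancelled', 'awaiting_review'],
  awaiting_review: ['running', 'cancelled'],
  completed: [],
  failed: [],
  cancelled: []
}

function isValidTransition(
  from: ExecutionStatus,
  to: ExecutionStatus
): boolean {
  return TRANSITIONS[from].includes(to)
}
```

Checking transitions in one place makes bugs like a worker resurrecting a cancelled execution impossible by construction.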
The Agent Loop With Production Safeguards
Production agents need hard limits.
Example safeguards:
- Step limits
- Token limits
- Timeout limits
- Failure conditions
Example TypeScript loop:
// lib/agents/production-agent.ts
const AGENT_LIMITS = {
  maxSteps: 25,
  maxTokens: 50_000,
  stepTimeoutMs: 30_000, // per-step budget, enforced inside callModel (not shown)
  totalTimeoutMs: 300_000
}

export async function runAgent(
  executionId: string,
  supabase: SupabaseClient
): Promise<void> {
  const startTime = Date.now()
  const execution = await loadExecution(executionId, supabase)

  await updateStatus(executionId, 'running', supabase)

  // The conversation history lives in the durable state column, so a
  // restarted worker picks up exactly where the last one stopped.
  const messages = execution.state.messages ?? []

  while (true) {
    const elapsed = Date.now() - startTime

    // Hard limits: fail loudly instead of looping forever.
    if (execution.step_count >= AGENT_LIMITS.maxSteps) {
      await failWithReason(executionId, 'MAX_STEPS_EXCEEDED', supabase)
      return
    }
    if (execution.token_count >= AGENT_LIMITS.maxTokens) {
      await failWithReason(executionId, 'MAX_TOKENS_EXCEEDED', supabase)
      return
    }
    if (elapsed >= AGENT_LIMITS.totalTimeoutMs) {
      await failWithReason(executionId, 'TOTAL_TIMEOUT', supabase)
      return
    }

    const response = await callModel(messages, TOOLS)

    execution.step_count++
    execution.token_count += response.usage?.total_tokens ?? 0

    // Persist after every step so a crash loses at most one step.
    await persistState(executionId, execution, supabase)

    // No tool calls means the model has produced its final answer.
    if (!response.tool_calls?.length) {
      await updateStatus(executionId, 'completed', supabase)
      return
    }

    for (const toolCall of response.tool_calls) {
      const result = await executeToolCall(toolCall, executionId, supabase)
      messages.push({
        role: 'tool',
        tool_call_id: toolCall.id,
        content: JSON.stringify(result)
      })
    }
  }
}
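The loop calls failWithReason and persistState without showing them. A minimal sketch of what they might look like against the schema from earlier; the narrow Db interface is a structural stand-in for the Supabase client (which satisfies these calls), and the exact column updates are my assumption:

```typescript
// Narrow structural stand-in for the Supabase client; the real
// @supabase/supabase-js client satisfies the same call shape.
interface Db {
  from(table: string): {
    update(values: Record<string, unknown>): {
      eq(column: string, value: string): PromiseLike<{ error: unknown }>
    }
  }
}

// Build the row update for a terminal failure. Kept pure so the
// payload is easy to test in isolation.
function failurePayload(reason: string) {
  return {
    status: 'failed',
    error: reason,
    completed_at: new Date().toISOString()
  }
}

async function failWithReason(executionId: string, reason: string, db: Db) {
  // Record why the run died so the row is auditable later.
  await db.from('agent_executions').update(failurePayload(reason)).eq('id', executionId)
}

async function persistState(
  executionId: string,
  execution: { state: unknown; step_count: number; token_count: number },
  db: Db
) {
  // Write the counters and working state back after every step.
  await db
    .from('agent_executions')
    .update({
      state: execution.state,
      step_count: execution.step_count,
      token_count: execution.token_count,
      updated_at: new Date().toISOString()
    })
    .eq('id', executionId)
}
```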
The Human-in-the-Loop Gate
For actions that are difficult to reverse, I require human approval.
The agent:
- Prepares the action
- Sets status to awaiting_review
- Stops execution
- Waits for approval
Example:
const APPROVAL_REQUIRED_TOOLS = [
  'send_email',
  'update_customer_record',
  'generate_compliance_document',
  'submit_to_regulator'
]

async function executeToolCall(
  toolCall: { id: string; name: string; args: unknown },
  executionId: string,
  supabase: SupabaseClient
) {
  const { name, args } = toolCall

  if (APPROVAL_REQUIRED_TOOLS.includes(name)) {
    // Persist the pause before stopping, so a reviewer can inspect
    // the pending action and nothing irreversible happens meanwhile.
    await updateStatus(executionId, 'awaiting_review', supabase)
    throw new AgentPausedError('Human approval required')
  }

  return await callTool(name, args)
}
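Once a reviewer acts, something has to move the paused execution forward. A sketch of the pure decision step (applyReview is a hypothetical helper of mine; persisting the new status and re-invoking the agent loop are left to the caller):

```typescript
type ReviewDecision = 'approve' | 'reject'

interface PendingExecution {
  status: string
  state: Record<string, unknown>
}

// Given a paused execution and a reviewer's decision, compute the
// next status. Pure by design: the caller persists the result and,
// on approval, re-enters runAgent() to resume from durable state.
function applyReview(
  execution: PendingExecution,
  decision: ReviewDecision
): string {
  if (execution.status !== 'awaiting_review') {
    throw new Error('Execution is not awaiting review')
  }
  return decision === 'approve' ? 'running' : 'cancelled'
}
```

Guarding on the current status means a stale or duplicated review request cannot restart an execution that already finished.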
Monitoring: What I Track in Production
Metrics I monitor:
- Step efficiency
- Tool success rate
- Human review escalation rate
- Token cost per completion
- Completion rate
Example health query:
const { data } = await supabase.rpc('agent_health_metrics', {
  agent_type: 'compliance_document_generator',
  since: new Date(
    Date.now() - 7 * 24 * 60 * 60 * 1000
  ).toISOString()
})
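If you would rather not maintain a database function, the same numbers can be derived client-side from agent_executions rows; a sketch, assuming the row shape from the schema above (the metric names are mine):

```typescript
// Subset of an agent_executions row needed for health metrics.
interface ExecutionRow {
  status: string
  step_count: number
  token_count: number
}

// Aggregate completion rate, step efficiency, token cost, and the
// human review escalation rate over a window of rows.
function healthMetrics(rows: ExecutionRow[]) {
  const finished = rows.filter(r =>
    ['completed', 'failed', 'cancelled'].includes(r.status)
  )
  const completed = finished.filter(r => r.status === 'completed')
  const reviewed = rows.filter(r => r.status === 'awaiting_review')

  return {
    completionRate: finished.length
      ? completed.length / finished.length
      : 0,
    avgSteps: completed.length
      ? completed.reduce((s, r) => s + r.step_count, 0) / completed.length
      : 0,
    avgTokens: completed.length
      ? completed.reduce((s, r) => s + r.token_count, 0) / completed.length
      : 0,
    reviewRate: rows.length ? reviewed.length / rows.length : 0
  }
}
```

A database function stays preferable at scale, since it avoids shipping every row to the client; this version is mainly useful for dashboards and tests.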
Typical results:
- Completion rate: 94%
- Avg steps: 8.3
- Human review rate: 3.1%
Key Lessons
Production agents require:
- Durable state
- Hard execution limits
- Observability
- Cost controls
- Human approval gates
Most failures come from missing safeguards, not model quality.
About the Author
Tilak Raj
Founder & CEO — Brainfy AI
Building vertical AI SaaS across compliance, real estate, agriculture, and aviation.
Website: https://www.tilakraj.info
Projects: https://www.tilakraj.info/projects
Questions about production agents? Drop a comment — I reply to all of them.