BoxAgnts Introduction (2) — AI Agent Toolbox

Dev.to / 5/26/2026


Key Points

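  • The middle layer of BoxAgnts, the "Agent Toolbox," consists of six core modules that understand the user's intent, select and run the appropriate tools, and feed the results back.
  • The concrete architecture chains boxagnts-api (unified multi-model abstraction) → boxagnts-query (multi-turn agent query loop with automatic recovery) → boxagnts-tools (tool execution, including WASM tools).
  • boxagnts-gateway handles gateway and scheduling duties through a Cron scheduler and site hosting; boxagnts-workspace handles memory and configuration management through SQLite, JSON config, and conversation history.
  • To address differences between model providers, the LlmProvider trait exposes 20+ providers through a common interface and normalizes messages.
  • The article examines, from a design perspective, what happens internally when a user asks the Dashboard to analyze a Rust project: from intent understanding through tool dispatch to returning the execution results.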

BoxAgnts' middle layer — the Agent Toolbox — is the brain and hands of the system. It consists of six core modules responsible for three things: understanding your intent, dispatching the right tools, and feeding back execution results. This article takes a deep dive into the architectural design and key implementations of each module.

Architecture Overview: A Seven-Module Collaboration Network

What happens when you type "Help me analyze the code structure of this Rust project" in the Dashboard and hit send?

User Message
  │
  ▼
┌─────────────────────────────────────────────────────────────┐
│  boxagnts-api            Unified API Abstraction Layer      │
│  LlmProvider trait → 20+ Providers → Message Normalization  │
├─────────────────────────────────────────────────────────────┤
│  boxagnts-query          Agent Query Loop                   │
│  run_query_loop() → Multi-turn Conversation → Tool Dispatch → Auto Recovery │
├─────────────────────────────────────────────────────────────┤
│  boxagnts-tools + tools-manager + wasm-tools                │
│  Tool trait → Built-in Tools + WASM Tools → Execution       │
├─────────────────────────────────────────────────────────────┤
│  boxagnts-gateway        Gateway & Scheduling               │
│  Cron Scheduler + Site Hosting                              │
├─────────────────────────────────────────────────────────────┤
│  boxagnts-workspace      Memory & Configuration             │
│  SQLite + JSON Config + Conversation History                │
└─────────────────────────────────────────────────────────────┘

Let's break down each one.

boxagnts-api: Unified Multi-Model Abstraction Layer

This is the interface layer between the middle layer and the external AI world. It solves the most painful problem in AI tool development: every model provider's API is different, but your code should not pay the price for that.

LlmProvider Trait: The Foundation of Polymorphism

The core interface that all provider adapters must implement:

#[async_trait]
pub trait LlmProvider: Send + Sync {
    fn id(&self) -> &ProviderId;       // Unique identifier "anthropic", "openai"
    fn name(&self) -> &str;            // Human-readable name

    // Non-streaming request
    async fn create_message(&self, request: ProviderRequest)
        -> Result<ProviderResponse, ProviderError>;

    // Streaming request (returns Pin<Box<dyn Stream>>)
    async fn create_message_stream(&self, request: ProviderRequest)
        -> Result<Pin<Box<dyn Stream<Item = Result<StreamEvent, ProviderError>> + Send>>, ProviderError>;

    // List available models
    async fn list_models(&self) -> Result<Vec<ModelInfo>, ProviderError>;
}

This trait design has three elegant aspects:

  1. Async trait: Uses the async_trait macro, compatible with the Tokio async runtime
  2. Returns Pin<Box<dyn Stream>>: Uses dynamic dispatch to abstract away different providers' stream type differences
  3. Unified error typing: All provider errors are normalized to ProviderError
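To see why the boxed return type matters, here is a minimal synchronous analogue that uses only the standard library. It is a sketch under stated assumptions, not BoxAgnts code: Iterator stands in for Stream so that no async runtime is needed, and Provider, EchoProvider, ShoutProvider, and collect_text are hypothetical names.

```rust
// Synchronous stand-in for the streaming half of LlmProvider.
// Assumption: Iterator replaces Stream so the sketch needs no async runtime.
#[derive(Debug, PartialEq)]
pub enum ProviderError {
    EmptyPrompt,
}

pub trait Provider {
    fn id(&self) -> &str;
    // Each provider builds a different concrete iterator; boxing erases the type.
    fn stream(&self, prompt: &str) -> Result<Box<dyn Iterator<Item = String>>, ProviderError>;
}

pub struct EchoProvider; // hypothetical: yields the prompt word by word
pub struct ShoutProvider; // hypothetical: yields one upper-cased chunk

impl Provider for EchoProvider {
    fn id(&self) -> &str {
        "echo"
    }
    fn stream(&self, prompt: &str) -> Result<Box<dyn Iterator<Item = String>>, ProviderError> {
        if prompt.is_empty() {
            return Err(ProviderError::EmptyPrompt);
        }
        let words: Vec<String> = prompt.split_whitespace().map(String::from).collect();
        Ok(Box::new(words.into_iter()))
    }
}

impl Provider for ShoutProvider {
    fn id(&self) -> &str {
        "shout"
    }
    fn stream(&self, prompt: &str) -> Result<Box<dyn Iterator<Item = String>>, ProviderError> {
        if prompt.is_empty() {
            return Err(ProviderError::EmptyPrompt);
        }
        Ok(Box::new(std::iter::once(prompt.to_uppercase())))
    }
}

// Caller code is written once against the trait object.
pub fn collect_text(p: &dyn Provider, prompt: &str) -> Result<String, ProviderError> {
    Ok(p.stream(prompt)?.collect::<Vec<_>>().join(" "))
}
```

Both providers return different iterator types internally, yet collect_text never needs to know which one it received. The same reasoning applies to the boxed stream in the real trait.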

Unified Access for 20+ Providers

BoxAgnts supports an extremely wide range of model providers:

Category | Providers | Implementation File
--- | --- | ---
International Mainstream | OpenAI, Anthropic, Google, Azure, Bedrock | Individual files
Open-Source Compatible | Deepseek, Mistral, Groq, TogetherAI, Fireworks | openai_compat.rs
Enterprise Services | Copilot, CodeX, Cohere, Perplexity | Individual files
Domestic Platforms | MiniMax, Alibaba Cloud (Qwen), Zhipu, Moonshot, SiliconFlow | Individual files
Others | Venus, Nebius, Novita, OVHCloud | Individual files

Key design pattern — Provider + Transformer dual-layer architecture:

Raw User Message
    │
    ▼
┌────────────────┐
│  Transformer   │  ← Converts internal message format to provider-specific format
│  (per-provider)│
└───────┬────────┘
        ▼
┌────────────────┐
│   Provider     │  ← Handles authentication, HTTP requests, stream parsing
│  (per-provider)│
└───────┬────────┘
        ▼
    AI Response
        │
        ▼
┌────────────────┐
│  Transformer   │  ← Converts provider response back to internal unified format
└────────────────┘
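The round trip in the diagram can be sketched as a small trait with one implementation per provider. Everything here is illustrative: the Transformer trait shape, the ColonFormat type, and the "role:text" wire format are invented for the example and are not taken from the BoxAgnts source.

```rust
// Internal, provider-independent message.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: Role,
    pub text: String,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Role {
    User,
    Assistant,
}

// Hypothetical trait: one implementation per provider.
pub trait Transformer {
    fn to_wire(&self, msg: &Message) -> String;
    fn from_wire(&self, raw: &str) -> Option<Message>;
}

// Invented wire format "role:text"; this provider calls the assistant "model".
pub struct ColonFormat;

impl Transformer for ColonFormat {
    fn to_wire(&self, msg: &Message) -> String {
        let role = match msg.role {
            Role::User => "user",
            Role::Assistant => "model",
        };
        format!("{}:{}", role, msg.text)
    }

    fn from_wire(&self, raw: &str) -> Option<Message> {
        // Split at the first colon only, so the text itself may contain colons.
        let (role, text) = raw.split_once(':')?;
        let role = match role {
            "user" => Role::User,
            "model" => Role::Assistant,
            _ => return None,
        };
        Some(Message { role, text: text.to_string() })
    }
}
```

Because the rest of the system only ever sees Message, a provider-specific quirk such as a different role name stays inside its transformer.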

ProviderRegistry: Runtime Model Switching

QueryConfig contains a provider_registry field that allows dynamic provider selection at runtime. This means you can:

  • Configure different models for different tasks in Agent config (cheap model for summarization, strong model for reasoning)
  • Use fallback_model to automatically switch to a backup model when the primary model is overloaded
  • Manage API keys and endpoints for multiple models via ModelRegistry
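As a rough illustration of per-task model selection, the sketch below maps a task kind to a model name and falls back to a default. TaskModels and the model names are hypothetical; the real ProviderRegistry and ModelRegistry types are not shown in this article.

```rust
use std::collections::HashMap;

// Hypothetical registry: maps a task kind to a model name, with a default.
pub struct TaskModels {
    default_model: String,
    per_task: HashMap<String, String>,
}

impl TaskModels {
    pub fn new(default_model: &str) -> Self {
        Self {
            default_model: default_model.to_string(),
            per_task: HashMap::new(),
        }
    }

    // Builder-style assignment so a configuration reads as one expression.
    pub fn assign(mut self, task: &str, model: &str) -> Self {
        self.per_task.insert(task.to_string(), model.to_string());
        self
    }

    // Unknown tasks use the default model.
    pub fn model_for(&self, task: &str) -> &str {
        self.per_task
            .get(task)
            .map(String::as_str)
            .unwrap_or(self.default_model.as_str())
    }
}
```

With `TaskModels::new("strong-model").assign("summarize", "cheap-model")`, summarization requests resolve to the cheap model while every other task keeps the strong one.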

API Key Management: Balancing Security and Convenience

BoxAgnts predefines environment variable mappings for each provider:

pub fn api_key_env_vars_for_provider(provider_id: &str) -> &'static [&'static str] {
    match provider_id {
        "anthropic" => &["ANTHROPIC_API_KEY"],
        "openai"    => &["OPENAI_API_KEY"],
        "deepseek"  => &["DEEPSEEK_API_KEY"],
        "zhipu"     => &["ZHIPU_API_KEY"],
        "minimax"   => &["MINIMAX_API_KEY"],
        // ... 30+ providers
        _ => &[], // fallback arm (assumed) so the match is exhaustive for unknown providers
    }
}

Together with configuration files and the Dashboard UI, this gives you three ways to inject API keys, maximizing flexibility while maintaining security boundaries.
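A minimal sketch of how such a lookup could be resolved, assuming the ENV > Config > Default priority that the workspace section lists for API keys. The function name resolve_api_key and its signature are hypothetical; the environment lookup is passed in as a closure so the logic can be exercised without touching the real process environment.

```rust
// Resolve an API key with the priority ENV > config > default.
// `env_lookup` is injected for testability; in production it could be
// `|k| std::env::var(k).ok()`.
pub fn resolve_api_key<F>(
    env_vars: &[&str],
    env_lookup: F,
    config_value: Option<&str>,
    default_value: Option<&str>,
) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    env_vars
        .iter()
        // First non-empty environment variable wins.
        .find_map(|name| env_lookup(*name).filter(|v| !v.is_empty()))
        // Otherwise fall back to the config file value, then the default.
        .or_else(|| config_value.map(String::from))
        .or_else(|| default_value.map(String::from))
}
```

The `env_vars` slice is the kind of value `api_key_env_vars_for_provider` returns, so the two pieces would compose directly.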

boxagnts-query: The Core Engine of the Agent

This layer is the absolute soul of BoxAgnts. The run_query_loop() function implements the complete Agent reasoning loop in about 300 lines of code, yet handles a wide range of edge cases: cancellation, turn limits, token exhaustion, context overflow, model overload, and budget overruns.

Main Loop Skeleton

loop {
    turn += 1;

    // 0. Check cancellation signal
    if cancel_token.is_cancelled() { return Cancelled; }

    // 1. Check max turns limit
    if turn > effective_max_turns { return EndTurn; }

    // 2. Inject pending user messages (multimodal interaction)
    if let Some(queue) = pending_messages.as_deref_mut() {
        for text in queue.drain(..) { /* append as user message */ }
    }

    // 3. Auto context compaction
    compact_state.maybe_compact(messages, config);

    // 4. Build API request
    let request = build_request(messages, tools, config);

    // 5. Send to AI model (supports streaming)
    let response = client.create_message_stream(request).await;

    // 6. Parse ContentBlocks from response
    for block in response.content {
        match block {
            ContentBlock::Text { text } => { /* accumulate text response */ }
            ContentBlock::ToolUse { name, input, .. } => {
                // Match and execute tool
                let tool = find_tool(name);
                let result = tool.execute(input, tool_ctx).await;
                messages.push(tool_result);  // Inject result into conversation
            }
            ContentBlock::Thinking { thinking, .. } => {
                // Handle deep thinking content (not shown to user)
            }
        }
    }

    // 7. If model ends → return final message
    if stop_reason == "end_turn" { return EndTurn; }
}

Key Mechanism Analysis

Token Exhaustion Recovery

When the model hits its output token limit in a single response, the query loop does not simply return a truncated result. Instead, it automatically sends a carefully designed recovery message:

"Output token limit hit. Resume directly — no apology, no recap of what
 you were doing. Pick up mid-thought if that is where the cut happened.
 Break remaining work into smaller pieces."

This message is remarkably restrained in design: "no apology, no recap, pick up from the cut, break down tasks" conveys maximum instruction with minimum tokens. The loop retries at most 3 times (MAX_TOKENS_RECOVERY_LIMIT = 3) to avoid infinite loops.
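The bounded retry can be sketched as follows. Only the constant MAX_TOKENS_RECOVERY_LIMIT comes from the article; run_with_recovery, StopReason, and the call_model closure are hypothetical stand-ins, and the recovery prompt is abbreviated.

```rust
pub const MAX_TOKENS_RECOVERY_LIMIT: u32 = 3;

#[derive(Debug, PartialEq, Clone, Copy)]
pub enum StopReason {
    EndTurn,
    MaxTokens,
}

// `call_model` stands in for one model request. It receives an optional
// recovery prompt and returns a text chunk plus the stop reason.
// Returns the accumulated text and whether the model finished normally.
pub fn run_with_recovery<F>(mut call_model: F) -> (String, bool)
where
    F: FnMut(Option<&str>) -> (String, StopReason),
{
    // Abbreviated version of the recovery message quoted above.
    const RECOVERY_PROMPT: &str = "Output token limit hit. Resume directly.";
    let mut output = String::new();
    let mut recoveries = 0;
    let mut prompt: Option<&str> = None;
    loop {
        let (chunk, stop) = call_model(prompt);
        output.push_str(&chunk);
        match stop {
            StopReason::EndTurn => return (output, true),
            StopReason::MaxTokens if recoveries < MAX_TOKENS_RECOVERY_LIMIT => {
                recoveries += 1;
                prompt = Some(RECOVERY_PROMPT);
            }
            // Recovery budget spent: give up and return the truncated text.
            StopReason::MaxTokens => return (output, false),
        }
    }
}
```

With a limit of 3, a model that never finishes is called four times in total: the original request plus three recovery attempts.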

Auto Context Compaction

compact.rs implements an intelligent compression strategy. When conversation history approaches the model's context window limit, it summarizes early messages — preserving key information (file paths, error messages, important decisions) while discarding redundant intermediate steps. This strategy ensures that even extremely complex multi-turn tasks (such as refactoring an entire codebase) won't cause the Agent to "lose its memory" due to context overflow.
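A simplified sketch of the idea, not the contents of compact.rs: once a rough token estimate exceeds a limit, the oldest messages are replaced by one summary while the most recent ones are kept verbatim. The 4-characters-per-token estimate, the function names, and the summarize closure (which a real system would back with a model call) are all assumptions.

```rust
// Rough token estimate; assumption: about 4 characters per token.
fn estimate_tokens(text: &str) -> usize {
    text.chars().count() / 4 + 1
}

// Replace the oldest messages with a single summary once the history
// exceeds `limit` tokens, always keeping the last `keep_recent` messages.
// Returns true if the history was compacted.
pub fn maybe_compact<F>(
    messages: &mut Vec<String>,
    limit: usize,
    keep_recent: usize,
    summarize: F,
) -> bool
where
    F: Fn(&[String]) -> String,
{
    let total: usize = messages.iter().map(|m| estimate_tokens(m)).sum();
    if total <= limit || messages.len() <= keep_recent {
        return false;
    }
    let split = messages.len() - keep_recent;
    // `recent` takes the tail; `messages` now holds only the old prefix.
    let recent = messages.split_off(split);
    let summary = summarize(messages.as_slice());
    messages.clear();
    messages.push(summary);
    messages.extend(recent);
    true
}
```

Keeping the tail untouched matters because the most recent tool results are usually what the model needs for its next step.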

Fallback Model Mechanism

// query.rs — Auto switch to backup model on overload errors
if is_overloaded_error(&err) && fallback_model.is_some() && !used_fallback {
    effective_model = fallback_model;
    used_fallback = true;
    continue; // Retry with backup model
}

When the primary model (e.g., Claude Sonnet) returns an overload error during high-load periods, the system automatically switches to a backup model (e.g., Deepseek), ensuring tasks are not interrupted. This mechanism is completely transparent to the user.

Budget Control

pub enum QueryOutcome {
    BudgetExceeded { cost_usd: f64, limit_usd: f64 },
    // ...
}

After each turn, the query loop checks whether the accumulated cost exceeds the budget cap. CostTracker records the model and token consumption of every API call, which keeps costs controllable. A budget overrun returns a clear error rather than silently overspending.
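The per-turn check might look like the sketch below. The BudgetExceeded variant mirrors the enum above; the tracker's fields, the record and check methods, and the per-million-token pricing parameters are assumptions made for illustration.

```rust
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Continue,
    BudgetExceeded { cost_usd: f64, limit_usd: f64 },
}

// Hypothetical tracker; prices are passed in as USD per million tokens.
#[derive(Default)]
pub struct CostTracker {
    total_usd: f64,
}

impl CostTracker {
    // Add the cost of one API call to the running total.
    pub fn record(&mut self, input_tokens: u64, output_tokens: u64, in_price: f64, out_price: f64) {
        self.total_usd += input_tokens as f64 / 1e6 * in_price
            + output_tokens as f64 / 1e6 * out_price;
    }

    // Called after every turn; with no cap the loop always continues.
    pub fn check(&self, max_budget_usd: Option<f64>) -> Outcome {
        match max_budget_usd {
            Some(limit) if self.total_usd > limit => Outcome::BudgetExceeded {
                cost_usd: self.total_usd,
                limit_usd: limit,
            },
            _ => Outcome::Continue,
        }
    }
}
```

Returning both the spent amount and the limit in the outcome is what lets the caller show a precise message instead of a generic failure.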

Multimodal Content Blocks

The ContentBlock enum defines 14 content types, covering the full spectrum of interactions from plain text to deep thinking:

pub enum ContentBlock {
    Text { text: String },                          // Plain text
    Image { source: ImageSource },                  // Image
    ToolUse { id, name, input },                    // Tool call
    ToolResult { tool_use_id, content, is_error },  // Tool result
    Thinking { thinking, signature },               // Deep thinking
    Document { source, title, context },            // Document reference
    UserLocalCommandOutput { command, output },     // Shell command output
    UserCommand { name, args },                     // User command
    UserMemoryInput { key, value },                 // User memory
    SystemAPIError { message, retry_secs },         // API error
    CollapsedReadSearch { tool_name, paths },       // Collapsed search results
    TaskAssignment { id, subject, description },    // Sub-task assignment
    // ...
}

This fine-grained content typing lets the frontend render each type with specialized treatment: error blocks show red borders, task assignment blocks show cyan borders, and collapsed search results are displayed as single-line summaries.
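The benefit of typed blocks shows up as an exhaustive match on the rendering side. The sketch below uses a reduced, hypothetical Block enum with assumed field types (the article's enum omits them) and a render function that is not part of BoxAgnts; only the red and cyan border choices come from the text.

```rust
// Reduced stand-in for ContentBlock; field types are assumptions.
pub enum Block {
    Text { text: String },
    SystemApiError { message: String },
    TaskAssignment { subject: String },
    CollapsedReadSearch { tool_name: String, paths: Vec<String> },
}

// Returns (border color, one-line label) for a block.
pub fn render(block: &Block) -> (&'static str, String) {
    match block {
        Block::Text { text } => ("none", text.clone()),
        Block::SystemApiError { message } => ("red", format!("API error: {message}")),
        Block::TaskAssignment { subject } => ("cyan", format!("Task: {subject}")),
        // Collapsed results become a single summary line.
        Block::CollapsedReadSearch { tool_name, paths } => {
            ("none", format!("{tool_name}: {} paths", paths.len()))
        }
    }
}
```

Adding a new variant to the enum makes this match fail to compile until the renderer handles it, which is the practical payoff of the fine-grained typing.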

Managed Agent Mode (Manager-Executor)

This is one of the most stunning middle-layer designs in BoxAgnts. managed_orchestrator.rs implements a hierarchical Agent architecture:

                    User
                      │
                      ▼
         ┌───────────────────────┐
         │  Manager Agent        │  ← Uses strong model (e.g., Claude Opus)
         │  Analyze tasks → Break down → Assign │
         └───────┬───────────────┘
                 │
        ┌────────┼────────┐
        ▼        ▼        ▼
   ┌────────┐┌────────┐┌────────┐
   │Executor││Executor││Executor│  ← Uses economical model (e.g., Claude Sonnet/Deepseek)
   │Subtask1││Subtask2││Subtask3│
   └────┬───┘└────┬───┘└────┬───┘
        │         │         │
        └────────┼─────────┘
                 ▼
          Manager aggregates results
                 │
                 ▼
              Final Output

Key Configuration

pub struct ManagedAgentConfig {
    pub enabled: bool,
    pub manager_model: String,           // Manager model (e.g., "claude-opus-4-6")
    pub executor_model: String,          // Executor model (e.g., "claude-sonnet-4-6")
    pub executor_max_turns: u32,         // Max turns per executor
    pub max_concurrent_executors: u32,   // Max parallel executors
    pub total_budget_usd: Option<f64>,   // Total budget cap
    pub executor_isolation: bool,        // Whether to isolate Git worktrees
}
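To make max_concurrent_executors concrete, here is one possible way to bound parallelism, using scoped threads from the standard library. This is an assumption-laden sketch: the real orchestrator is async and may use a different mechanism, run_executors is a hypothetical name, and batching is a simplification (a slow task holds up its whole batch).

```rust
use std::thread;

// Run subtasks in batches of at most `max_concurrent`, preserving input order.
// `executor` stands in for one executor agent handling one subtask.
pub fn run_executors<F>(subtasks: &[String], max_concurrent: usize, executor: F) -> Vec<String>
where
    F: Fn(&str) -> String + Sync,
{
    let exec = &executor;
    let mut results = Vec::with_capacity(subtasks.len());
    // `.max(1)` guards against a zero batch size, which `chunks` rejects.
    for batch in subtasks.chunks(max_concurrent.max(1)) {
        let batch_results: Vec<String> = thread::scope(|s| {
            // Spawn every task in the batch first, then join them all.
            let handles: Vec<_> = batch
                .iter()
                .map(|task| s.spawn(move || exec(task.as_str())))
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("executor panicked"))
                .collect()
        });
        results.extend(batch_results);
    }
    results
}
```

Calling `run_executors(&tasks, 2, |t| format!("done:{t}"))` with five tasks runs them as batches of 2, 2, and 1 and returns the results in the original order, ready for the Manager to aggregate.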

System Prompt Injection

The Manager Agent's system prompt precisely defines its role:

You are the MANAGER, the planning and reasoning layer.
You coordinate work but do NOT execute tasks using file/bash tools directly.
All implementation work is delegated to executor agents (via the Agent tool).
Each executor uses {executor_model}, with a maximum of {max_turns} turns.
You may run up to {max_concurrent} executors in parallel.

The Executor's prompt requires "complete self-containment": executors cannot see the Manager's conversation history, so the Manager must include all necessary context in each executor's prompt. This avoids context leakage and reduces token consumption.
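Filling the placeholders in the Manager prompt could be as simple as the sketch below. The placeholder names come from the template above; render_manager_prompt and the reduced config struct are hypothetical, and the actual injection code is not shown in the article.

```rust
// Reduced version of ManagedAgentConfig with only the fields the prompt uses.
pub struct ManagedAgentConfig {
    pub executor_model: String,
    pub executor_max_turns: u32,
    pub max_concurrent_executors: u32,
}

// Substitute the three placeholders with values from the config.
pub fn render_manager_prompt(template: &str, cfg: &ManagedAgentConfig) -> String {
    template
        .replace("{executor_model}", &cfg.executor_model)
        .replace("{max_turns}", &cfg.executor_max_turns.to_string())
        .replace("{max_concurrent}", &cfg.max_concurrent_executors.to_string())
}
```

Telling the Manager its executors' model and turn limit up front lets it size each delegated subtask to what an executor can realistically finish.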

boxagnts-tools + tools-manager: Unified Tool Abstraction

Tool Trait: The Cornerstone of the Architecture

This is the most critical interface definition in all of BoxAgnts. Every new tool only needs to implement this trait:

#[async_trait]
pub trait Tool: Send + Sync {
    fn name(&self) -> &'static str;
    fn description(&self) -> &'static str;
    fn input_schema(&self) -> Value;    // JSON Schema defining parameters
    async fn execute(&self, input: Value, ctx: &ToolContext) -> ToolResult;
}

ToolContext: The Tool's Execution Environment

pub struct ToolContext {
    pub cost_tracker: Arc<CostTracker>,         // Cost tracker
    pub session_id: Option<String>,             // Session ID
    pub current_turn: Arc<AtomicUsize>,         // Current turn
    pub non_interactive: bool,                  // Non-interactive mode
    pub config: Config,                         // Global configuration
    pub managed_agent_config: Option<ManagedAgentConfig>,
    pub allowed_outbound_hosts: Vec<String>,    // Outbound network whitelist
    pub block_url: Option<String>,              // Blocked URLs
}

ToolContext is the tool's "passport" — carrying various contextual information such as permissions, sessions, costs, and networking. Every tool can access the required system state through it during execution.

Central Tool Registry

// tools-manager/src/lib.rs
pub fn all_tools() -> Vec<Box<dyn Tool>> {
    vec![
        // Rust native tools
        Box::new(AskUserQuestionTool),
        Box::new(BriefTool),
        Box::new(EnterPlanModeTool),
        Box::new(ExitPlanModeTool),
        Box::new(SleepTool),
        Box::new(SkillTool),
        Box::new(ToolSearchTool),

        // WASM sandbox tools — same interface, different implementation
        Box::new(WasmTool::new("read",  "file-read-component.wasm",  ...)),
        Box::new(WasmTool::new("write", "file-write-component.wasm", ...)),
        Box::new(WasmTool::new("edit",  "file-edit-component.wasm",  ...)),
        Box::new(WasmTool::new("glob",  "file-glob-component.wasm",  ...)),
        Box::new(WasmTool::new("bash",  "bash-component.wasm",       ...)),
        Box::new(WasmTool::new("web_fetch", "web-fetch-component.wasm", ...)),
        Box::new(WasmTool::new("js_exec", "boxedjs-execute-component.wasm", ...)),
    ]
}

Notice that Rust native tools and WASM tools are placed in the same Vec<Box<dyn Tool>> — to the AI model, they are completely equivalent. This is the power of interface-oriented programming.
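The equivalence can be demonstrated with a synchronous, string-based simplification of the Tool trait. The tool types here (UppercaseTool, ScriptedTool) and the dispatch function are hypothetical; ScriptedTool merely stands in for a sandboxed tool and does not run any WASM.

```rust
// Synchronous, string-based simplification of the Tool trait.
pub trait Tool {
    fn name(&self) -> &'static str;
    fn execute(&self, input: &str) -> Result<String, String>;
}

pub struct UppercaseTool; // hypothetical "native" tool

pub struct ScriptedTool {
    // hypothetical stand-in for a sandboxed tool
    name: &'static str,
    reply: &'static str,
}

impl Tool for UppercaseTool {
    fn name(&self) -> &'static str {
        "uppercase"
    }
    fn execute(&self, input: &str) -> Result<String, String> {
        Ok(input.to_uppercase())
    }
}

impl Tool for ScriptedTool {
    fn name(&self) -> &'static str {
        self.name
    }
    fn execute(&self, _input: &str) -> Result<String, String> {
        Ok(self.reply.to_string())
    }
}

pub fn all_tools() -> Vec<Box<dyn Tool>> {
    vec![
        Box::new(UppercaseTool),
        Box::new(ScriptedTool { name: "read", reply: "file contents" }),
    ]
}

// The dispatcher only sees `dyn Tool`; it cannot tell the two kinds apart.
pub fn dispatch(tools: &[Box<dyn Tool>], name: &str, input: &str) -> Result<String, String> {
    tools
        .iter()
        .find(|t| t.name() == name)
        .ok_or_else(|| format!("unknown tool: {name}"))?
        .execute(input)
}
```

The dispatcher has no branch for "native" versus "sandboxed", so swapping an implementation behind a name requires no change to the query loop.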

boxagnts-gateway: Extending Time and Space Dimensions

Cron Scheduled Task Engine

cron/scheduler.rs builds a complete scheduled task system based on tokio_cron_scheduler:

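// Core scheduling logic (simplified)
let cron_job = Job::new_async(cron_expr, move |_uuid, _lock| {
    // The closure runs on every tick, so each run needs its own copies
    let (prompt, model) = (prompt.clone(), model.clone());
    Box::pin(async move {
        // Execution with timeout + result logging
        let fut = job::execute(prompt, model);
        let result = timeout(Duration::from_secs(timeout_secs), fut).await;
        // `success` and `message` are derived from `result` (elided here)
        append_execution_log(job_id, job_name, success, message).await;
    })
});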

Key features:

  • Timeout protection: Each task has an independent timeout setting (default 180 seconds), wrapped by tokio::time::timeout
  • Cancel propagation: On timeout, cancels the executing Agent query via CancellationToken
  • Execution logs: Each execution records time, success/failure status, and result summary
  • Dynamic management: Tasks can be added, removed, enabled/disabled at any time
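The timeout protection can be approximated without an async runtime. The sketch below is a standard-library analogue of wrapping a job in tokio::time::timeout, under a clear limitation: it cannot cancel the worker the way a CancellationToken can, so a timed-out job simply keeps running in the background. The name run_with_timeout is hypothetical.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

// Run `job` on a worker thread and wait at most `limit` for its result.
pub fn run_with_timeout<F>(job: F, limit: Duration) -> Result<String, String>
where
    F: FnOnce() -> String + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // After a timeout the receiver is gone; ignore the send error.
        let _ = tx.send(job());
    });
    rx.recv_timeout(limit).map_err(|e| match e {
        RecvTimeoutError::Timeout => format!("timed out after {:?}", limit),
        // The worker dropped the sender without sending, e.g. it panicked.
        RecvTimeoutError::Disconnected => "job ended without a result".to_string(),
    })
}
```

In this sketch, the 180-second default mentioned above would correspond to passing `Duration::from_secs(180)` as `limit`; the Ok or Err result is what an execution log entry would record.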

Site Hosting System

Site data managed by site/store.rs is persisted via SQLite, supporting CRUD operations. Combined with the frontend SitesPage, users can:

  1. Create sites in the Dashboard (enter name and path)
  2. Let the AI Agent generate web content
  3. Access via the /sites/{name}/ path

boxagnts-workspace: The Agent's Memory System

The workspace module handles all persistence and configuration management:

Function | Storage | Key Implementation
--- | --- | ---
Conversation History | SQLite (rusqlite) | Organized by session, supports CRUD
User Authentication | Password hash storage | Verified for remote access
Global Configuration | JSON file | Loaded via Settings::load()
API Keys | Environment variables / JSON | Three-tier priority: ENV > Config > Default
AGENTS.md | Filesystem | Injected into system prompt each conversation
Cron Tasks | SQLite | Persisted storage + loaded at startup
Site Config | SQLite | Persisted storage + loaded at startup

Design highlight: configuration and state are separated. Configuration is JSON files (human-readable and editable), state is SQLite (efficient queries and transactions). This distinction avoids the common pitfall of "configuration file bloat."

QueryConfig: Full-Dimensional Query Control

QueryConfig is a massive configuration struct with 20 fields, covering every dimension of an Agent query:

pub struct QueryConfig {
    pub model: String,                           // Model name
    pub max_tokens: u32,                         // Max output tokens
    pub max_turns: u32,                          // Max reasoning turns
    pub system_prompt: Option<String>,           // System prompt
    pub thinking_budget: Option<u32>,            // Thinking budget (deep reasoning)
    pub temperature: Option<f32>,                // Temperature parameter
    pub tool_result_budget: usize,               // Total char cap for tool results (50000)
    pub effort_level: Option<EffortLevel>,       // Effort level (affects thinking_budget)
    pub max_budget_usd: Option<f64>,             // USD budget cap
    pub fallback_model: Option<String>,          // Backup model
    pub agent_definition: Option<AgentDefinition>, // Agent definition
    pub managed_agents: Option<ManagedAgentConfig>, // Managed mode
    pub output_style: OutputStyle,               // Output style
    // ... and more
}

This struct demonstrates a core design philosophy of BoxAgnts: give control to the user, but provide reasonable defaults. Every field can be overridden, but none are required — defaults cover 90% of use cases.
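In Rust this philosophy usually maps onto a Default implementation plus struct update syntax. The sketch below uses a reduced QueryConfig; apart from tool_result_budget (50000, from the field comment above), every default value is an assumption chosen for illustration, and budgeted is a hypothetical helper.

```rust
// Reduced version of QueryConfig with a subset of the fields.
#[derive(Debug, Clone, PartialEq)]
pub struct QueryConfig {
    pub model: String,
    pub max_tokens: u32,
    pub max_turns: u32,
    pub temperature: Option<f32>,
    pub tool_result_budget: usize,
    pub max_budget_usd: Option<f64>,
    pub fallback_model: Option<String>,
}

impl Default for QueryConfig {
    fn default() -> Self {
        Self {
            model: "default-model".to_string(), // placeholder, not a real default
            max_tokens: 4096,                   // assumed
            max_turns: 10,                      // assumed
            temperature: None,
            tool_result_budget: 50_000,         // from the article's field comment
            max_budget_usd: None,
            fallback_model: None,
        }
    }
}

// Override only what differs; every other field keeps its default.
pub fn budgeted(limit_usd: f64) -> QueryConfig {
    QueryConfig {
        max_budget_usd: Some(limit_usd),
        ..QueryConfig::default()
    }
}
```

A caller who only cares about cost writes `budgeted(1.0)` and never has to mention the remaining fields.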

Summary

The middle-layer Agent Toolbox is the capability core of BoxAgnts:

Module | Responsibility | Key Highlight
--- | --- | ---
boxagnts-api | Multi-model unified access | LlmProvider trait, 20+ Providers, Transformer conversion
boxagnts-query | Agent reasoning loop | Token recovery, context compaction, Fallback switching, budget control
managed_orchestrator | Managed Agent architecture | Manager-Executor layering, parallel execution, budget management
boxagnts-tools | Unified tool abstraction | Tool trait, ToolContext
tools-manager | Central tool registry | Rust native + WASM unified as Vec<Box<dyn Tool>>
boxagnts-gateway | Time and space extension | Cron scheduler, Site hosting
boxagnts-workspace | Memory system | SQLite + JSON dual-layer storage

Related Resources