AgentDoG 1.5: Small Inline Guard Models for Agent Actions

Dev.to / 6/1/2026

📰 NewsDeveloper Stack & InfrastructureSignals & Early TrendsModels & Research

Key Points

  • AgentDoG 1.5 is an arXiv preprint introducing small inline guard models (0.8B–8B parameters) that screen an agent’s actions—tool calls, shell commands, and code-execution requests—before they run.
  • The guard model is designed to prevent the “lethal trifecta” by catching risky interactions when an agent has access to private data, receives untrusted input, and can take actions.
  • Compared with prior approaches that rely on large closed safety models or heavyweight per-action sandboxed checkers, AgentDoG reports similar catch rates while using only about ~1,000 purified training samples.
  • The authors claim roughly 100× less deployment overhead because the guard model is lightweight and runs affordably on every action.
  • The paper emphasizes training-data selection via influence-function purification to remove uninformative cases and produce an efficient “rookie guard” that matches a “veteran chief” safety model’s effectiveness.

Continue reading this article on the original site.

Read original →