AgentDoG 1.5: Small Inline Guard Models for Agent Actions

Dev.to / 6/1/2026

Key Points

AgentDoG 1.5 is an arXiv preprint introducing small inline guard models (0.8B–8B parameters) that screen an agent’s actions—tool calls, shell commands, and code-execution requests—before they run.
The guard model is designed to prevent the “lethal trifecta” by catching risky interactions when an agent has access to private data, receives untrusted input, and can take actions.
Compared with prior approaches that rely on large closed safety models or heavyweight per-action sandboxed checkers, AgentDoG reports similar catch rates while using only about 1,000 purified training samples.
The authors claim roughly 100× less deployment overhead because the guard model is lightweight and runs affordably on every action.
The paper emphasizes training-data selection via influence-function purification to remove uninformative cases and produce an efficient “rookie guard” that matches a “veteran chief” safety model’s effectiveness.
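
As a rough illustration of the inline-guard pattern described in the key points, the sketch below screens each action before it is executed. It is not the paper's implementation: `Action`, `Verdict`, `keyword_guard`, and `run_guarded` are hypothetical names, and the keyword check is only a stand-in for inference with a small guard model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    kind: str      # e.g. "tool_call", "shell", or "code_exec"
    payload: str   # the command, code, or serialized tool arguments


@dataclass
class Verdict:
    allow: bool
    reason: str


def keyword_guard(action: Action) -> Verdict:
    """Hypothetical stand-in for a guard model.

    A real deployment would run model inference here; this placeholder
    only looks for a few illustrative risky substrings.
    """
    risky_markers = ("rm -rf", "~/.ssh", "curl ")
    for marker in risky_markers:
        if marker in action.payload:
            return Verdict(False, f"matched risky marker {marker!r}")
    return Verdict(True, "no risky marker found")


def run_guarded(
    action: Action,
    guard: Callable[[Action], Verdict],
    executor: Callable[[Action], str],
) -> str:
    """Ask the guard first; call the executor only if the action is allowed."""
    verdict = guard(action)
    if not verdict.allow:
        return f"BLOCKED: {verdict.reason}"
    return executor(action)


if __name__ == "__main__":
    fake_executor = lambda a: f"ran {a.payload}"  # assumption: no real shell is invoked
    print(run_guarded(Action("shell", "ls -la"), keyword_guard, fake_executor))
    print(run_guarded(Action("shell", "rm -rf /tmp/x"), keyword_guard, fake_executor))
```

Because the guard sits in front of every call to the executor, its per-action cost is paid on each step, which is why the summary stresses a lightweight model.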
