ARGUS: Policy-Adaptive Ad Governance via Evolving Reinforcement with Adversarial Umpiring

arXiv cs.CL / 5/5/2026


Key Points

  • The paper introduces ARGUS, a policy-adaptive advertising governance system designed for non-stationary regulatory environments where new mandates cause outdated labels and ambiguous reasoning in historical data.
  • ARGUS uses a three-stage pipeline—Policy Seeding, Adversarial Label Rectification (via a Prosecutor-Defender-Umpire architecture), and Latent Knowledge Discovery (tripartite dialectical discussion) to find both clear and “gray-area” violations.
  • To handle sparse new policy data, the system leverages RAG-enhanced policy knowledge and Chain-of-Thought-based reward signals to guide evolving reinforcement learning toward regulations that change over time.
  • Experiments on industrial and public datasets show ARGUS outperforms traditional fine-tuning baselines, achieving stronger policy-adaptive performance with minimal labeled “gold” data.
  • Overall, ARGUS frames ad governance as an evolving multi-agent, adversarially adjudicated reasoning problem rather than a static classifier trained once on fixed labels.
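The adversarial rectification stage described above can be sketched as a small adjudication loop. This is a minimal illustration, not the paper's implementation: the `prosecutor`, `defender`, and `umpire` functions are hypothetical stand-ins (simple keyword rules here) for what would be LLM agents prompted with the new policy text.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    violates: bool
    rationale: str

def prosecutor(ad_text: str, policy: str) -> str:
    # Placeholder: a real system would prompt an LLM to argue that the ad
    # violates the new policy. Here we just flag policy-keyword matches.
    hits = [w for w in policy.split() if w.lower() in ad_text.lower()]
    return f"Violation argued: mentions {hits}" if hits else "No argument"

def defender(ad_text: str, policy: str) -> str:
    # Placeholder: a real defender agent would argue the ad is compliant.
    return "Compliance argued: no explicit prohibited claim"

def umpire(pros_case: str, def_case: str) -> Verdict:
    # Placeholder adjudication: side with the prosecutor only when it
    # produced a concrete argument; otherwise accept the defense.
    if pros_case.startswith("Violation argued"):
        return Verdict(True, pros_case)
    return Verdict(False, def_case)

def rectify_label(ad_text: str, stale_label: bool, policy: str) -> bool:
    """Override a stale historical label with the umpire's verdict
    under the new policy mandate."""
    verdict = umpire(prosecutor(ad_text, policy), defender(ad_text, policy))
    return verdict.violates
```

Under this sketch, a historical "compliant" label on an ad that now trips a new mandate is flipped by the umpire, which is the label-rectification behavior the paper attributes to its Prosecutor-Defender-Umpire stage.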

Abstract

Online advertising governance faces significant challenges due to the non-stationary nature of regulatory policies, where emerging mandates (e.g., restrictions on education or aesthetic anxiety) create severe label inconsistencies and reasoning ambiguities in historical datasets. In this paper, we propose ARGUS, a policy-adaptive governance system that enables evolving reinforcement through multi-agent adversarial umpiring. ARGUS addresses the sparsity of new policy data by employing a three-stage framework: (1) Policy Seeding for initial perception; (2) Adversarial Label Rectification, which utilizes a "Prosecutor-Defender-Umpire" architecture to resolve conflicts between stale labels and new mandates; and (3) Latent Knowledge Discovery, which employs a tripartite dialectical discussion to unearth sophisticated, "gray-area" violations. By leveraging RAG-enhanced policy knowledge and Chain-of-Thought synthesis as dynamic rewards for reinforcement learning, ARGUS synchronizes its reasoning pathways with evolving regulations. Extensive experiments on both industrial and public datasets demonstrate that ARGUS significantly outperforms traditional fine-tuning baselines, achieving superior policy-adaptive learning with minimal gold data.
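The reward design mentioned in the abstract, retrieved policy clauses combined with a Chain-of-Thought-grounded signal, can be illustrated with a toy sketch. Everything here is an assumption for illustration: `retrieve_clauses` stands in for a real RAG retriever (token overlap instead of embeddings), and `cot_reward` is a hypothetical shaping function, not the paper's actual reward.

```python
def retrieve_clauses(ad_text: str, policy_index: list[str], k: int = 2) -> list[str]:
    # Placeholder RAG step: rank policy clauses by naive token overlap
    # with the ad; a real system would use an embedding retriever.
    ad_tokens = set(ad_text.lower().split())
    def overlap(clause: str) -> int:
        return len(set(clause.lower().split()) & ad_tokens)
    return sorted(policy_index, key=overlap, reverse=True)[:k]

def cot_reward(rationale: str, predicted: bool, rectified_label: bool,
               clauses: list[str]) -> float:
    # Hypothetical dynamic reward: a correctness term against the
    # rectified label, plus a grounding bonus when the chain-of-thought
    # actually cites retrieved policy language.
    correct = 1.0 if predicted == rectified_label else -1.0
    cited = any(tok in rationale.lower()
                for clause in clauses for tok in clause.lower().split())
    return correct + (0.5 if cited else 0.0)
```

The point of the sketch is the coupling: because the retrieved clauses change whenever the policy index is updated, the reward signal shifts with the regulations, which is how an evolving-RL loop could stay synchronized with new mandates.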