ARGUS: Agentic GPU Optimization Guided by Data-Flow Invariants
arXiv cs.AI / 4/22/2026
Key Points
- The Argus framework improves LLM-based coding agents for GPU kernel generation by using compile-time “data-flow invariants” instead of relying on sparse pass/fail feedback.
- It provides a tile-based, Pythonic DSL with symbolic tag propagation and tag assertions that enforce relational constraints across kernel execution, producing actionable counterexamples when violations occur.
- Invariants are verified at compile time using abstract interpretation over a layout algebra and SMT solving, achieving zero runtime overhead while catching global constraint issues.
- An in-context reinforcement learning planner selects optimization strategies and synthesizes invariants, leveraging a curated GPU optimization knowledge base.
- On AMD MI300X, Argus-generated kernels reach 99–104% of hand-optimized assembly throughput for GEMM, flash attention, and MoE; they outperform existing agentic kernel-generation systems by 2× to 1543× and also perform strongly across KernelBench tasks.
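To make the second and third points above concrete, here is a minimal, purely illustrative sketch of what symbolic tag propagation with tag assertions could look like: tiles carry symbolic tags, operations propagate them, and an assertion over a relational constraint reports a concrete counterexample when it is violated. All names here (`Tile`, `matmul_tiles`, `assert_same_tag`) are assumptions for illustration, not the actual Argus DSL API, and the check runs eagerly in Python rather than via abstract interpretation and SMT as in the paper.

```python
# Illustrative sketch (not the Argus DSL): tiles carry symbolic tags,
# ops propagate them, and tag assertions enforce relational constraints
# across tiles, raising an actionable counterexample on violation.
from dataclasses import dataclass, field

@dataclass
class Tile:
    name: str
    tags: dict = field(default_factory=dict)

class TagViolation(Exception):
    """Raised with a human-readable counterexample when an assertion fails."""

def matmul_tiles(a: Tile, b: Tile) -> Tile:
    # Tag propagation: the output tile inherits A's row block and B's
    # column block; agreement on the reduction (k) block is checked
    # separately via an assertion.
    out = Tile(f"({a.name}@{b.name})")
    out.tags["row_block"] = a.tags["row_block"]
    out.tags["col_block"] = b.tags["col_block"]
    return out

def assert_same_tag(tag: str, *tiles: Tile) -> None:
    # Relational constraint: all tiles must agree on `tag`.
    values = {t.name: t.tags.get(tag) for t in tiles}
    if len(set(values.values())) > 1:
        raise TagViolation(f"tag '{tag}' disagrees across tiles: {values}")

# A pair of tiles that agree on the reduction block passes the check.
a = Tile("A00", {"row_block": 0, "k_block": 3})
b = Tile("B01", {"col_block": 1, "k_block": 3})
assert_same_tag("k_block", a, b)
c = matmul_tiles(a, b)

# A mismatched pair yields a counterexample naming tiles and tag values.
bad = Tile("B07", {"col_block": 7, "k_block": 5})
try:
    assert_same_tag("k_block", a, bad)
    counterexample = ""
except TagViolation as e:
    counterexample = str(e)
```

The point of the counterexample string is the "actionable" feedback the summary describes: instead of a bare pass/fail, the agent sees which tiles and which tag values conflict.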