Agent Factories for High Level Synthesis: How Far Can General-Purpose Coding Agents Go in Hardware Optimization?

arXiv cs.AI / 3/27/2026


Key Points

  • The paper presents an empirical study showing that general-purpose coding agents, without hardware-specific training, can optimize hardware designs from high-level algorithm specifications using HLS toolchains.
  • It introduces an “agent factory” two-stage pipeline: decomposing designs into sub-kernels and using an ILP to assemble promising global configurations, then launching multiple expert agents to explore cross-function optimizations like pragma recombination, loop fusion, and memory restructuring.
  • Experiments on 12 HLS kernels (from HLS-Eval and Rodinia-HLS) using Claude Code (Opus 4.5/4.6) with AMD Vitis HLS show strong scaling: increasing from 1 to 10 agents yields an average 8.27× speedup.
  • Harder benchmarks see especially large gains, with streamcluster exceeding 20× and kmeans reaching around 10×, and the best results sometimes emerge from non–top-ranked ILP candidates.
  • The authors conclude that scaling agent populations is a practical and effective lever for HLS optimization and that agents can rediscover known hardware optimization patterns without domain-specific training.

Abstract

We present an empirical study of how far general-purpose coding agents, without hardware-specific training, can optimize hardware designs from high-level algorithmic specifications. We introduce an agent factory, a two-stage pipeline that constructs and coordinates multiple autonomous optimization agents. In Stage 1, the pipeline decomposes a design into sub-kernels, independently optimizes each using pragma and code-level transformations, and formulates an Integer Linear Program (ILP) to assemble globally promising configurations under an area constraint. In Stage 2, it launches N expert agents over the top ILP solutions, each exploring cross-function optimizations such as pragma recombination, loop fusion, and memory restructuring that are not captured by sub-kernel decomposition. We evaluate the approach on 12 kernels from HLS-Eval and Rodinia-HLS using Claude Code (Opus 4.5/4.6) with AMD Vitis HLS. Scaling from 1 to 10 agents yields a mean 8.27× speedup over baseline, with larger gains on harder benchmarks: streamcluster exceeds 20× and kmeans reaches approximately 10×. Across benchmarks, agents consistently rediscover known hardware optimization patterns without domain-specific training, and the best designs often do not originate from top-ranked ILP candidates, indicating that global optimization exposes improvements missed by sub-kernel search. These results establish agent scaling as a practical and effective axis for HLS optimization.
