How Can Reinforcement Learning Achieve Expert-level Placement?

arXiv cs.AI / 4/29/2026


Key Points

  • The paper argues that RL-based chip placement often underperforms experts because its training rewards typically target wirelength optimization rather than the full set of implicit design objectives.
  • It proposes a reward-modeling approach that learns expert-quality guidance by starting from final expert placement layouts and inferring step-by-step expert trajectories (see the sketch after this list).
  • The inferred trajectories are then used as demonstrations or preference signals to train a model that captures the latent rewards underlying expert results.
  • Experiments indicate the framework can learn efficiently from very limited data (even a single design) and generalize to new, unseen placement cases.
  • Overall, the work reframes reward design as the key bottleneck and provides a practical alternative to explicitly hand-coding complex placement processes.
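To make the trajectory-inference step concrete, here is a minimal Python sketch under stated assumptions: the `Macro` and `infer_trajectory` names are hypothetical, and the place-larger-macros-first ordering is a common placement convention used here for illustration, not the paper's actual procedure (which this summary does not specify).

```python
# Minimal sketch of inferring a step-by-step placement trajectory from a
# final expert layout. The ordering heuristic below (place larger macros
# first) is an assumption for illustration only.

from dataclasses import dataclass


@dataclass(frozen=True)
class Macro:
    name: str
    width: float
    height: float


def infer_trajectory(final_layout: dict[Macro, tuple[float, float]]):
    """Turn a final layout into a sequence of (state, action) pairs.

    state:  the partial placement built so far (macro -> position)
    action: the (macro, position) placed at that step
    """
    # Assumed ordering: descending area. Any fixed ordering yields a
    # valid replayed trajectory ending in the expert layout.
    order = sorted(final_layout, key=lambda m: m.width * m.height, reverse=True)

    trajectory = []
    partial: dict[Macro, tuple[float, float]] = {}
    for macro in order:
        action = (macro, final_layout[macro])
        trajectory.append((dict(partial), action))  # snapshot state before acting
        partial[macro] = final_layout[macro]
    return trajectory


# Usage: replay a toy two-macro layout.
layout = {
    Macro("cpu", 4.0, 4.0): (0.0, 0.0),
    Macro("sram", 2.0, 2.0): (5.0, 1.0),
}
for state, (macro, pos) in infer_trajectory(layout):
    print(f"placed so far: {len(state)} -> place {macro.name} at {pos}")
```

Each (state, action) pair can then serve as a demonstration, or be paired against a non-expert alternative as a preference, for the reward-model training described in the abstract below.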

Abstract

Chip placement is a critical step in physical design. While reinforcement learning (RL)-based methods have recently emerged, their training primarily targets wirelength optimization and therefore often fails to reach expert-quality layouts. We identify reward design as the primary cause of the performance gap with experts; rather than formalizing these intricate design processes explicitly, we circumvent the issue by learning a reward model directly from expert layouts. Our approach starts from the final expert layouts and infers step-by-step expert trajectories. Using these trajectories as demonstrations or preferences, we train a model that captures the implicit rewards latent in expert results. Experiments show that our framework can learn efficiently from even a single design and generalizes well to unseen cases.
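As a complement, the following is a hedged sketch of how inferred trajectories could feed a preference-based reward model using a standard Bradley-Terry objective. The `RewardNet` architecture, the feature encoding, and the dimensions are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of preference-based reward-model training with a
# Bradley-Terry objective. Architecture and features are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardNet(nn.Module):
    """Scores a placement state encoded as a fixed-size feature vector."""

    def __init__(self, feat_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.net(feats).squeeze(-1)


def preference_loss(model, preferred, rejected):
    """Bradley-Terry loss: the expert-derived trajectory should score higher.

    preferred/rejected: (steps, feat_dim) tensors of per-step features;
    a trajectory's return is the sum of its per-step rewards.
    """
    r_pref = model(preferred).sum()
    r_rej = model(rejected).sum()
    # softplus(-(z)) == -log(sigmoid(z)): pushes the learned reward to
    # rank the expert-inferred trajectory above the alternative.
    return F.softplus(-(r_pref - r_rej))


# Usage on random stand-in features (real features would encode the
# partial placement: density, congestion, wirelength estimates, etc.).
model = RewardNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
expert_traj, other_traj = torch.randn(10, 16), torch.randn(10, 16)
loss = preference_loss(model, expert_traj, other_traj)
opt.zero_grad()
loss.backward()
opt.step()
print(f"preference loss: {loss.item():.4f}")
```

The design choice to sum per-step rewards before comparing trajectories is one common convention in preference-based reward learning; the paper may aggregate differently.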