Learning to Solve the Quadratic Assignment Problem with Warm-Started MCMC Finetuning

arXiv cs.LG · April 23, 2026

📰 News · Models & Research

Key Points

  • The paper addresses the quadratic assignment problem (QAP), an NP-hard task where existing heuristics and learning-based methods struggle to stay consistently competitive across diverse instances.
  • It introduces PLMA, a permutation-learning framework that improves deployment-time results via warm-started MCMC finetuning, using short Markov chains to keep the adaptation anchored near previously promising regions.
  • PLMA uses an additive energy-based model (EBM) designed to enable an O(1)-time 2-swap Metropolis–Hastings sampling step, making MCMC exploration more efficient over permutation space.
  • The EBM’s neural parameterization relies on a scalable cross-graph attention mechanism to capture interactions between facilities and locations in QAP instances.
  • Experiments show PLMA outperforming state-of-the-art baselines: a near-zero average optimality gap on QAPLIB, markedly stronger robustness on the notoriously difficult Taixxeyy instances, and effectiveness for bandwidth minimization.
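
The O(1) 2-swap claim in the third bullet can be illustrated with a generic sketch: if the energy is additive over (facility, location) assignments, E(π) = Σᵢ θ[i][π(i)], then swapping two positions only touches two terms, so the Metropolis–Hastings acceptance test needs constant work per proposal. The function and matrix names below (`mh_2swap`, `theta`) are illustrative assumptions, not the paper's actual parameterization.

```python
import math
import random

def mh_2swap(theta, n_steps=1000, perm=None, temp=1.0, seed=0):
    """2-swap Metropolis-Hastings over permutations for an additive
    energy E(perm) = sum_i theta[i][perm[i]] (lower is better).
    Additivity makes each proposal's energy delta O(1)."""
    rng = random.Random(seed)
    n = len(theta)
    if perm is None:
        perm = list(range(n))
    energy = sum(theta[i][perm[i]] for i in range(n))
    for _ in range(n_steps):
        a, b = rng.randrange(n), rng.randrange(n)
        if a == b:
            continue
        # O(1) energy change: only the terms for rows a and b are affected.
        delta = (theta[a][perm[b]] + theta[b][perm[a]]
                 - theta[a][perm[a]] - theta[b][perm[b]])
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            perm[a], perm[b] = perm[b], perm[a]
            energy += delta
    return perm, energy
```

Note that for the full QAP objective (quadratic in the assignment), a 2-swap delta normally costs O(n); the O(1) step is precisely what an additive surrogate energy buys.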

Abstract

The quadratic assignment problem (QAP) is a fundamental NP-hard task that poses significant challenges for both traditional heuristics and modern learning-based solvers. Existing QAP solvers still struggle to achieve consistently competitive performance across structurally diverse real-world instances. To bridge this performance gap, we propose PLMA, an innovative permutation learning framework. PLMA features an efficient warm-started MCMC finetuning procedure to enhance deployment-time performance, leveraging short Markov chains to anchor the adaptation to the promising regions previously explored. For rapid exploration via MCMC over the permutation space, we design an additive energy-based model (EBM) that enables an O(1)-time 2-swap Metropolis-Hastings sampling step. Moreover, the neural network used to parameterize the EBM incorporates a scalable and flexible cross-graph attention mechanism to model interactions between facilities and locations in the QAP. Extensive experiments demonstrate that PLMA consistently outperforms state-of-the-art baselines across various benchmarks. In particular, PLMA achieves a near-zero average optimality gap on QAPLIB, exhibits remarkably superior robustness on the notoriously difficult Taixxeyy instances, and also serves as an effective QAP solver in bandwidth minimization.
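
The warm-started, short-chain idea in the abstract can be sketched generically: restart each short Metropolis chain from the best permutation found so far, so exploration stays anchored near previously promising regions instead of starting cold. This is a minimal sketch under that assumption; the names (`warm_started_search`, `qap_cost`) are hypothetical, it uses the plain quadratic objective with a naive O(n²) delta for clarity, and it omits the paper's learned EBM entirely.

```python
import math
import random

def qap_cost(F, D, perm):
    """Standard QAP objective: sum_{i,j} F[i][j] * D[perm[i]][perm[j]]."""
    n = len(perm)
    return sum(F[i][j] * D[perm[i]][perm[j]]
               for i in range(n) for j in range(n))

def warm_started_search(F, D, n_rounds=20, chain_len=200, temp=1.0, seed=0):
    """Run several short 2-swap Metropolis chains, each warm-started
    from the incumbent best permutation rather than a random one."""
    rng = random.Random(seed)
    n = len(F)
    best = list(range(n))
    best_cost = qap_cost(F, D, best)
    for _ in range(n_rounds):
        perm, cost = best[:], best_cost  # warm start from the incumbent
        for _ in range(chain_len):
            a, b = rng.sample(range(n), 2)
            cand = perm[:]
            cand[a], cand[b] = cand[b], cand[a]
            delta = qap_cost(F, D, cand) - cost
            # Metropolis acceptance: always take improvements,
            # sometimes accept uphill moves to escape local minima.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                perm, cost = cand, cost + delta
                if cost < best_cost:
                    best, best_cost = perm[:], cost
    return best, best_cost
```

Because every chain starts at the incumbent, short chains suffice: they refine a known-good region rather than re-exploring the whole permutation space.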