Dual-Graph Multi-Agent Reinforcement Learning for Handover Optimization

arXiv cs.AI / 3/27/2026


Key Points

  • The paper tackles cellular handover (HO) optimization by focusing on tuning Cell Individual Offsets (CIOs), which are traditionally set via heuristics but become tightly coupled at network scale.
  • It models HO optimization as a decentralized partially observable Markov decision process (Dec-POMDP) on the network’s dual graph, where each agent controls a CIO for a neighbor cell pair and uses locally aggregated KPI observations.
  • The authors introduce TD3-D-MA, a discrete multi-agent reinforcement learning approach that uses a shared-parameter GNN actor on the dual graph and region-wise double critics to improve credit assignment in dense deployments.
  • Experiments in an ns-3 system-level simulator with operator-like parameters across varied traffic regimes and network topologies show throughput gains over standard HO heuristics and centralized RL baselines.
  • The method demonstrates robustness and generalization under topology and traffic shifts, suggesting practical resilience compared to static rule-based tuning.
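The dual-graph idea in the second bullet is the line-graph construction: each neighbor-cell pair (the link whose CIO an agent controls) becomes a node, and two such nodes are adjacent when their pairs share a cell. A minimal sketch of this construction, using a toy topology rather than anything from the paper:

```python
from itertools import combinations

def dual_graph(cell_edges):
    """Build the dual (line) graph of a cell adjacency graph.

    Dual-graph nodes are neighbor-cell pairs (the CIO-controlled links);
    two dual nodes are adjacent when their pairs share a cell, which is
    the locality each agent's aggregated KPI observation would respect.
    """
    nodes = [frozenset(e) for e in cell_edges]
    edges = [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(nodes), 2)
        if a & b  # pairs sharing a cell are neighbors in the dual graph
    ]
    return nodes, edges

# Toy topology: three cells in a triangle -> three neighbor pairs,
# each pair sharing a cell with both others.
nodes, edges = dual_graph([(0, 1), (1, 2), (0, 2)])
```

On the triangle above, every pair of links shares a cell, so the dual graph is itself a triangle; in sparser deployments the dual graph stays sparse, which is what keeps decentralized per-agent decisions scalable.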

Abstract

Handover (HO) control in cellular networks is governed by a set of HO control parameters that are traditionally configured through rule-based heuristics. A key parameter for HO optimization is the Cell Individual Offset (CIO), defined for each pair of neighboring cells and used to bias HO triggering decisions. At network scale, tuning CIOs becomes a tightly coupled problem: small changes can redirect mobility flows across multiple neighbors, and static rules often degrade under non-stationary traffic and mobility. We exploit the pairwise structure of CIOs by formulating HO optimization as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) on the network's dual graph. In this representation, each agent controls a neighbor-pair CIO and observes Key Performance Indicators (KPIs) aggregated over its local dual-graph neighborhood, enabling scalable decentralized decisions while preserving graph locality. Building on this formulation, we propose TD3-D-MA, a discrete Multi-Agent Reinforcement Learning (MARL) variant of the TD3 algorithm with a shared-parameter Graph Neural Network (GNN) actor operating on the dual graph and region-wise double critics for training, improving credit assignment in dense deployments. We evaluate TD3-D-MA in an ns-3 system-level simulator configured with real-world network operator parameters across heterogeneous traffic regimes and network topologies. Results show that TD3-D-MA improves network throughput over standard HO heuristics and centralized RL baselines, and generalizes robustly under topology and traffic shifts.
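The "double critics" in TD3 refer to clipped double-Q learning: the bootstrap target takes the minimum of two critics to curb value overestimation. A generic sketch of how that target could look for a discrete action set such as quantized CIO levels (this illustrates the standard TD3 mechanism, not the paper's exact region-wise critic design; the critics and values below are hypothetical):

```python
def td3_discrete_target(q1, q2, next_obs, actions, reward, gamma=0.99):
    """Clipped double-Q bootstrap target for a discrete-action TD3 variant.

    q1, q2: two target critics mapping (obs, action) -> value.
    The greedy next action is picked with one critic, then the target
    uses the minimum of both critics to reduce overestimation bias.
    """
    a_star = max(actions, key=lambda a: q1(next_obs, a))  # greedy CIO level
    return reward + gamma * min(q1(next_obs, a_star), q2(next_obs, a_star))

# Toy discrete action set: CIO offset levels in dB (illustrative only)
levels = [-3, 0, 3]
q1 = lambda s, a: 1.0 + 0.1 * a  # placeholder critics, not learned models
q2 = lambda s, a: 0.8 + 0.1 * a
target = td3_discrete_target(q1, q2, None, levels, reward=0.5)
```

With these placeholder critics the greedy level is +3 dB under `q1`, but the target bootstraps from the lower `q2` estimate at that action, which is the pessimism that makes the double-critic scheme robust in dense, tightly coupled deployments.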