Learning to Focus: CSI-Free Hierarchical MARL for Reconfigurable Reflectors

arXiv cs.AI / 4/8/2026


Key Points

  • The paper proposes a “CSI-free” control framework for reconfigurable intelligent surfaces (RIS) that avoids costly channel state information (CSI) estimation by using user localization data instead.
  • It introduces a hierarchical multi-agent reinforcement learning (HMARL) architecture with a two-tier design: a high-level controller makes discrete user-to-reflector allocations, and low-level Multi-Agent Proximal Policy Optimization (MAPPO) agents optimize continuous focal points under centralized training with decentralized execution (CTDE).
  • Deterministic ray-tracing results show improvements of up to 7.79 dB in received signal strength indicator (RSSI) compared with centralized optimization baselines.
  • The evaluation reports that the approach scales to multiple users and maintains strong beam-focusing performance under realistic sub-meter localization tracking errors.
  • By reducing CSI-related computational overhead while preserving high-fidelity signal redirection, the work positions itself as a scalable and cost-effective blueprint for intelligent wireless environments.
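
The two-tier split described in the key points can be illustrated with a small sketch. It is not the paper's method: a nearest-user heuristic stands in for the learned high-level controller, and a fixed offset stands in for each low-level MAPPO agent's continuous action. The function names (`allocate_users`, `choose_focal_point`, `control_step`) and all coordinates are hypothetical.

```python
import numpy as np


def allocate_users(user_pos, reflector_pos):
    """High-level stand-in: give each reflector the index of its nearest user.

    ASSUMPTION: the paper trains this discrete allocation; a nearest-user
    rule is used here only to show the shape of the decision.
    """
    # distances[r, u] is the distance between reflector r and user u
    diffs = reflector_pos[:, None, :] - user_pos[None, :, :]
    distances = np.linalg.norm(diffs, axis=-1)
    return distances.argmin(axis=1)


def choose_focal_point(assigned_user_pos, offset):
    """Low-level stand-in: continuous focal point near the assigned user.

    ASSUMPTION: `offset` plays the role of a low-level agent's continuous
    action; in the paper it would come from a trained MAPPO policy.
    """
    return assigned_user_pos + offset


def control_step(user_pos, reflector_pos, offsets):
    """One pass through both tiers: discrete allocation, then focal points."""
    allocation = allocate_users(user_pos, reflector_pos)
    focal_points = np.stack(
        [choose_focal_point(user_pos[u], offsets[r])
         for r, u in enumerate(allocation)]
    )
    return allocation, focal_points


# Hypothetical scene: two users and three reflectors, coordinates in meters.
users = np.array([[0.0, 0.0, 1.5], [4.0, 0.0, 1.5]])
reflectors = np.array([[1.0, 3.0, 2.5], [3.5, 3.0, 2.5], [0.0, 3.0, 2.5]])
offsets = np.zeros((3, 3))  # zero offset: focus exactly on the location estimate

allocation, focal_points = control_step(users, reflectors, offsets)
print(allocation.tolist())  # [0, 1, 0]
```

Only user positions enter the computation, which is the sense in which such a controller is "CSI-free": no channel estimate appears anywhere in the loop.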

Abstract

Reconfigurable Intelligent Surfaces (RIS) have the potential to engineer smart radio environments for next-generation millimeter-wave (mmWave) networks. However, the prohibitive computational overhead of Channel State Information (CSI) estimation and the dimensionality explosion inherent in centralized optimization severely hinder practical large-scale deployments. To overcome these bottlenecks, we introduce a "CSI-free" paradigm powered by a Hierarchical Multi-Agent Reinforcement Learning (HMARL) architecture to control mechanically reconfigurable reflective surfaces. By substituting pilot-based channel estimation with accessible user localization data, our framework leverages spatial intelligence for macro-scale wave propagation management. The control problem is decomposed into a two-tier neural architecture: a high-level controller executes temporally extended, discrete user-to-reflector allocations, while low-level controllers autonomously optimize continuous focal points using Multi-Agent Proximal Policy Optimization (MAPPO) under a Centralized Training with Decentralized Execution (CTDE) scheme. Comprehensive deterministic ray-tracing evaluations demonstrate that this hierarchical framework achieves massive RSSI improvements of up to 7.79 dB over centralized baselines. Furthermore, the system exhibits robust multi-user scalability and maintains highly resilient beam-focusing performance under practical sub-meter localization tracking errors. By eliminating CSI overhead while maintaining high-fidelity signal redirection, this work establishes a scalable and cost-effective blueprint for intelligent wireless environments.
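
To put the headline figure in linear terms, a gain in decibels converts to a power ratio as 10^(dB/10). The snippet below is a minimal illustration, assuming the reported 7.79 dB is a difference in received power; the helper name `db_to_power_ratio` is ours and does not appear in the paper.

```python
def db_to_power_ratio(gain_db):
    """Convert a gain in dB to a linear power ratio: 10 ** (dB / 10)."""
    return 10.0 ** (gain_db / 10.0)


# ASSUMPTION: 7.79 dB is treated as a received-power difference.
ratio = db_to_power_ratio(7.79)
print(f"{ratio:.2f}")  # 6.01
```

Under that assumption, the best-case improvement corresponds to roughly six times the received power of the centralized baseline.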