Multi-User mmWave Beam and Rate Adaptation via Combinatorial Satisficing Bandits

arXiv cs.LG / 4/17/2026


Key Points

  • The paper studies downlink mmWave multi-user beam and discrete rate adaptation in a MISO system with multiple base stations using finite analog beam codebooks and learning from ACK/NACK feedback.
  • It formulates the joint beam-rate selection as a combinatorial semi-bandit with a “satisficing” throughput threshold, aiming to meet a target quality-of-service rather than purely maximizing throughput.
  • The proposed SAT-CTS policy is lightweight and threshold-aware, combining conservative confidence estimates with posterior sampling to focus learning on satisfying the throughput requirement \tau_r.
  • The authors provide the first finite-time regret bounds for combinatorial semi-bandits under a satisficing objective: a time-independent constant bound on cumulative satisficing regret when the threshold \tau_r is realizable, and an O((\log T)^2) standard regret bound when it is not.
  • Simulations on time-varying sparse multipath channels show that SAT-CTS consistently reduces cumulative satisficing regret (measured relative to the threshold \tau_r) while keeping standard regret competitive and achieving favorable average throughput and fairness across users, indicating that QoS targets can be met equitably without channel state knowledge.
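
To make the two evaluation metrics in the points above concrete, here is a minimal Python sketch. It assumes a common definition from the satisficing-bandit literature, where a round only incurs satisficing regret by the amount its expected throughput falls short of \tau_r; the paper's exact multi-user definition may differ. The function names, the scalar per-round throughput, and the sample numbers are hypothetical and chosen only for illustration.

```python
def satisficing_regret(throughputs, tau_r):
    """Cumulative shortfall below the threshold tau_r (assumed definition).

    Rounds whose expected throughput meets or exceeds tau_r cost nothing.
    """
    return sum(max(0.0, tau_r - x) for x in throughputs)


def standard_regret(throughputs, best):
    """Cumulative gap to the best achievable expected throughput."""
    return sum(best - x for x in throughputs)


# Hypothetical expected throughputs of the actions chosen over four rounds.
chosen = [1.0, 2.0, 3.0, 2.5]
print(satisficing_regret(chosen, tau_r=2.0))  # only the first round falls short
print(standard_regret(chosen, best=3.0))      # every suboptimal round counts
```

Under this definition a policy can stop accumulating satisficing regret while still accumulating standard regret, which is why the two metrics are reported separately.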

Abstract

We study downlink beam and rate adaptation in a multi-user mmWave MISO system where multiple base stations (BSs), each using analog beamforming from finite codebooks, serve multiple single-antenna user equipments (UEs) with a unique beam per UE and discrete data transmission rates. BSs learn about transmission success based on ACK/NACK feedback. To encode service goals, we introduce a satisficing throughput threshold \tau_r and cast joint beam and rate adaptation as a combinatorial semi-bandit over beam-rate tuples. Within this framework, we propose SAT-CTS, a lightweight, threshold-aware policy that blends conservative confidence estimates with posterior sampling, steering learning toward meeting \tau_r rather than merely maximizing throughput. Our main theoretical contribution provides the first finite-time regret bounds for combinatorial semi-bandits with a satisficing objective: when \tau_r is realizable, we upper bound the cumulative satisficing regret to the target with a time-independent constant, and when \tau_r is non-realizable, we show that SAT-CTS incurs only a finite expected transient outside committed CTS rounds, after which its regret is governed by the sum of the regret contributions of restarted CTS rounds, yielding an O((\log T)^2) standard regret bound. On the practical side, we evaluate the performance via cumulative satisficing regret to \tau_r alongside standard regret and fairness. Experiments with time-varying sparse multipath channels show that SAT-CTS consistently reduces satisficing regret and maintains competitive standard regret, while achieving favorable average throughput and fairness across users, indicating that feedback-efficient learning can equitably allocate beams and rates to meet QoS targets without channel state knowledge.
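
The idea of blending conservative confidence estimates with posterior sampling can be illustrated with a heavily simplified, single-user sketch. This is not the paper's SAT-CTS policy: it assumes one UE, a Beta-Bernoulli model of ACK probability per (beam, rate) pair, and a Hoeffding-style lower confidence bound as the "conservative" estimate. All names (`choose_arm`, `simulate`, `p_ack`, `stats`) and the confidence radius are hypothetical choices made for illustration.

```python
import math
import random


def choose_arm(stats, rates, tau_r, t, rng):
    """Pick a (beam, rate_index) pair.

    stats[(beam, k)] = [acks, nacks]; rates[k] is the k-th discrete rate.
    If some pair's lower confidence bound on throughput already meets
    tau_r, commit to it; otherwise explore via Thompson sampling.
    """
    best_arm, best_lcb = None, -math.inf
    for (beam, k), (acks, nacks) in stats.items():
        n = acks + nacks
        if n == 0:
            continue  # no feedback yet, so no confidence bound
        radius = math.sqrt(math.log(max(t, 2)) / (2 * n))  # assumed radius
        lcb = rates[k] * (acks / n - radius)
        if lcb > best_lcb:
            best_arm, best_lcb = (beam, k), lcb
    if best_lcb >= tau_r:
        return best_arm  # conservatively satisficing: exploit it

    def sampled_throughput(arm):
        acks, nacks = stats[arm]
        # Beta(1, 1) prior updated with the observed ACK/NACK counts.
        return rates[arm[1]] * rng.betavariate(acks + 1, nacks + 1)

    return max(stats, key=sampled_throughput)


def simulate(p_ack, rates, tau_r, horizon, seed=0):
    """Run the sketch against fixed (hypothetical) ACK probabilities."""
    rng = random.Random(seed)
    stats = {arm: [0, 0] for arm in p_ack}
    picks = []
    for t in range(1, horizon + 1):
        arm = choose_arm(stats, rates, tau_r, t, rng)
        ack = rng.random() < p_ack[arm]
        stats[arm][0 if ack else 1] += 1  # ACK/NACK is the only feedback
        picks.append(arm)
    return picks, stats
```

A call such as `simulate({(0, 0): 0.9, (0, 1): 0.4, (1, 0): 0.6, (1, 1): 0.2}, rates=[1.0, 2.0], tau_r=0.8, horizon=500)` runs the loop on two beams and two rates. The sketch omits the combinatorial multi-user assignment, the time-varying channel, and the restart mechanism described in the abstract.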