Multi-User mmWave Beam and Rate Adaptation via Combinatorial Satisficing Bandits

arXiv cs.LG / 4/17/2026

Key Points

The paper studies downlink beam and discrete rate adaptation in a multi-user mmWave MISO system, where multiple base stations use finite analog beam codebooks and learn from ACK/NACK feedback.
It formulates the joint beam-rate selection as a combinatorial semi-bandit with a “satisficing” throughput threshold, aiming to meet a target quality-of-service rather than purely maximizing throughput.
The proposed SAT-CTS policy is lightweight and threshold-aware, combining conservative confidence estimates with posterior sampling to focus learning on satisfying the throughput requirement \tau_r.
The authors provide the first finite-time regret bounds for combinatorial semi-bandits under a satisficing objective: a time-independent constant bound on cumulative satisficing regret when the threshold \tau_r is realizable, and an O((\log T)^2) standard regret bound when it is non-realizable.
Simulations on time-varying sparse multipath channels show SAT-CTS reduces satisficing regret to \tau_r while keeping competitive standard regret and improving fairness, enabling equitable QoS-aware allocation without explicit channel state information.
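
To make the idea of threshold-aware learning from ACK/NACK feedback concrete, here is a minimal single-user sketch. It is not the authors' SAT-CTS policy, whose details are not given here. It only illustrates the general pattern of combining a conservative lower confidence bound with Beta posterior sampling. The function name `run_threshold_aware_ts`, the Hoeffding-style confidence radius, the toy ACK probabilities, and the reduction to a single user are all assumptions made for illustration.

```python
import numpy as np


def run_threshold_aware_ts(success_prob, rates, tau, horizon=2000, seed=0):
    """Toy single-user loop over (beam, rate) arms with ACK/NACK feedback.

    success_prob[b, r] is a hypothetical ACK probability for beam b at rate r.
    Expected throughput of an arm is modelled as rates[r] * success_prob[b, r].
    Returns the number of times each arm was played.
    """
    rng = np.random.default_rng(seed)
    success_prob = np.asarray(success_prob, dtype=float)
    rates = np.asarray(rates, dtype=float)
    acks = np.zeros_like(success_prob)    # ACK counts per arm
    nacks = np.zeros_like(success_prob)   # NACK counts per arm
    pulls = np.zeros(success_prob.shape, dtype=int)

    for t in range(1, horizon + 1):
        n = acks + nacks
        mean = acks / np.maximum(n, 1.0)
        # Assumed Hoeffding-style radius; unplayed arms get a lower bound of 0.
        radius = np.sqrt(1.5 * np.log(t + 1) / np.maximum(n, 1.0))
        lcb = np.where(n > 0, np.clip(mean - radius, 0.0, 1.0), 0.0)
        lcb_tput = lcb * rates  # rates broadcast along the beam axis

        if lcb_tput.max() >= tau:
            # Some arm confidently meets the target: play the safest such arm.
            scores = lcb_tput
        else:
            # Otherwise explore by sampling from the Beta(1 + ACK, 1 + NACK) posterior.
            scores = rng.beta(acks + 1.0, nacks + 1.0) * rates

        b, r = np.unravel_index(np.argmax(scores), scores.shape)
        ack = int(rng.random() < success_prob[b, r])  # simulated ACK (1) or NACK (0)
        acks[b, r] += ack
        nacks[b, r] += 1 - ack
        pulls[b, r] += 1

    return pulls


if __name__ == "__main__":
    # Two beams, two rates; only beam 1 at rate 2.0 exceeds tau = 1.2 in expectation.
    toy_prob = [[0.90, 0.20], [0.95, 0.80]]
    print(run_threshold_aware_ts(toy_prob, rates=[1.0, 2.0], tau=1.2))
```

In the multi-user setting described above, each round would instead select one beam-rate tuple per UE under the unique-beam constraint and observe one ACK/NACK per UE. The sketch omits that combinatorial step.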

Abstract

We study downlink beam and rate adaptation in a multi-user mmWave MISO system where multiple base stations (BSs), each using analog beamforming from finite codebooks, serve multiple single-antenna user equipments (UEs) with a unique beam per UE and discrete data transmission rates. BSs learn about transmission success based on ACK/NACK feedback. To encode service goals, we introduce a satisficing throughput threshold $\tau_r$ and cast joint beam and rate adaptation as a combinatorial semi-bandit over beam-rate tuples. Within this framework, we propose SAT-CTS, a lightweight, threshold-aware policy that blends conservative confidence estimates with posterior sampling, steering learning toward meeting $\tau_r$ rather than merely maximizing. Our main theoretical contribution provides the first finite-time regret bounds for combinatorial semi-bandits with satisficing objective: when $\tau_r$ is realizable, we upper bound the cumulative satisficing regret to the target with a time-independent constant, and when $\tau_r$ is non-realizable, we show that SAT-CTS incurs only a finite expected transient outside committed CTS rounds, after which its regret is governed by the sum of the regret contributions of restarted CTS rounds, yielding an $O((\log T)^2)$ standard regret bound. On the practical side, we evaluate the performance via cumulative satisficing regret to $\tau_r$ alongside standard regret and fairness. Experiments with time-varying sparse multipath channels show that SAT-CTS consistently reduces satisficing regret and maintains competitive standard regret, while achieving favorable average throughput and fairness across users, indicating that feedback-efficient learning can equitably allocate beams and rates to meet QoS targets without channel state knowledge.
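
The three evaluation quantities named in the abstract can be sketched as follows. The abstract does not spell out its definitions, so the forms below are assumptions. Satisficing regret is taken as the cumulative shortfall of expected throughput below the threshold, and standard regret as the cumulative gap to the best achievable expected throughput. Fairness is measured with Jain's index, a common choice that may differ from the paper's metric. All function names are hypothetical.

```python
import numpy as np


def satisficing_regret(expected_throughput, tau):
    """Cumulative shortfall below tau; rounds at or above tau contribute zero."""
    x = np.asarray(expected_throughput, dtype=float)
    return float(np.sum(np.maximum(tau - x, 0.0)))


def standard_regret(expected_throughput, best_throughput):
    """Cumulative gap to the best achievable expected throughput."""
    x = np.asarray(expected_throughput, dtype=float)
    return float(np.sum(best_throughput - x))


def jain_index(per_user_throughput):
    """Jain's fairness index: 1 when all users are equal, 1/n when one user gets everything."""
    x = np.asarray(per_user_throughput, dtype=float)
    sum_sq = float(np.sum(x ** 2))
    if sum_sq == 0.0:
        return 0.0  # convention chosen here for the all-zero case
    return float(np.sum(x) ** 2 / (x.size * sum_sq))


if __name__ == "__main__":
    # Hypothetical per-round expected throughput of the chosen action.
    trace = [0.5, 1.0, 1.5, 1.6]
    print(satisficing_regret(trace, tau=1.2))          # shortfalls 0.7 and 0.2
    print(standard_regret(trace, best_throughput=1.6))
    print(jain_index([1.0, 1.0, 1.0, 1.0]))
```

Under these assumed definitions, a policy that settles on any action meeting the threshold stops accumulating satisficing regret, even though it may keep accumulating standard regret if that action is not the throughput maximizer.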