Multi-Turn Reasoning LLMs for Task Offloading in Mobile Edge Computing

arXiv cs.LG / 4/9/2026


Key Points

  • The paper tackles the challenge of task offloading in mobile edge computing under latency constraints caused by dynamic arrivals, time-varying wireless channels, and coupled server queues.
  • It argues that existing heuristics lack adaptability, while DRL approaches generalize poorly and require retraining whenever the network topology changes.
  • It proposes COMLLM, a generative LLM-based framework that uses GRPO plus a Look-Ahead Collaborative Simulation (LACS) mechanism to perform multi-step Monte Carlo rollouts that jointly model queue evolution.
  • By embedding the rollout-based look-ahead into the reward design, COMLLM aims to produce foresighted policies rather than myopically optimizing immediate latency.
  • Experiments report near-optimal latency with better load-balancing fairness and “zero-shot topological scalability,” showing generalization to larger unseen network topologies without retraining and outperforming SFT, DRL, and heuristic baselines.
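The look-ahead idea in the bullets above can be sketched as a Monte Carlo rollout over coupled server queues: to score an offloading decision, simulate several steps of future arrivals and service, and use the resulting average backlog as a foresighted (rather than myopic) reward signal. The queue model, horizon, arrival policy, and function names below are illustrative assumptions, not the paper's actual LACS implementation:

```python
import random

def rollout_latency(queues, action, horizon=5, arrival_rate=0.6,
                    n_rollouts=20, seed=0):
    """Monte Carlo estimate of average future backlog after offloading
    one task to server `action` (illustrative look-ahead sketch)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rollouts):
        q = list(queues)
        q[action] += 1  # enqueue the current task on the chosen server
        step_latencies = []
        for _ in range(horizon):
            # toy future-arrival model: a new task joins the shortest queue
            if rng.random() < arrival_rate:
                q[min(range(len(q)), key=q.__getitem__)] += 1
            # each busy server serves one task per step
            q = [max(0, x - 1) for x in q]
            # waiting-time proxy: mean backlog across the coupled queues
            step_latencies.append(sum(q) / len(q))
        total += sum(step_latencies) / horizon
    return total / n_rollouts

def foresighted_reward(queues, action):
    # lower expected future backlog -> higher reward
    return -rollout_latency(queues, action)
```

For example, with queues `[5, 0]`, offloading to the idle server scores higher than adding to the congested one, because the rollout penalizes the long-run backlog rather than only the immediate queue length.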

Abstract

Emerging computation-intensive applications impose stringent latency requirements on resource-constrained mobile devices. Mobile Edge Computing (MEC) addresses this challenge through task offloading. However, designing effective policies remains difficult due to dynamic task arrivals, time-varying channels, and the spatio-temporal coupling of server queues. Conventional heuristics lack adaptability, while Deep Reinforcement Learning (DRL) suffers from limited generalization and architectural rigidity, requiring retraining when network topology changes. Although Large Language Models (LLMs) offer semantic reasoning capabilities, standard Supervised Fine-Tuning (SFT) yields myopic policies that greedily minimize immediate latency without accounting for long-term system evolution. To address these limitations, we propose COMLLM, a generative framework that enables foresighted decision-making in MEC systems. COMLLM integrates Group Relative Policy Optimization (GRPO) with a Look-Ahead Collaborative Simulation (LACS) mechanism, which performs multi-step Monte Carlo rollouts while jointly modeling server queue dynamics. By incorporating these rollouts into the reward design, the framework captures the long-term impact of current decisions on future system states. Experimental results demonstrate that COMLLM achieves near-optimal latency and improved load-balancing fairness. Notably, it exhibits zero-shot topological scalability, allowing a model trained on small-scale networks to generalize to larger, unseen topologies without retraining, outperforming SFT, DRL, and heuristic baselines.
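GRPO, mentioned in the abstract, scores each sampled decision relative to the other samples in its group rather than against a learned value function. A minimal sketch of that group-relative normalization (simplified; not the paper's training code) is:

```python
def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each sample's reward by the
    mean and standard deviation of its group (simplified sketch)."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

In COMLLM's setting, each reward would come from a look-ahead rollout of the sampled offloading decision, so samples whose long-run latency beats the group average receive positive advantage.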