Critic-Free Deep Reinforcement Learning for Maritime Coverage Path Planning on Irregular Hexagonal Grids
arXiv cs.RO / 3/31/2026
Key Points
- The paper introduces a deep reinforcement learning framework for maritime Coverage Path Planning (CPP) using hexagonal grid representations of irregular regions such as coastlines, islands, and exclusion zones.
- It reformulates CPP as a neural combinatorial optimization problem where a Transformer-based pointer policy autoregressively constructs coverage tours.
- To stabilize long-horizon routing without a value function, the authors propose a critic-free Group-Relative Policy Optimization (GRPO) approach that computes advantages via within-instance comparisons of sampled trajectories.
- Experiments on 1,000 unseen synthetic maritime environments report a 99.0% Hamiltonian success rate, compared with 46.0% for the best heuristic, along with shorter paths and fewer heading changes than the baselines.
- The method supports multiple inference modes (greedy, stochastic, and sampling with 2-opt refinement), with reported runtimes under 50 ms per instance on a laptop GPU, which suggests that real-time onboard use is feasible.
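
The critic-free advantage computation mentioned above can be illustrated with a minimal sketch. It assumes the commonly used GRPO form, in which the rewards of several trajectories sampled for the same instance are standardized by the group mean and standard deviation. The summary does not state the paper's exact normalization or reward definition, so the function name `group_relative_advantages`, the `eps` term, and the example rewards are all hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled trajectories.

    All rewards are assumed to come from tours sampled on the SAME
    instance, so the group mean acts as a baseline in place of a
    learned value function (critic).
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when every sampled tour scores the same
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards: negative tour cost for four tours sampled on one map.
rewards = [-12.0, -10.0, -14.0, -12.0]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:6.1f}  advantage={a:+.3f}")
```

Tours that score better than the group average receive a positive advantage and are reinforced, while worse-than-average tours receive a negative one. When every sample in a group scores the same, all advantages are zero and that instance contributes no gradient signal.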



