Middle-mile logistics through the lens of goal-conditioned reinforcement learning

arXiv stat.ML / 5/5/2026


Key Points

  • The paper addresses middle-mile logistics by routing parcels through a hub-and-truck network with finite truck capacity.
  • It reformulates the logistics problem as a multi-object, goal-conditioned Markov Decision Process (MDP), so that parcels with different targets can be routed within a single formulation.
  • The proposed approach integrates graph neural networks (GNNs) with model-free reinforcement learning (RL), using compact feature graphs derived from the environment state.
  • The paper is a new arXiv submission (arXiv:2605.02461v1) and aims to learn routing policies that respect network and capacity constraints.
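
To make the formulation in the key points concrete, here is a minimal sketch of one transition in a toy multi-object, goal-conditioned routing problem. It is an illustration under stated assumptions, not the paper's environment: the `Parcel` class, the `TRUCKS` table, the `step` function, the rule that over-capacity moves are rejected, and the delivery-count reward are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parcel:
    location: str  # hub where the parcel currently sits
    goal: str      # destination hub, i.e. the parcel's goal


# Hypothetical network: directed truck links with a per-step capacity.
TRUCKS = {("A", "B"): 1, ("B", "C"): 2, ("A", "C"): 1}


def step(parcels, actions):
    """Apply one routing decision per parcel.

    actions[i] is the next hub for parcel i, or None to wait.
    A move is rejected (the parcel stays put) if no truck serves the
    link or the truck is already full. The reward counts parcels that
    reach their goal during this step (an assumed reward, for illustration).
    """
    load = {edge: 0 for edge in TRUCKS}
    next_parcels = []
    reward = 0
    for parcel, target in zip(parcels, actions):
        edge = (parcel.location, target)
        if target is not None and edge in TRUCKS and load[edge] < TRUCKS[edge]:
            load[edge] += 1
            moved = Parcel(target, parcel.goal)
            if moved.location == moved.goal and parcel.location != parcel.goal:
                reward += 1
            next_parcels.append(moved)
        else:
            next_parcels.append(parcel)
    return next_parcels, reward


parcels = [Parcel("A", "C"), Parcel("A", "C"), Parcel("A", "B")]
parcels, reward = step(parcels, ["C", "C", "B"])
```

In this toy run, two parcels compete for the single-capacity A to C truck, so only the first one moves; the capacity constraint is what couples the otherwise independent per-parcel goals.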

Abstract

Middle-mile logistics describes the problem of routing parcels through a network of hubs linked by trucks with finite capacity. We rephrase this as a multi-object goal-conditioned MDP. Our method combines graph neural networks with model-free RL, extracting small feature graphs from the environment state.
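
The abstract does not say how the small feature graphs are built, so the following is only a sketch of one plausible reading: for a single parcel, keep the hubs within a few hops of its current location, drop truck links with no free capacity, and run one round of mean aggregation as a stand-in for a GNN layer. The names `CAPACITY`, `feature_graph`, and `message_pass`, the two-hop cutoff, and the node features are assumptions; the aggregation has no learned weights.

```python
from collections import deque

# Hypothetical directed hub network: link -> remaining truck capacity.
CAPACITY = {
    ("A", "B"): 1,
    ("B", "C"): 2,
    ("A", "C"): 0,
    ("C", "D"): 1,
    ("E", "A"): 3,
}


def feature_graph(location, goal, capacity, max_hops=2):
    """Return (nodes, edges) for hubs within max_hops of `location`,
    following only truck links that still have free capacity."""
    dist = {location: 0}
    queue = deque([location])
    edges = []
    while queue:
        hub = queue.popleft()
        if dist[hub] == max_hops:
            continue  # do not expand beyond the hop limit
        for (src, dst), free in capacity.items():
            if src == hub and free > 0:
                edges.append((src, dst, free))
                if dst not in dist:
                    dist[dst] = dist[hub] + 1
                    queue.append(dst)
    # Node features: [is the parcel's location, is the parcel's goal].
    nodes = {hub: [float(hub == location), float(hub == goal)] for hub in dist}
    return nodes, edges


def message_pass(nodes, edges):
    """One round of mean aggregation over incoming neighbours plus self."""
    updated = {}
    for hub, feat in nodes.items():
        incoming = [nodes[src] for src, dst, _ in edges if dst == hub]
        msgs = incoming + [feat]
        updated[hub] = [sum(col) / len(msgs) for col in zip(*msgs)]
    return updated


nodes, edges = feature_graph("A", "C", CAPACITY)
embeddings = message_pass(nodes, edges)
```

Here the full A to C link is excluded because its truck has no free capacity, and hubs D and E fall outside the extracted graph, so the per-parcel input stays small regardless of the size of the full network.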