A hierarchical spatial-aware algorithm with efficient reinforcement learning for human-robot task planning and allocation in production

arXiv cs.AI / 4/15/2026


Key Points

  • The paper targets human-robot task planning and allocation (TPA) in advanced manufacturing, where spatial factors like real-time human position and travel distance make TPA difficult in dynamic environments.
  • It decomposes production work into sequential subtasks and uses a hierarchical approach with a high-level planner plus a low-level allocator.
  • For high-level planning, it proposes an efficient buffer-based deep Q-learning (EBQ) method intended to cut training time and better handle long-term, sparse rewards.
  • For low-level allocation, it introduces a spatially aware path-planning method (SAP) to assign tasks to the right human-robot resources based on navigation feasibility and sequencing.
  • Experiments in a complex 3D real-time production simulator show that the combined EBQ&SAP approach can effectively solve TPA under complex and dynamic conditions.
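The low-level allocation idea in the points above can be illustrated with a small sketch. The summary does not describe SAP's internals, so everything here is an assumption made for illustration: the workspace is a 2D occupancy grid (0 = free, 1 = blocked), path length comes from a plain breadth-first search, and the hypothetical helpers `path_length` and `allocate` simply pick the resource with the shortest feasible path to the task location.

```python
from collections import deque


def path_length(grid, start, goal):
    """Shortest 4-connected path length in steps, or None if unreachable.

    Assumption: grid is a list of rows, 0 = free cell, 1 = obstacle.
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None


def allocate(grid, resources, task_cell):
    """Return (name, distance) of the resource with the shortest feasible path.

    Assumption: resources maps a hypothetical resource name to its grid cell.
    Returns (None, None) when no resource can reach the task cell.
    """
    best_name, best_dist = None, None
    for name, cell in resources.items():
        dist = path_length(grid, cell, task_cell)
        if dist is not None and (best_dist is None or dist < best_dist):
            best_name, best_dist = name, dist
    return best_name, best_dist


# Usage: a wall separates the human from the task, so the robot is closer.
grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
resources = {"human": (0, 0), "robot": (2, 0)}
print(allocate(grid, resources, (2, 2)))
```

Measuring distance along a feasible path rather than in a straight line is what makes the choice spatially aware in this toy setting: a resource that looks close may sit behind an obstacle.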

Abstract

In advanced manufacturing systems, humans and robots collaborate to conduct the production process. Effective task planning and allocation (TPA) is crucial for achieving high production efficiency, yet it remains challenging in complex and dynamic manufacturing environments. The dynamic nature of humans and robots, particularly the need to consider spatial information (e.g., humans' real-time position and the distance they need to move to complete a task), substantially complicates TPA. To address the above challenges, we decompose production tasks into manageable subtasks. We then implement a real-time hierarchical human-robot TPA algorithm, including a high-level agent for task planning and a low-level agent for task allocation. For the high-level agent, we propose an efficient buffer-based deep Q-learning method (EBQ), which reduces training time and enhances performance in production problems with long-term and sparse reward challenges. For the low-level agent, a path planning-based spatially aware method (SAP) is designed to allocate tasks to the appropriate human-robot resources, thereby achieving the corresponding sequential subtasks. We conducted experiments on a complex real-time production process in a 3D simulator. The results demonstrate that our proposed EBQ&SAP method effectively addresses human-robot TPA problems in complex and dynamic production processes.
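For readers unfamiliar with buffer-based Q-learning, the following is a minimal sketch of the generic technique that EBQ builds on, not the paper's method: the abstract gives no details of how EBQ manages its buffer. The sketch assumes a tabular Q-function instead of a deep network, and the names `ReplayBuffer` and `q_update` as well as the values of `alpha` and `gamma` are hypothetical.

```python
import random
from collections import defaultdict, deque


class ReplayBuffer:
    """Fixed-capacity store of transitions; the oldest entry is evicted first."""

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.items.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Sample without replacement, capped at the current buffer size.
        return rng.sample(list(self.items), min(batch_size, len(self.items)))

    def __len__(self):
        return len(self.items)


def q_update(q, batch, n_actions, alpha=0.5, gamma=0.9):
    """Apply one Q-learning step per replayed transition.

    Assumption: q is a defaultdict(float) keyed by (state, action).
    """
    for state, action, reward, next_state, done in batch:
        if done:
            best_next = 0.0
        else:
            best_next = max(q[(next_state, a)] for a in range(n_actions))
        target = reward + gamma * best_next
        q[(state, action)] += alpha * (target - q[(state, action)])


# Usage: a sparse reward arrives only on the terminal transition.
q = defaultdict(float)
buffer = ReplayBuffer(capacity=100)
buffer.add(0, 1, 0.0, 1, False)  # intermediate subtask, no reward yet
buffer.add(1, 0, 1.0, 2, True)   # final subtask completes the product
for _ in range(20):
    q_update(q, buffer.sample(2), n_actions=2)
print(q[(0, 1)], q[(1, 0)])
```

Replaying stored transitions lets the single rewarded step propagate value back to earlier subtasks over repeated updates, which is the general reason replay buffers help with long-horizon, sparse-reward problems.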