DeepFleet: Multi-Agent Foundation Models for Mobile Robots

arXiv cs.RO / 4/14/2026


Key Points

  • DeepFleet proposes a suite of foundation models for coordinating and planning large-scale mobile robot fleets, trained on warehouse fleet movement data from hundreds of thousands of robots at Amazon.
  • The work explores four model architectures with different inductive biases: a robot-centric (RC) decision transformer operating on neighborhoods of individual robots; a robot-floor (RF) transformer with cross-attention between robots and the warehouse floor; an image-floor (IF) model that convolutionally encodes the fleet state as a multi-channel image; and a graph-floor (GF) model that combines temporal attention with graph neural networks (see the sketch after this list).
  • The evaluation examines how these architectural choices affect prediction performance across tasks, and identifies the robot-centric and graph-floor approaches as the most promising, attributing this to their asynchronous robot state updates and localized modeling of robot interactions.
  • Scaling experiments indicate that the robot-centric and graph-floor models continue to improve as both the amount of warehouse operation data and the model size increase.
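
To make the robot-centric design concrete, the following is a minimal sketch of what a neighborhood-based, autoregressive model for a single robot could look like. This is an illustration under stated assumptions, not the authors' implementation: the use of PyTorch, the feature layout, the fixed neighborhood size, and the discrete action head are all choices made here for clarity.

```python
import torch
import torch.nn as nn

class RobotCentricSketch(nn.Module):
    """Toy robot-centric model: a causal transformer over one robot's
    recent states, with its K nearest neighbors embedded as context
    tokens. Dimensions, features, and action space are illustrative."""

    def __init__(self, state_dim=6, d_model=128, n_actions=5, n_layers=2):
        super().__init__()
        self.state_embed = nn.Linear(state_dim, d_model)      # ego robot states -> tokens
        self.neighbor_embed = nn.Linear(state_dim, d_model)   # neighbor states -> tokens
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.action_head = nn.Linear(d_model, n_actions)      # next-move prediction

    def forward(self, ego_history, neighbors):
        # ego_history: (B, T, state_dim) -- recent states of the ego robot
        # neighbors:   (B, K, state_dim) -- current states of K nearest robots
        ego_tok = self.state_embed(ego_history)
        nbr_tok = self.neighbor_embed(neighbors)
        tokens = torch.cat([nbr_tok, ego_tok], dim=1)          # neighbors as a context prefix
        seq_len = tokens.size(1)
        causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        h = self.encoder(tokens, mask=causal)                  # causal (autoregressive) attention
        return self.action_head(h[:, -1])                      # logits for the next action

model = RobotCentricSketch()
logits = model(torch.randn(8, 12, 6), torch.randn(8, 4, 6))    # batch of 8 robots
print(logits.shape)  # torch.Size([8, 5])
```

The per-robot framing keeps the sequence length small regardless of fleet size, which is one way to read why the robot-centric design scales to hundreds of thousands of robots: each robot only ever attends to its own history and a bounded local neighborhood.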

Abstract

We introduce DeepFleet, a suite of foundation models designed to support coordination and planning for large-scale mobile robot fleets. These models are trained on fleet movement data, including robot positions, goals, and interactions, from hundreds of thousands of robots in Amazon warehouses worldwide. DeepFleet consists of four architectures that each embody a distinct inductive bias and collectively explore key points in the design space for multi-agent foundation models: the robot-centric (RC) model is an autoregressive decision transformer operating on neighborhoods of individual robots; the robot-floor (RF) model uses a transformer with cross-attention between robots and the warehouse floor; the image-floor (IF) model applies convolutional encoding to a multi-channel image representation of the full fleet; and the graph-floor (GF) model combines temporal attention with graph neural networks for spatial relationships. In this paper, we describe these models and present our evaluation of the impact of these design choices on prediction task performance. We find that the robot-centric and graph-floor models, which both use asynchronous robot state updates and incorporate the localized structure of robot interactions, show the most promise. We also present experiments that show that these two models can make effective use of larger warehouse operation datasets as the models are scaled up.
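
For the graph-floor idea, a minimal sketch of the general pattern is shown below: robots are nodes in a per-timestep proximity graph, a simple message-passing layer aggregates neighbor features at each timestep, and self-attention over each robot's own timeline captures temporal structure. This is an assumption-laden illustration rather than the paper's architecture; the module names, dimensions, adjacency construction, and prediction head are all hypothetical.

```python
import torch
import torch.nn as nn

class GraphFloorSketch(nn.Module):
    """Toy graph-floor model: per-timestep message passing over a robot
    proximity graph, followed by temporal self-attention over each
    robot's own sequence of embeddings. Illustrative only."""

    def __init__(self, state_dim=6, d_model=64, n_actions=5):
        super().__init__()
        self.encode = nn.Linear(state_dim, d_model)
        self.message = nn.Linear(d_model, d_model)             # neighbor message transform
        self.update = nn.Linear(2 * d_model, d_model)          # combine self + aggregated messages
        self.temporal = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, states, adj):
        # states: (T, N, state_dim) -- N robots over T timesteps
        # adj:    (T, N, N)         -- row-normalized proximity graph per timestep
        h = self.encode(states)                                 # (T, N, d)
        msgs = torch.bmm(adj, self.message(h))                  # aggregate messages from neighbors
        h = torch.relu(self.update(torch.cat([h, msgs], dim=-1)))
        h = h.transpose(0, 1)                                   # (N, T, d): one sequence per robot
        h, _ = self.temporal(h, h, h)                           # temporal self-attention
        return self.head(h[:, -1])                              # (N, n_actions): next-step logits

T, N = 10, 32
adj = torch.rand(T, N, N)
adj = adj / adj.sum(-1, keepdim=True)                           # toy row-normalized adjacency
logits = GraphFloorSketch()(torch.randn(T, N, 6), adj)
print(logits.shape)  # torch.Size([32, 5])
```

Like the robot-centric sketch, this structure only couples a robot to its graph neighbors at each step, which is consistent with the abstract's observation that localized interaction structure is what makes the RC and GF designs scale well with more data.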