CoFL: Continuous Flow Fields for Language-Conditioned Navigation

arXiv cs.RO · April 30, 2026


Key Points

  • The paper introduces CoFL, an end-to-end policy for language-conditioned navigation that outputs a continuous flow field from BEV observations and a language instruction.
  • Instead of predicting trajectories from a single start point, CoFL learns local motion vectors at arbitrary BEV locations, using each scene-instruction annotation as dense spatial supervision.
  • The approach generates trajectories from any starting position by numerically integrating the predicted flow field, supporting simple real-time rollouts and closed-loop recovery.
  • To scale training and evaluation, the authors build a dataset of 500k+ BEV image–instruction pairs with procedurally generated flow fields and trajectories derived from semantic maps from Matterport3D and ScanNet.
  • Experiments on strictly unseen scenes show CoFL outperforms modular vision-language planners and trajectory-generation policies in both precision and safety, and it also performs zero-shot in real-world tests with feasible closed-loop control.
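The rollout mechanism described above — integrating a predicted flow field from an arbitrary start point — can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: the grid resolution, step size, unit-speed normalization, and stopping rule are all assumptions.

```python
import numpy as np

def sample_flow(flow, pos):
    """Bilinearly interpolate a (H, W, 2) BEV flow field at a continuous (x, y) point."""
    h, w, _ = flow.shape
    x = np.clip(pos[0], 0.0, w - 1 - 1e-6)
    y = np.clip(pos[1], 0.0, h - 1 - 1e-6)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * flow[y0, x0] + fx * flow[y0, x1]
    bot = (1 - fx) * flow[y1, x0] + fx * flow[y1, x1]
    return (1 - fy) * top + fy * bot

def rollout(flow, start, step=0.5, n_steps=200):
    """Generate a trajectory from any start point by forward-Euler integration
    of the predicted flow field (assumed stopping rule: near-zero flow = goal)."""
    pos = np.asarray(start, dtype=float)
    traj = [pos.copy()]
    for _ in range(n_steps):
        v = sample_flow(flow, pos)
        norm = np.linalg.norm(v)
        if norm < 1e-6:                    # near-zero flow: treat as goal reached
            break
        pos = pos + step * v / norm        # unit-speed step along the local flow direction
        traj.append(pos.copy())
    return np.array(traj)

# Toy field: uniform flow toward +x; any start point rolls out rightward.
flow = np.zeros((64, 64, 2))
flow[..., 0] = 1.0
traj = rollout(flow, start=(5.0, 32.0), n_steps=50)
```

Because the field is defined everywhere in the workspace, the same integration loop also gives closed-loop recovery for free: if the robot drifts off its nominal path, re-querying the field at the perturbed position yields a fresh rollout with no replanning.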

Abstract

Existing language-conditioned navigation systems typically rely on modular pipelines or trajectory generators, but the latter use each scene–instruction annotation mainly to supervise one start-conditioned rollout. To address these limitations, we present CoFL, an end-to-end policy that maps a bird's-eye view (BEV) observation and a language instruction to a continuous flow field for navigation. CoFL reformulates navigation as workspace-conditioned field learning rather than start-conditioned trajectory prediction: it learns local motion vectors at arbitrary BEV locations, turning each scene–instruction annotation into dense spatial control supervision. Trajectories are generated from any start by numerical integration of the predicted field, enabling simple real-time rollout and closed-loop recovery. To enable large-scale training and evaluation, we build a dataset of over 500k BEV image–instruction pairs, each procedurally annotated with a flow field and a trajectory derived from semantic maps built on Matterport3D and ScanNet. Evaluating on strictly unseen scenes, CoFL significantly outperforms modular Vision-Language Model (VLM)-based planners and trajectory generation policies in both navigation precision and safety, while maintaining real-time inference. Finally, we deploy CoFL zero-shot in real-world experiments with BEV observations across multiple layouts, maintaining feasible closed-loop control and a high success rate.
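The "dense spatial control supervision" the abstract describes can be made concrete with a small sketch: instead of penalizing error along a single annotated trajectory, the loss samples arbitrary BEV locations and compares predicted motion vectors against the annotated flow field. This is a hypothetical training objective under assumed shapes (`predict_fn`, the sampling count, and the MSE form are not from the paper).

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_flow_loss(predict_fn, gt_flow, n_points=256):
    """Mean-squared error between predicted and annotated motion vectors at
    randomly sampled BEV grid locations (hypothetical dense-supervision loss)."""
    h, w, _ = gt_flow.shape
    ys = rng.integers(0, h, n_points)
    xs = rng.integers(0, w, n_points)
    pred = predict_fn(xs, ys)        # (n_points, 2) predicted local motion vectors
    target = gt_flow[ys, xs]         # (n_points, 2) procedurally annotated flow
    return np.mean(np.sum((pred - target) ** 2, axis=-1))

# Sanity check: a predictor that reproduces the ground-truth field has zero loss.
gt = rng.normal(size=(32, 32, 2))
loss = dense_flow_loss(lambda xs, ys: gt[ys, xs], gt)
```

The contrast with start-conditioned trajectory prediction is that every annotated scene–instruction pair supervises the policy at many workspace points, not just along one rollout, which is what makes the 500k procedurally annotated fields usable as dense labels.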