Aligning Progress and Feasibility: A Neuro-Symbolic Dual Memory Framework for Long-Horizon LLM Agents

arXiv cs.AI / 4/6/2026

💬 OpinionIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper argues that long-horizon LLM agents fail due to two distinct problems: global Progress Drift (semantic planning wandering) and local Feasibility Violation (breaking logical constraints or invalid state transitions).
It proposes a Neuro-Symbolic Dual Memory Framework that decouples these issues by running two memory mechanisms in parallel during inference.
A neural Progress Memory learns semantic “blueprints” from successful trajectories to steer overall task advancement, while a symbolic Feasibility Memory performs strict logical validation using executable Python verification functions generated from failures.
Experiments on ALFWorld, WebShop, and TextCraft show improved performance versus baselines, with reduced invalid action rate and shorter/controlled trajectory length.

Abstract

Large language models (LLMs) have demonstrated strong potential in long-horizon decision-making tasks, such as embodied manipulation and web interaction. However, agents frequently struggle with endless trial-and-error loops or deviate from the main objective in complex environments. We attribute these failures to two fundamental errors: global Progress Drift and local Feasibility Violation. Existing methods typically attempt to address both issues simultaneously using a single paradigm. However, these two challenges are fundamentally distinct: the former relies on fuzzy semantic planning, while the latter demands strict logical constraints and state validation. The inherent limitations of such a single-paradigm approach pose a fundamental challenge for existing models in handling long-horizon tasks. Motivated by this insight, we propose a Neuro-Symbolic Dual Memory Framework that explicitly decouples semantic progress guidance from logical feasibility verification. Specifically, during the inference phase, the framework invokes both memory mechanisms synchronously: on one hand, a neural-network-based Progress Memory extracts semantic blueprints from successful trajectories to guide global task advancement; on the other hand, a symbolic-logic-based Feasibility Memory utilizes executable Python verification functions synthesized from failed transitions to perform strict logical validation. Experiments demonstrate that this method significantly outperforms existing competitive baselines on ALFWorld, WebShop, and TextCraft, while drastically reducing the invalid action rate and average trajectory length.