Neuro-Symbolic Manipulation Understanding with Enriched Semantic Event Chains

arXiv cs.CV / 4/24/2026


Key Points

  • The paper introduces eSEC-LAM, a neuro-symbolic framework that turns enriched Semantic Event Chains (eSECs) into explicit event-level symbolic states for manipulation understanding in robotics.
  • It augments classical eSECs with confidence-aware predicates, functional object roles, affordance priors, primitive-level abstraction, and saliency-guided explanation cues to enable uncertainty-aware reasoning.
  • The enriched symbolic states are derived from a foundation-model-based perception front-end through deterministic predicate extraction; current-action inference and next-primitive prediction are then performed by lightweight symbolic reasoning over primitive pre- and post-conditions.
  • Experiments on EPIC-KITCHENS-100, EPIC-KITCHENS VISOR, and Assembly101 show competitive action recognition, substantially better next-primitive prediction, greater robustness to perception noise than both classical symbolic and end-to-end video baselines, and temporally consistent, evidence-grounded explanation traces.
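To make "deterministic predicate extraction" into a "confidence-aware" symbolic state concrete, here is a minimal sketch under loudly stated assumptions. The summary does not specify the front-end's outputs or the predicate set, so everything here is hypothetical: `Detection`, `box_gap`, `extract_state`, the `touch_px` threshold, the use of 2D box gaps as a contact proxy, and the rule that a pairwise relation inherits the weaker of its two detection scores. The point it illustrates is that the mapping from perception to facts is a fixed rule, while uncertainty is carried along as a per-fact confidence rather than discarded.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Detection:
    """Hypothetical front-end output for one object in one frame."""
    label: str
    box: tuple    # (x1, y1, x2, y2) in pixels
    score: float  # detector confidence in [0, 1]


def box_gap(a, b):
    """Euclidean gap between two axis-aligned boxes; 0.0 if they overlap."""
    dx = max(0.0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0.0, max(a[1], b[1]) - min(a[3], b[3]))
    return (dx * dx + dy * dy) ** 0.5


def extract_state(detections, touch_px=5.0):
    """Deterministically map detections to a confidence-aware relational state.

    Returns {(relation, subject, object): confidence} with one touching or
    not_touching fact per object pair (pairs ordered alphabetically by label).
    """
    state = {}
    ordered = sorted(detections, key=lambda d: d.label)
    for a, b in combinations(ordered, 2):
        # Fixed geometric rule: boxes closer than touch_px count as touching.
        relation = "touching" if box_gap(a.box, b.box) <= touch_px else "not_touching"
        # Assumption: a relation is only as reliable as its weaker detection.
        state[(relation, a.label, b.label)] = min(a.score, b.score)
    return state


frame = [
    Detection("hand", (0, 0, 10, 10), 0.9),
    Detection("knife", (12, 0, 20, 10), 0.7),
    Detection("board", (100, 100, 200, 150), 0.95),
]
state = extract_state(frame)
for fact, conf in state.items():
    print(fact, conf)
# ('not_touching', 'board', 'hand') 0.9
# ('not_touching', 'board', 'knife') 0.7
# ('touching', 'hand', 'knife') 0.7
```

Keeping the relation rule deterministic means the same perception output always yields the same symbolic state, so any downstream error can be traced to either a perception score or a specific rule.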

Abstract

Robotic systems operating in human environments must reason about how object interactions evolve over time, which actions are currently being performed, and what manipulation step is likely to follow. Classical enriched Semantic Event Chains (eSECs) provide an interpretable relational description of manipulation, but remain primarily descriptive and do not directly support uncertainty-aware decision making. In this paper, we propose eSEC-LAM, a neuro-symbolic framework that transforms eSECs into an explicit event-level symbolic state for manipulation understanding. The proposed formulation augments classical eSECs with confidence-aware predicates, functional object roles, affordance priors, primitive-level abstraction, and saliency-guided explanation cues. These enriched symbolic states are derived from a foundation-model-based perception front-end through deterministic predicate extraction, while current-action inference and next-primitive prediction are performed using lightweight symbolic reasoning over primitive pre- and post-conditions. We evaluate the proposed framework on EPIC-KITCHENS-100, EPIC-KITCHENS VISOR, and Assembly101 across action recognition, next-primitive prediction, robustness to perception noise, and explanation consistency. Experimental results show that eSEC-LAM achieves competitive action recognition, substantially improves next-primitive prediction, remains more robust under degraded perceptual conditions than both classical symbolic and end-to-end video baselines, and provides temporally consistent explanation traces grounded in explicit relational evidence. These findings demonstrate that enriched Semantic Event Chains can serve not only as interpretable descriptors of manipulation, but also as effective internal states for neuro-symbolic action reasoning.
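The abstract's two reasoning tasks, current-action inference and next-primitive prediction over primitive pre- and post-conditions, can be illustrated with a small sketch. This is not the paper's algorithm: the primitive library, the fact encoding, and the scoring rules (mean confidence as "support", unobserved facts counting as 0, and penalizing primitives whose effects already hold) are all hypothetical choices made for illustration. `Primitive`, `support`, `infer_current`, and `predict_next` are invented names.

```python
from dataclasses import dataclass


def touching(a, b):
    return ("touching", a, b)


def not_touching(a, b):
    return ("not_touching", a, b)


@dataclass(frozen=True)
class Primitive:
    name: str
    pre: frozenset   # facts expected to hold before the primitive
    post: frozenset  # facts expected to hold after it


def support(state, facts):
    """Mean confidence that a set of facts holds; unobserved facts count as 0."""
    if not facts:
        return 1.0
    return sum(state.get(f, 0.0) for f in facts) / len(facts)


def infer_current(prev, curr, library):
    """Pick the primitive whose pre-conditions match prev and post-conditions match curr."""
    scores = {p.name: support(prev, p.pre) * support(curr, p.post) for p in library}
    return max(scores, key=scores.get), scores


def predict_next(curr, library):
    """Rank primitives that are enabled now but whose effects do not yet hold."""
    scored = [(support(curr, p.pre) * (1.0 - support(curr, p.post)), p.name) for p in library]
    # Highest score first; ties broken alphabetically for determinism.
    return [name for _, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


LIBRARY = [
    Primitive("grasp_knife",
              frozenset({not_touching("hand", "knife")}),
              frozenset({touching("hand", "knife")})),
    Primitive("cut_tomato",
              frozenset({touching("hand", "knife"), not_touching("knife", "tomato")}),
              frozenset({touching("knife", "tomato")})),
    Primitive("release_knife",
              frozenset({touching("hand", "knife")}),
              frozenset({not_touching("hand", "knife")})),
]

prev = {not_touching("hand", "knife"): 0.9, not_touching("knife", "tomato"): 0.95}
curr = {touching("hand", "knife"): 0.8, not_touching("knife", "tomato"): 0.95}
current, scores = infer_current(prev, curr, LIBRARY)
print(current)                      # grasp_knife
print(predict_next(curr, LIBRARY))  # ['cut_tomato', 'release_knife', 'grasp_knife']
```

Because every score is computed from named facts and their confidences, the same quantities that drive a prediction can be reported as its explanation, which is the kind of evidence-grounded trace the abstract describes.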