ARIS: Agentic and Relationship Intelligence System for Social Robots

arXiv cs.RO / 5/5/2026


Key Points

  • The paper introduces ARIS, an agentic AI framework for social robots that combines multimodal reasoning, a graph-based Social World Model, and retrieval-augmented generation (RAG) in a modular architecture.
  • ARIS targets three weaknesses of current social-robot systems: sustaining multi-turn engagement, reasoning about social relationships, and keeping dialogue contextually grounded at scale.
  • In experiments with a Pepper robot in dyadic, robot-mediated conversations, ARIS is compared to a large language model baseline and is shown to improve user-perceived outcomes.
  • A user study with 23 participants reports that ARIS receives significantly higher ratings than the baseline for perceived intelligence, animacy, anthropomorphism, and likeability.
  • The work’s main contributions include explicit relationship tracking via a knowledge graph, an efficient RAG pipeline that keeps latency bounded as dialogue grows, and an integrated system coordinating speech, vision, and physical actions via structured APIs, with open-source release planned after publication.
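The relationship-tracking idea in the last point can be illustrated with a toy labeled graph. This is only a minimal sketch under stated assumptions, not the authors' Social World Model: the class `SocialGraph`, its methods, and the relation labels are hypothetical names chosen for illustration, and the summary does not describe the actual schema.

```python
from collections import defaultdict


class SocialGraph:
    """Toy relationship graph (hypothetical): people are nodes, relations are labeled edges."""

    def __init__(self):
        # person -> {other_person: relation_label}
        self._edges = defaultdict(dict)

    def set_relation(self, a, b, relation):
        """Record or overwrite an undirected relation between a and b."""
        self._edges[a][b] = relation
        self._edges[b][a] = relation

    def relation(self, a, b):
        """Return the stored relation label, or None if none is known."""
        return self._edges.get(a, {}).get(b)

    def acquaintances(self, person):
        """Return the sorted names of everyone linked to person."""
        return sorted(self._edges.get(person, {}))


graph = SocialGraph()
graph.set_relation("alice", "bob", "colleague")
graph.set_relation("alice", "carol", "sibling")
# A later encounter updates the existing edge instead of adding a duplicate.
graph.set_relation("alice", "bob", "friend")
```

Keeping relations as explicit edges means a question such as "how does Bob know Alice?" becomes a lookup (`graph.relation("bob", "alice")`) rather than something the language model has to infer from raw dialogue history.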

Abstract

Foundational models have advanced social robotics, enabling richer perception and communicative interaction with users. However, current systems still struggle with multi-turn engagement, social-relationship reasoning, and contextually grounded dialogue at scale. We present ARIS (Agentic and Relationship Intelligence System), an agentic AI framework that unifies multimodal reasoning, a graph-based Social World Model, and retrieval-augmented generation (RAG) within a single modular architecture for social robots. We evaluate ARIS with the Pepper robot in a robot-mediated dyadic conversational setting, comparing it against a large language model baseline. A user study (N=23) shows that ARIS yields significantly higher perceived intelligence, animacy, anthropomorphism, and likeability. Our contributions are threefold: (1) a Social World Model that explicitly maps and updates social relationships between users through a knowledge graph, enabling social reasoning and re-identification across encounters; (2) an efficient RAG-based conversational pipeline that maintains bounded latency as dialogue histories grow to thousands of exchanges while preserving response relevance; and (3) system integration and empirical validation of these components within a modular agentic architecture that coordinates speech, vision, and physical action through structured APIs. The implementation of ARIS will be released as open source upon publication.
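The second contribution rests on the general RAG pattern of passing only the few most relevant past exchanges to the language model instead of the full history. The sketch below illustrates that pattern under loud assumptions: `embed`, `cosine`, and `retrieve` are hypothetical helpers, the bag-of-words "embedding" is a stand-in for a learned one, and the abstract does not say which retriever or index ARIS uses.

```python
import heapq
import math
from collections import Counter


def embed(text):
    """Stand-in embedding (assumption): a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(history, query, k=2):
    """Return the k past exchanges most similar to the query, best first."""
    q = embed(query)
    return heapq.nlargest(k, history, key=lambda turn: cosine(embed(turn), q))


history = [
    "my sister carol lives in lisbon",
    "i had pasta for lunch",
    "carol is visiting next week",
    "the weather turned cold today",
]
# Only k exchanges reach the prompt, however long the history becomes.
context = retrieve(history, "when is carol visiting", k=2)
```

In this sketch the prompt size is fixed by `k`, but the linear scan over `history` still grows with the number of exchanges; keeping retrieval time itself bounded would typically require precomputed embeddings and an approximate nearest-neighbor index, which is omitted here.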