Trust Your Memory: Verifiable Control of Smart Homes through Reinforcement Learning with Multi-dimensional Rewards

arXiv cs.AI / 4/14/2026

Key Points

  • The paper argues that while LLM-based smart home assistants can handle real-time device control, reliably performing memory-driven device control remains difficult to evaluate and optimize.
  • It identifies two gaps: existing benchmarks test either immediate device control or generic open-domain memory retrieval, and conventional RL training provides only outcome-based supervision (whether the final task succeeded).
  • The authors propose using reinforcement learning with multi-dimensional rewards to deliver more intermediate feedback for fine-grained memory operations such as add/update/delete/use.
  • To support this, they release two resources: MemHomeLife, created from anonymized real-world long-term user interaction logs, and MemHome, a benchmark specifically for systematic evaluation of memory-driven device control.
  • The work targets better assessment and training of memory management in smart home scenarios, aiming to reduce the local failures in individual memory operations that outcome-only supervision can leave uncorrected.
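
To make the contrast between outcome-only and multi-dimensional rewards concrete, here is a minimal Python sketch. It is not the authors' reward function, which this summary does not specify. The `MemoryOp` type, the Jaccard-overlap scoring, the equal weights, and the memory key `ac.temp` are all assumptions chosen for illustration; only the four operation types (add, update, delete, use) come from the text above.

```python
from dataclasses import dataclass

# The four memory operations named in the summary.
OPS = ("add", "update", "delete", "use")


@dataclass(frozen=True)
class MemoryOp:
    """Hypothetical record of one memory operation (illustrative only)."""
    kind: str        # one of OPS
    key: str         # hypothetical memory slot, e.g. "ac.temp"
    value: str = ""


def outcome_reward(final_ok: bool) -> float:
    """Outcome-only supervision: a single scalar for the whole episode."""
    return 1.0 if final_ok else 0.0


def multi_dim_reward(predicted, reference, final_ok, weights=None):
    """Score each operation type separately, then combine.

    Assumption: each dimension is the Jaccard overlap between predicted
    and reference operations of that type, and dimensions are combined
    with equal weights. Returns (per-dimension scores, weighted total).
    """
    if weights is None:
        weights = {name: 0.2 for name in OPS + ("outcome",)}
    scores = {}
    for kind in OPS:
        pred = {op for op in predicted if op.kind == kind}
        ref = {op for op in reference if op.kind == kind}
        if not pred and not ref:
            scores[kind] = 1.0  # nothing expected and nothing done
        else:
            scores[kind] = len(pred & ref) / len(pred | ref)
    scores["outcome"] = outcome_reward(final_ok)
    total = sum(weights[name] * scores[name] for name in scores)
    return scores, total


# Toy episode: the model should have updated an existing memory but
# added a duplicate instead; the device still ended in the right state.
reference = [MemoryOp("update", "ac.temp", "24"), MemoryOp("use", "ac.temp", "24")]
predicted = [MemoryOp("add", "ac.temp", "24"), MemoryOp("use", "ac.temp", "24")]
scores, total = multi_dim_reward(predicted, reference, final_ok=True)
```

In this toy episode the outcome-only reward is 1.0 even though the memory was managed incorrectly, while the per-dimension scores mark the `add` and `update` dimensions as 0.0. That is the kind of intermediate signal the summary describes, under the scoring assumptions stated above.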

Abstract

Large Language Models (LLMs) have become a key foundation for enabling personalized smart home experiences. While existing studies have explored how smart home assistants understand user queries to control devices in real time, memory-driven device control remains challenging from both evaluation and methodological perspectives. In terms of evaluation, existing benchmarks focus either on immediate device control or on general open-domain memory retrieval tasks, and therefore cannot effectively evaluate a model's ability to perform memory-driven device control. Methodologically, while memory-driven device control can be approached using Reinforcement Learning (RL), conventional RL methods generally rely on outcome-based supervision (i.e., whether the final task is achieved). This lack of intermediate feedback can lead to sub-optimal performance or local failures in fine-grained memory management tasks (adding, updating, deleting, and utilizing memories). To address these issues, we first release MemHomeLife, built from anonymized real-world long-term user interaction logs. To enable more fine-grained evaluation of different memory-related subtasks, we further construct MemHome, the first benchmark designed to systematically evaluate memory-driven device control in smart home scenarios.
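
As an illustration of what fine-grained evaluation of memory-related subtasks can look like, the following Python sketch breaks results down by subtask instead of reporting a single aggregate. It does not reproduce MemHome's actual metrics or format, which the abstract does not describe. The function name `per_subtask_accuracy` and the `(subtask, passed)` record shape are hypothetical, and the sample results are made up for the example.

```python
from collections import defaultdict


def per_subtask_accuracy(results):
    """Return {subtask: accuracy} from (subtask, passed) pairs.

    Hypothetical record shape: subtask is a string such as "add",
    passed is a bool saying whether that test case succeeded.
    """
    counts = defaultdict(lambda: [0, 0])  # subtask -> [passed, total]
    for subtask, passed in results:
        counts[subtask][0] += int(passed)
        counts[subtask][1] += 1
    return {name: passed / total for name, (passed, total) in counts.items()}


# Made-up results for illustration only; not data from MemHome.
results = [
    ("add", True), ("add", True),
    ("update", False), ("update", True),
    ("delete", True),
    ("use", False),
]
report = per_subtask_accuracy(results)
overall = sum(passed for _, passed in results) / len(results)
```

With these made-up inputs the overall accuracy is about 0.67, which hides the fact that every `use` case failed. The per-subtask report exposes that directly.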