Users and Wizards in Conversations: How WoZ Interface Choices Define Human-Robot Interactions

arXiv cs.RO / 3/31/2026

Key Points

  • The paper compares three Wizard-of-Oz (WoZ) interface designs with different constraints on what the wizard and robot can perceive and produce (restricted GUI, unrestricted GUI, and VR telepresence).
  • User evaluations show that the VR-mediated interaction was preferred, with higher ratings for robot features and perceived social presence.
  • Wizard perspectives indicate that VR is the most demanding interface for operating the robot, yet it fosters a greater sense of social connection with users.
  • The study finds interface-dependent differences in conversational timing and turn-taking, measured through inter-speaker gaps and overlaps: VR produced the most connected interaction, while the restricted GUI yielded the least connected flow and the largest silences.
  • The authors argue that more WoZ experiments should use telepresence interfaces to better model future robots and to collect naturalistic contextual verbal and non-verbal data that can support automation.

Abstract

In this paper, we investigated how the choice of a Wizard-of-Oz (WoZ) interface affects communication with a robot from both the user's and the wizard's perspective. In a conversational setting, we used three WoZ interfaces with varying levels of dialogue input and output restrictions: a) a restricted perception GUI that showed fixed-view video and ASR transcripts and let the wizard trigger pre-scripted utterances and gestures; b) an unrestricted perception GUI that added real-time audio from the participant and the robot; and c) a VR telepresence interface that streamed immersive stereo video and audio to the wizard and forwarded the wizard's spontaneous speech, gaze and facial expressions to the robot. We found that the interaction mediated by the VR interface was preferred by users in terms of robot features and perceived social presence. For the wizards, the VR condition turned out to be the most demanding but elicited a higher social connection with the users. The VR interface also induced the most connected interaction in terms of inter-speaker gaps and overlaps, while the Restricted GUI induced the least connected flow and the largest silences. Given these results, we argue for more WoZ studies using telepresence interfaces. These studies better reflect the robots of tomorrow and offer a promising path to automation based on naturalistic contextualized verbal and non-verbal behavioral data.
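
The abstract compares the interfaces by inter-speaker gaps and overlaps but does not say how these were computed. As an illustration only, the sketch below shows one common way to derive such measures from time-stamped speech segments: at each change of speaker, take the next speaker's start time minus the previous speaker's end time, so that a positive value is a gap (silence) and a negative value is an overlap. The `Segment` type, the function names, and the sample timings are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Segment:
    """One stretch of speech; times are in seconds (hypothetical format)."""
    speaker: str
    start: float
    end: float


def floor_transfer_offsets(segments):
    """Return the offset at every speaker change, in seconds.

    Positive offset = silent gap, negative offset = overlapping speech.
    Simplification: only consecutive segments (ordered by start time) are
    compared, so nested overlaps are not handled specially.
    """
    ordered = sorted(segments, key=lambda s: s.start)
    offsets = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev.speaker != nxt.speaker:
            offsets.append(nxt.start - prev.end)
    return offsets


def summarize(offsets):
    """Aggregate offsets into mean gap and mean overlap durations."""
    gaps = [o for o in offsets if o > 0]
    overlaps = [-o for o in offsets if o < 0]
    return {
        "n_transitions": len(offsets),
        "mean_gap_s": mean(gaps) if gaps else 0.0,
        "mean_overlap_s": mean(overlaps) if overlaps else 0.0,
    }


# Made-up example dialogue, not data from the study.
dialogue = [
    Segment("user", 0.0, 2.0),
    Segment("robot", 2.5, 4.0),   # starts 0.5 s after the user stops: a gap
    Segment("user", 3.75, 5.0),   # starts 0.25 s before the robot stops: an overlap
    Segment("user", 5.5, 6.0),    # same speaker continues: not a transition
]
offsets = floor_transfer_offsets(dialogue)
summary = summarize(offsets)
```

Under this reading, a "more connected" interaction would show shorter gaps at speaker changes, and long wizard response delays in a restricted interface would show up as larger positive offsets.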