AI Navigate

Book your room in the Turing Hotel! A symmetric and distributed Turing Test with multiple AIs and humans

arXiv cs.LG / March 20, 2026

📰 News · Developer Stack & Infrastructure · Models & Research

Key Points

  • TuringHotel reimagines the Turing Test as a symmetric, distributed setup where humans and LLMs both serve as judges and respondents.
  • The experiment runs on the UNaIVERSE platform with a peer-to-peer network enabling time-bounded discussions and authenticated exchanges, involving 17 human participants and 19 LLMs.
  • Findings indicate current models can still be mistaken for humans, and human fingerprints remain detectable but not unambiguous, underscoring ongoing challenges in differentiating AI from people.
  • The authors claim this is the first distributed-setup Turing Test of its kind and suggest potential national-interest uses for monitoring the evolution of large language models over time.

Abstract

In this paper, we report our experience with "TuringHotel", a novel extension of the Turing Test based on interactions within mixed communities of Large Language Models (LLMs) and human participants. The classical one-to-one interaction of the Turing Test is reinterpreted in a group setting, where human and artificial agents engage in time-bounded discussions and, interestingly, act as both judges and respondents. This community is instantiated on the novel platform UNaIVERSE (https://unaiverse.io), creating a "World" that defines the roles and interaction dynamics, facilitated by the platform's built-in programming tools. All communication occurs over an authenticated peer-to-peer network, ensuring that no third parties can access the exchanges. The platform also provides a unified interface for humans, accessible from both mobile devices and laptops, which was a key component of the experience reported in this paper. Results of our experiments, involving 17 human participants and 19 LLMs, revealed that current models are still sometimes mistaken for humans. Interestingly, several unexpected mistakes suggest that human fingerprints are still identifiable but not fully unambiguous, despite the high-quality language skills of artificial participants. We argue that this is the first experiment conducted in such a distributed setting, and that similar initiatives could be of national interest, supporting ongoing experiments and competitions aimed at monitoring the evolution of large language models over time.
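The symmetric setup described above, where every participant is simultaneously a judge and a respondent, can be sketched as a voting round in which each agent guesses whether every other agent is human. This is a minimal illustrative sketch, not the paper's actual protocol: the function names, the ground-truth dictionary, and the per-target accuracy scoring are all assumptions for illustration.

```python
def run_symmetric_round(participants, is_human, vote_fn):
    """Simulate one time-bounded round of a symmetric Turing Test.

    participants: list of agent IDs (humans and LLMs mixed).
    is_human:     dict mapping ID -> bool (ground truth, hidden from judges).
    vote_fn:      callable (judge_id, target_id) -> bool, the judge's guess
                  of whether the target is human.

    Returns a dict mapping each target ID to the fraction of the other
    participants who classified it correctly.
    """
    accuracy = {}
    for target in participants:
        # Every other participant acts as a judge for this target.
        votes = [vote_fn(judge, target)
                 for judge in participants if judge != target]
        correct = sum(vote == is_human[target] for vote in votes)
        accuracy[target] = correct / len(votes)
    return accuracy


# Toy example: two humans and one model, with judges who (naively)
# believe everyone is human.
participants = ["human_1", "human_2", "model_1"]
is_human = {"human_1": True, "human_2": True, "model_1": False}
scores = run_symmetric_round(participants, is_human,
                             lambda judge, target: True)
# Humans are always classified correctly, the model never is.
```

In the paper's terms, a model whose per-target accuracy falls well below 1.0 is one that judges "mistake for human"; the sketch only captures the scoring logic, not the time-bounded chat or the peer-to-peer transport.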