Human-1 by Josh Talks: A Full-Duplex Conversational Modeling Framework in Hindi using Real-World Conversations

arXiv cs.CL / 4/28/2026

Key Points

The paper introduces “Human-1,” an open, reproducible full-duplex spoken dialogue system for Hindi, designed to handle realistic conversation phenomena like interruptions, overlaps, and backchannels.
It builds on the Moshi duplex speech architecture by adding a custom Hindi tokenizer and training with 26,000 hours of real spontaneous conversations from 14,695 speakers, using separate speaker channels to learn turn-taking and overlap patterns directly.
For Hindi text generation, the authors replace the original English tokenizer and reinitialize text-vocabulary-dependent parameters while keeping the pre-trained audio components.
Training follows a two-stage recipe: large-scale pre-training, then fine-tuning on 1,000 hours of conversational data.
Evaluation with prompted dialogue continuation, using both automatic metrics and human judgments, indicates that the model produces natural, meaningful full-duplex conversational behavior in Hindi. The authors present the work as a first step toward real-time duplex dialogue systems for Hindi and other Indian languages.
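
The tokenizer swap described above can be pictured with a small sketch. It is not the authors' code: the parameter names (text_embedding, text_output_head, audio_embedding), the toy sizes, and the Gaussian initialization with standard deviation 0.02 are all assumptions made for illustration. The idea shown is only that tensors whose shape depends on the text vocabulary are re-created for the new vocabulary size, while everything else is carried over from the pre-trained checkpoint.

```python
import random


def swap_text_vocab(params, new_vocab_size, text_keys, seed=0):
    """Return a copy of `params` in which text-vocabulary-dependent matrices
    are re-created for `new_vocab_size`; all other entries are reused as-is."""
    rng = random.Random(seed)
    out = {}
    for name, matrix in params.items():
        if name in text_keys:
            dim = len(matrix[0])  # keep the hidden width, change the row count
            out[name] = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                         for _ in range(new_vocab_size)]
        else:
            out[name] = matrix  # pre-trained weights kept (e.g. audio parts)
    return out


# Toy checkpoint. Names and sizes are hypothetical, not Moshi's real ones.
checkpoint = {
    "text_embedding": [[0.1] * 4 for _ in range(8)],    # old 8-token vocab
    "text_output_head": [[0.2] * 4 for _ in range(8)],
    "audio_embedding": [[0.3] * 4 for _ in range(16)],
}
adapted = swap_text_vocab(
    checkpoint,
    new_vocab_size=12,  # assumed size of the new Hindi vocabulary
    text_keys={"text_embedding", "text_output_head"},
)
```

In this toy setup the two text matrices come back with 12 rows each, and the audio matrix is the same object as in the original checkpoint.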

Abstract

Full-duplex spoken dialogue systems can model natural conversational behaviours such as interruptions, overlaps, and backchannels, yet such systems remain largely unexplored for Indian languages. We present the first open, reproducible full-duplex spoken dialogue system for Hindi by adapting Moshi, a state-of-the-art duplex speech architecture, using a custom Hindi tokeniser and training on 26,000 hours of real spontaneous conversations collected from 14,695 speakers with separate speaker channels, enabling direct learning of turn-taking and overlap patterns from natural interactions. To support Hindi text generation, we replace the original English tokeniser and reinitialise text-vocabulary-dependent parameters while retaining the pre-trained audio components. We propose a two-stage training recipe -- large-scale pre-training followed by fine-tuning on 1,000 hours of conversational data. Evaluation through the prompted dialogue continuation paradigm with both automatic metrics and human judgments demonstrates that the resulting model generates natural and meaningful full-duplex conversational behaviour in Hindi. This work serves as a first step toward real-time duplex spoken dialogue systems for Hindi and other Indian languages.
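
To see why separate speaker channels matter, consider the sketch below. It assumes each channel has already been reduced to a sorted list of non-overlapping speech segments (start and end times in seconds); that preprocessing step and the example timings are hypothetical and are not described in the abstract. With one channel per speaker, the moments when both people talk at once fall out of a simple interval intersection, which is not possible to read off a single mixed recording in the same way.

```python
def overlaps(channel_a, channel_b):
    """Return (start, end) spans where both speakers are active.

    Each channel is a sorted list of non-overlapping (start, end) speech
    segments in seconds.
    """
    result = []
    i = j = 0
    while i < len(channel_a) and j < len(channel_b):
        start = max(channel_a[i][0], channel_b[j][0])
        end = min(channel_a[i][1], channel_b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever segment finishes first.
        if channel_a[i][1] <= channel_b[j][1]:
            i += 1
        else:
            j += 1
    return result


# Hypothetical timings: speaker B cuts in at 3.5 s and later gives a short
# backchannel while speaker A is still talking.
speaker_a = [(0.0, 4.0), (6.0, 9.0)]
speaker_b = [(3.5, 5.0), (7.0, 7.4)]
overlap_spans = overlaps(speaker_a, speaker_b)
# overlap_spans == [(3.5, 4.0), (7.0, 7.4)]
```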

