ProKWS: Personalized Keyword Spotting via Collaborative Learning of Phonemes and Prosody

arXiv cs.AI / 3/20/2026



Key Points

ProKWS introduces a dual-stream encoder that jointly learns phonemic representations and speaker-specific prosodic patterns, using a collaborative fusion module to combine the two streams.
The phoneme stream employs contrastive learning to enhance phonemic representations, while the prosody stream captures individual speaking characteristics such as tone, stress, and rhythm.
The approach aims to improve adaptability across different acoustic environments and to personalize keyword spotting so that it handles variations in tone and intent.
Experiments indicate performance competitive with state-of-the-art models on standard benchmarks and robust handling of personalized keywords across diverse prosodic expressions.
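
The summary says only that the phoneme stream uses contrastive learning; it does not name the loss. As a rough illustration of the general idea, the sketch below computes an InfoNCE-style loss for a single anchor embedding. The choice of InfoNCE, the cosine similarity, the temperature value, and the toy vectors are all assumptions made for this example, not details taken from the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor (an assumed objective, not the paper's).

    The loss is the negative log-softmax of the positive pair's similarity
    among the positive and all negatives.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    peak = max(logits)  # subtract the max before exponentiating, for stability
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]

# Hypothetical toy embeddings: two renditions of one phoneme and a confusable one.
anchor = [1.0, 0.0, 0.2]
same_phoneme = [0.9, 0.1, 0.3]
confusable = [0.1, 1.0, 0.0]

loss_matched = info_nce(anchor, same_phoneme, [confusable])
loss_mismatched = info_nce(anchor, confusable, [same_phoneme])
```

With these toy vectors the loss is small when the positive is the matching phoneme and large when the roles are swapped, which is the pressure that pulls same-phoneme embeddings together and pushes confusable ones apart.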

Abstract

Current keyword spotting systems primarily use phoneme-level matching to distinguish confusable words but ignore user-specific pronunciation traits like prosody (intonation, stress, rhythm). This paper presents ProKWS, a novel framework integrating fine-grained phoneme learning with personalized prosody modeling. We design a dual-stream encoder where one stream derives robust phonemic representations through contrastive learning, while the other extracts speaker-specific prosodic patterns. A collaborative fusion module dynamically combines phonemic and prosodic information, enhancing adaptability across acoustic environments. Experiments show that ProKWS delivers highly competitive performance, comparable to state-of-the-art models on standard benchmarks, and demonstrates strong robustness for personalized keywords with tone and intent variations.
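
The abstract describes the collaborative fusion module only as combining phonemic and prosodic information dynamically. One common way to get an input-dependent combination is a learned sigmoid gate, sketched below. The gating formulation, the function name `gated_fusion`, and the parameters `gate_weights` and `gate_bias` are hypothetical and chosen for illustration; the paper's actual module may differ.

```python
import math

def sigmoid(x):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(phoneme, prosody, gate_weights, gate_bias):
    """Blend two equal-length feature vectors with a per-dimension gate.

    Assumed formulation (not from the paper):
        gate_i  = sigmoid(gate_weights[i] . [phoneme; prosody] + gate_bias[i])
        fused_i = gate_i * phoneme_i + (1 - gate_i) * prosody_i
    """
    concat = phoneme + prosody  # list concatenation: [phoneme; prosody]
    fused = []
    for i, (p, q) in enumerate(zip(phoneme, prosody)):
        score = sum(w * x for w, x in zip(gate_weights[i], concat)) + gate_bias[i]
        gate = sigmoid(score)
        fused.append(gate * p + (1.0 - gate) * q)
    return fused

# Hypothetical 2-dimensional features from each stream.
phoneme_feat = [1.0, 2.0]
prosody_feat = [3.0, 4.0]
zero_weights = [[0.0] * 4 for _ in range(2)]

# Zero weights and bias give a gate of 0.5: a plain average of the streams.
balanced = gated_fusion(phoneme_feat, prosody_feat, zero_weights, [0.0, 0.0])

# A large positive bias drives the gate toward 1: the phoneme stream dominates.
phoneme_heavy = gated_fusion(phoneme_feat, prosody_feat, zero_weights, [50.0, 50.0])
```

Because the gate is computed from both inputs, a trained version could lean on phonemic evidence in one acoustic condition and on prosodic evidence in another, which is one plausible reading of "dynamically combines".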



