Hierarchical Policy Optimization for Simultaneous Translation of Unbounded Speech

arXiv cs.CL / 4/24/2026


Key Points

  • The paper introduces Hierarchical Policy Optimization (HPO) for simultaneous speech translation (SST) that optimizes both translation quality and low-latency behavior.
  • It addresses the high compute cost of LLM-based SST by building on dialogue-style SST that reuses the LLM’s KV cache, reducing redundant computation.
  • Unlike prior dialogue reformulations that depend on scarce, high-quality supervised fine-tuning (SFT) annotations, HPO post-trains from imperfect SFT data using a hierarchical reward scheme.
  • Experiments for English→Chinese/German/Japanese report improvements of over +7 COMET and +1.25 MetricX at a target latency of 1.5 seconds, supported by ablation studies.
  • The authors provide code at GitHub, enabling reproducibility and further development of the HPO approach for SST.
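To make the idea of a hierarchical reward concrete, here is a minimal sketch of one plausible formulation: a primary tier that penalizes violating the latency budget, and a secondary tier that rewards translation quality only once the budget is met. The function name, the linear penalty, and all constants are illustrative assumptions, not the paper's exact reward.

```python
def hierarchical_reward(quality, latency, target_latency=1.5, penalty_scale=-1.0):
    """Toy two-tier reward (illustrative; not the paper's exact formulation).

    quality:        a translation-quality score, e.g. COMET-like, higher is better
    latency:        observed latency in seconds
    target_latency: latency budget (the paper evaluates at 1.5 s)
    """
    if latency > target_latency:
        # Primary tier: latency violations dominate, scaled by how far
        # the system exceeded the budget.
        return penalty_scale * (latency - target_latency)
    # Secondary tier: within budget, optimize translation quality.
    return quality
```

The hierarchy matters because a flat weighted sum of quality and latency lets a policy trade one freely against the other, whereas a tiered scheme enforces the latency constraint first.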

Abstract

Simultaneous speech translation (SST) generates translations while receiving partial speech input. Recent advances show that large language models (LLMs) can substantially improve SST quality, but at the cost of high computational overhead. To reduce this cost, prior work reformulates SST as a multi-turn dialogue task, enabling full reuse of the LLM's key-value (KV) cache and eliminating redundant feature recomputation. However, this approach relies on supervised fine-tuning (SFT) data in dialogue form, for which few human annotations exist, and existing synthesis methods cannot guarantee data quality. In this work, we propose a Hierarchical Policy Optimization (HPO) approach that post-trains models trained on imperfect SFT data. We introduce a hierarchical reward that balances translation quality and latency objectives. Experiments on English to Chinese/German/Japanese demonstrate improvements of over +7 COMET score and +1.25 MetricX score at a latency of 1.5 seconds. Comprehensive ablation studies further validate the effectiveness of different quality rewards, hierarchical reward formulations, and segmentation strategies. Code is available at https://github.com/owaski/HPO
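The compute savings from the dialogue reformulation can be illustrated with a toy encoding-cost model: with KV-cache reuse, each incoming speech chunk is encoded once; without it, the full prefix is re-encoded at every step, so cost grows quadratically with the number of chunks. This is a back-of-the-envelope sketch, not the paper's implementation.

```python
def encode_cost(chunk_sizes, reuse_cache):
    """Total tokens encoded over a streaming session (toy model).

    chunk_sizes: tokens arriving at each streaming step
    reuse_cache: if True, cached key/value states for earlier turns are
                 kept, so only the new chunk is encoded each step;
                 if False, the entire prefix is re-encoded every step.
    """
    total, prefix = 0, 0
    for n in chunk_sizes:
        prefix += n
        total += n if reuse_cache else prefix
    return total
```

For three 10-token chunks, reuse costs 30 tokens of encoding while full recomputation costs 60, and the gap widens linearly per additional step, which is why cache reuse matters for unbounded speech.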