StereoFoley: Object-Aware Stereo Audio Generation from Video

Apple Machine Learning Journal / 4/28/2026


Key Points

  • StereoFoley is a video-to-audio generation framework designed to produce semantically aligned, temporally synchronized, and spatially accurate stereo sound at 48 kHz.
  • The work targets a key limitation of recent video-to-audio generative models, which often output mono audio or lack object-aware stereo imaging due to insufficient professionally mixed, spatially accurate datasets.
  • The authors train a base stereo audio generation model from video, reporting state-of-the-art performance in semantic accuracy and audio-video synchronization.
  • The approach extends beyond basic generation toward object-aware stereo behavior, aiming to deliver more realistic spatial audio tied to scene elements.
  • The paper is positioned as research for ICASSP (April 2026 publication) and is disseminated via an arXiv preprint for broader access by the research community.

We present StereoFoley, a video-to-audio generation framework that produces semantically aligned, temporally synchronized, and spatially accurate stereo sound at 48 kHz. While recent generative video-to-audio models achieve strong semantic and temporal fidelity, they largely remain limited to mono or fail to deliver object-aware stereo imaging, constrained by the lack of professionally mixed, spatially accurate video-to-audio datasets. First, we develop and train a base model that generates stereo audio from video, achieving state-of-the-art in both semantic accuracy and synchronization. Next…
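To build intuition for what "object-aware stereo imaging" means at the signal level, the toy sketch below places a mono sound in the stereo field according to an object's horizontal position in the frame, using the standard constant-power pan law at the paper's 48 kHz sample rate. This is purely an illustration of position-driven panning; it is not StereoFoley's method, which generates stereo audio directly with a learned model, and all names here (`pan_constant_power`, the position convention) are invented for this sketch.

```python
import numpy as np

SR = 48_000  # sample rate reported for StereoFoley's output

def pan_constant_power(mono: np.ndarray, x: float) -> np.ndarray:
    """Place a mono signal in the stereo field.

    x is the object's normalized horizontal position in the video frame:
    0.0 = far left, 0.5 = center, 1.0 = far right.
    The constant-power pan law (left = cos, right = sin) keeps the
    summed channel power equal to the mono power at every position,
    so perceived loudness stays roughly even as the source moves.
    """
    theta = x * np.pi / 2
    left = np.cos(theta) * mono
    right = np.sin(theta) * mono
    return np.stack([left, right], axis=0)  # shape (2, n_samples)

# 0.5 s test tone "attached" to an object a third of the way across the frame
t = np.arange(int(0.5 * SR)) / SR
tone = 0.3 * np.sin(2 * np.pi * 440.0 * t)
stereo = pan_constant_power(tone, x=1 / 3)
```

An object tracked across the frame would simply drive `x` per video frame, which is the kind of spatial cue a learned object-aware model must produce implicitly.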

Continue reading this article on the original site.
