StereoFoley: Object-Aware Stereo Audio Generation from Video
Apple Machine Learning Journal / 4/28/2026
Key Points
- StereoFoley is a video-to-audio generation framework designed to produce semantically aligned, temporally synchronized, and spatially accurate stereo sound at 48 kHz.
- The work targets a key limitation of recent video-to-audio generative models, which often output mono audio or lack object-aware stereo imaging due to insufficient professionally mixed, spatially accurate datasets.
- The authors train a base stereo audio generation model from video, reporting state-of-the-art performance in semantic accuracy and audio-video synchronization.
- Beyond the base model, the work targets object-aware stereo imaging, aiming for more realistic spatial audio that is tied to the elements in the scene.
- The paper is associated with ICASSP (April 2026 publication) and is also available as an arXiv preprint for the wider research community.
We present StereoFoley, a video-to-audio generation framework that produces semantically aligned, temporally synchronized, and spatially accurate stereo sound at 48 kHz. While recent generative video-to-audio models achieve strong semantic and temporal fidelity, they largely remain limited to mono or fail to deliver object-aware stereo imaging, constrained by the lack of professionally mixed, spatially accurate video-to-audio datasets. First, we develop and train a base model that generates stereo audio from video, achieving state-of-the-art in both semantic accuracy and synchronization. Next…
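To make the term "object-aware stereo imaging" concrete, here is a minimal sketch of classic constant-power panning, where a mono sound is placed in the stereo field according to an object's horizontal position in the frame. This is not StereoFoley's method, which is a generative model; the function names (`pan_gains`, `render_stereo`) and the normalized position track are hypothetical and chosen only for illustration. The only value taken from the abstract is the 48 kHz sample rate.

```python
import math

SAMPLE_RATE = 48_000  # Hz, the output rate stated in the abstract


def pan_gains(x_norm: float) -> tuple[float, float]:
    """Constant-power (sin/cos) pan gains for a horizontal position.

    x_norm is an assumed normalized position: 0.0 = left edge of the
    frame, 1.0 = right edge. Values outside [0, 1] are clamped.
    """
    x = min(max(x_norm, 0.0), 1.0)
    theta = x * math.pi / 2
    return math.cos(theta), math.sin(theta)


def render_stereo(mono, x_positions):
    """Pan mono samples with a per-sample position track (illustrative only)."""
    if len(mono) != len(x_positions):
        raise ValueError("mono and x_positions must have the same length")
    left, right = [], []
    for sample, x in zip(mono, x_positions):
        gain_l, gain_r = pan_gains(x)
        left.append(sample * gain_l)
        right.append(sample * gain_r)
    return left, right


# A 10 ms, 440 Hz tone whose (hypothetical) source moves from left to right.
n = SAMPLE_RATE // 100
mono = [math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(n)]
track = [i / (n - 1) for i in range(n)]
left, right = render_stereo(mono, track)
```

The sin/cos law keeps the summed power of the two channels constant, so the source does not get louder or quieter as it crosses the frame. A learned model would have to produce this kind of position-dependent channel balance directly from the video, without an explicit position track.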