mtmd: qwen3 audio support (qwen3-omni and qwen3-asr)

Reddit r/LocalLLaMA / 4/13/2026

📰 NewsDeveloper Stack & InfrastructureSignals & Early TrendsTools & Practical Usage

Key Points

  • The post reports working audio support for Qwen3 variants in llama.cpp, specifically qwen3-omni (with vision + audio input) and qwen3-asr.
  • Functionality is demonstrated via an implementation referenced as a llama.cpp pull request, indicating the feature is being integrated upstream.
  • The update targets local/bring-your-own-model workflows (“LocalLLaMA”), enabling developers to experiment with multimodal audio capabilities on-device.
  • It suggests improving readiness for real-time or interactive audio-to-understanding pipelines using Qwen3-based models in the llama.cpp ecosystem.
mtmd: qwen3 audio support (qwen3-omni and qwen3-asr)
  • qwen3-omni-moe working (vision + audio input)
  • qwen3-asr working
submitted by /u/jacek2023
[link] [comments]