Qwen3.5-Omni learned to write code from spoken instructions and video without anyone training it to

THE DECODER / 3/31/2026


Key Points

  • Alibaba has released Qwen3.5-Omni, an omnimodal AI model that can process text, images, audio, and video in a single system.
  • Alibaba positions the model as outperforming Gemini 3.1 Pro on audio-related tasks.
  • A notable capability is that Qwen3.5-Omni can generate code from spoken instructions and video inputs, even though no one explicitly trained it for code-writing from those modalities.
  • The release highlights a broader trend of multimodal models exhibiting emergent abilities across modalities without narrowly targeted training.

Image: Alibaba's promotional graphic shows two teddy bears in traditional Chinese clothing. The bear at a desk represents Qwen3.5-Omni-Plus (SOTA performance, detailed audio-visual captioning, native multimodal, extensive multilingual support); the bear holding a smartphone represents Qwen3.5-Omni-Plus-Realtime (voice control, WebSearch tool, voice clone, semantic interruption).

Alibaba has released Qwen3.5-Omni, an omnimodal AI model that processes text, images, audio, and video in a single system. Alibaba claims it beats Gemini 3.1 Pro on audio tasks, and the model picked up an unexpected trick along the way: writing code from spoken instructions and video input, despite never being explicitly trained to do so.
