With Nemotron 3 Nano Omni, Nvidia reveals what really goes into a modern multimodal model

THE DECODER / 4/29/2026


Key Points

  • Nvidia has released Nemotron 3 Nano Omni, an open multimodal model that supports text, images, video, and audio inputs.
  • The release highlights strong performance, positioning the model as a practical option for modern multimodal workloads.
  • Nvidia also provides visibility into the model’s training data, naming sources including Qwen, GPT-OSS, Kimi, and DeepSeek OCR.
  • By sharing what goes into training, the article frames the release as an insight into how contemporary multimodal models are assembled beyond just benchmark results.

Nvidia has released Nemotron 3 Nano Omni, an open multimodal model for text, image, video, and audio. Beyond the model's performance, the look at its training data is notable: it draws on Qwen, GPT-OSS, Kimi, and DeepSeek OCR, among others.

The article With Nemotron 3 Nano Omni, Nvidia reveals what really goes into a modern multimodal model appeared first on The Decoder.