convert : add support for Nemotron Nano 3 Omni by danbev · Pull Request #22481 · ggml-org/llama.cpp

Reddit r/LocalLLaMA / 4/29/2026

📰 News · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • The ggml-org/llama.cpp repository has a pull request that adds conversion support for NVIDIA’s multimodal model “Nemotron 3 Nano Omni.”
  • Nemotron 3 Nano Omni is designed to unify understanding of video, audio, images, and text for enterprise workflows such as Q&A, summarization, transcription, and document intelligence.
  • The model adds integrated video-plus-speech comprehension along with GUI, OCR, and speech transcription capabilities for end-to-end processing of rich content (e.g., meeting recordings, training videos, and business documents).
  • The post notes that Nemotron 3 Nano Omni is available for commercial use and that it was improved using several Qwen models and gpt-oss-120b.
  • The change primarily affects practitioners using llama.cpp locally (e.g., to run or deploy the model), by enabling easier model conversion and usage in that ecosystem.

https://huggingface.co/ggml-org/NVIDIA-Nemotron-3-Nano-Omni

NVIDIA Nemotron 3 Nano Omni is a multimodal large language model that unifies video, audio, image, and text understanding to support enterprise-grade Q&A, summarization, transcription, and document intelligence workflows. It extends the Nemotron Nano family with integrated video+speech comprehension, Graphical User Interface (GUI), Optical Character Recognition (OCR), and speech transcription capabilities, enabling end-to-end processing of rich enterprise content such as meeting recordings, M&E assets, training videos, and complex business documents. NVIDIA Nemotron 3 Nano Omni was developed by NVIDIA as part of the Nemotron model family.

This model is available for commercial use.

This model was improved using Qwen3-VL-30B-A3B-Instruct, Qwen3.5-122B-A10B, Qwen3.5-397B-A17B, Qwen2.5-VL-72B-Instruct, and gpt-oss-120b. For more information, see the Training Dataset section of the model card.

submitted by /u/jacek2023