nvidia/NVILA-8B-HD-Video · Hugging Face

Reddit r/LocalLLaMA / 3/12/2026

Key Points

NVILA-HD-Video is an 8B-parameter multimodal LLM capable of understanding and answering questions about videos up to 4K resolution and 1K frames.
It uses AutoGaze to reduce redundant video patches before running the ViT or LLM, achieving up to 100x token reduction and latency improvements of up to 19x for ViT and 10x for the LLM.
The model demonstrates improved performance on benchmarks such as VideoMME and achieves state-of-the-art results on the HLVid high-resolution long-form video benchmark.
The model is released for research and development only and is hosted on Hugging Face by NVIDIA.

NVILA-HD-Video is a Multi-modal Large Language Model with 8B parameters that understands and answers questions about videos with up to 4K resolution and 1K frames.

Specifically, NVILA-HD-Video uses AutoGaze to remove redundant patches from a video before running the ViT or LLM. Empirically, AutoGaze can reduce the number of tokens in a video by up to 100x, cutting ViT and LLM latency by up to 19x and 10x, respectively. This enables NVILA-HD-Video to scale efficiently to 4K-resolution, 1K-frame videos, improving performance on benchmarks such as VideoMME and achieving state-of-the-art performance on HLVid, a high-resolution long-form video benchmark also proposed in this work.
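To get a feel for the scale involved, here is a rough back-of-the-envelope sketch of the patch-token count for a 4K, 1K-frame video and what an up-to-100x reduction would leave. The frame size (3840x2160), the 16-pixel patch size, and the frame count of exactly 1,000 are illustrative assumptions, not values taken from the model card, and the helper `patch_tokens` is hypothetical.

```python
def patch_tokens(width: int, height: int, frames: int, patch: int = 16) -> int:
    """Count ViT patch tokens for a video, assuming non-overlapping square patches."""
    cols = width // patch    # patches per row
    rows = height // patch   # patches per column
    return cols * rows * frames


# Assumed values for illustration only (not from the model card):
# "4K" taken as 3840x2160, "1K frames" taken as 1,000, patch size 16.
full = patch_tokens(3840, 2160, 1000)

# The post reports a reduction of "up to 100x"; apply that best-case factor.
reduced = full // 100

print(f"tokens before reduction: {full:,}")
print(f"tokens after 100x reduction: {reduced:,}")
```

Under these assumptions the unreduced video yields tens of millions of patch tokens, which is why pruning redundant patches before the ViT and LLM matters for latency.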

This model is for research and development only.

submitted by /u/jacek2023

