Unleashing Video Language Models for Fine-grained HRCT Report Generation

arXiv cs.CV / 3/16/2026

Key Points

AbSteering is an abnormality-centric framework that steers Video Language Models toward precise HRCT report generation, addressing the challenges of high-volume 3D imaging and diverse pathologies.
It combines an abnormality-centric Chain-of-Thought scheme with a Direct Preference Optimization objective that uses clinically confusable abnormalities as hard negatives to improve fine-grained discrimination.
The approach demonstrates that general-purpose VideoLMs can transfer effectively to medical imaging when guided by this paradigm, achieving strong performance in HRCT report generation.
It outperforms state-of-the-art domain-specific CT foundation models in detection sensitivity while reducing hallucinations, enhancing reliability for clinical reporting.
The authors release their data and model weights at the anonymized repository linked in the abstract, enabling broader validation and reproduction.
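As a rough illustration of the hard-negative idea in the key points above, the sketch below builds preference pairs by swapping a reported abnormality for a look-alike finding. The confusable map, the `PreferencePair` type, and `build_hard_negative_pairs` are hypothetical names chosen for illustration; the source does not describe how the authors actually construct their pairs.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str    # instruction given to the model alongside the CT volume
    chosen: str    # reference report (preferred response)
    rejected: str  # same report with one finding swapped for a look-alike


# Illustrative map only (an assumption, not taken from the paper):
# each finding points to abnormalities it could plausibly be confused with.
CONFUSABLE = {
    "ground-glass opacity": ["consolidation"],
    "honeycombing": ["paraseptal emphysema"],
}


def build_hard_negative_pairs(prompt, report, confusable=CONFUSABLE):
    """Create one pair per (finding, look-alike) that applies to the report."""
    pairs = []
    for finding, lookalikes in confusable.items():
        if finding not in report:
            continue  # only perturb findings the report actually mentions
        for lookalike in lookalikes:
            rejected = report.replace(finding, lookalike)
            pairs.append(PreferencePair(prompt, report, rejected))
    return pairs


if __name__ == "__main__":
    demo = build_hard_negative_pairs(
        "Describe the abnormalities in this HRCT scan.",
        "Bilateral ground-glass opacity in the lower lobes.",
    )
    for pair in demo:
        print(pair.rejected)
```

Plain string substitution keeps the rejected report close to the chosen one, so the two differ only in the abnormality named; a real pipeline would likely need something more careful than substring matching.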

Abstract

Generating precise diagnostic reports from High-Resolution Computed Tomography (HRCT) is critical for clinical workflow, yet it remains a formidable challenge due to the high pathological diversity and spatial sparsity within 3D volumes. While Video Language Models (VideoLMs) have demonstrated remarkable spatio-temporal reasoning in general domains, their adaptability to domain-specific, high-volume medical interpretation remains underexplored. In this work, we present AbSteering, an abnormality-centric framework that steers VideoLMs toward precise HRCT report generation. Specifically, AbSteering introduces: (i) an abnormality-centric Chain-of-Thought scheme that enforces abnormality reasoning, and (ii) a Direct Preference Optimization objective that utilizes clinically confusable abnormalities as hard negatives to enhance fine-grained discrimination. Our results demonstrate that general-purpose VideoLMs possess strong transferability to high-volume medical imaging when guided by this paradigm. Notably, AbSteering outperforms state-of-the-art domain-specific CT foundation models, which are pretrained with large-scale CTs, achieving superior detection sensitivity while simultaneously mitigating hallucinations. Our data and model weights are released at https://anonymous.4open.science/r/hrct-report-generation-video-vlm-728C/
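For readers unfamiliar with Direct Preference Optimization, the generic DPO loss for a single preference pair can be sketched in a few lines of Python. This is the standard objective, not the authors' training code: the function names and the default `beta` are assumptions, and each argument is assumed to be the summed token log-probability of the chosen (correct) or rejected (hard-negative) report under the trainable policy or the frozen reference model.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    beta=0.1 is an assumed default, not a value reported in the source.
    """
    # How much more the policy favours each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) equals softplus(-margin).
    return softplus(-margin)


if __name__ == "__main__":
    # Policy identical to the reference: zero margin.
    print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
    # Policy prefers the chosen report more than the reference does: lower loss.
    print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
```

The loss falls as the policy raises the likelihood of the correct report relative to the hard negative, which is the mechanism the abstract credits for sharper fine-grained discrimination.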
