Beyond the Embedding Bottleneck: Adaptive Retrieval-Augmented 3D CT Report Generation

arXiv cs.CV / 3/18/2026

Key Points

  • The study identifies a representational bottleneck in contrastive 3D CT embeddings: they carry highly discriminative pathology signals but concentrate them in very few effective dimensions (as few as 2 of 512), which constrains both generation and retrieval.
  • Scaling the language model does not improve performance, suggesting the bottleneck lies in the visual representation rather than in the text generator.
  • The authors propose AdaRAG-CT, an adaptive augmentation framework that injects supplementary textual information through controlled retrieval and selectively fuses it during report generation to mitigate the bottleneck.
  • On the CT-RATE benchmark, AdaRAG-CT achieves state-of-the-art clinical efficacy, raising Clinical F1 from 0.420 (CT-Agent) to 0.480. Ablations show that both the retrieval and generation components contribute, whereas naive static retrieval fails to help and can even degrade performance. Code is available at https://github.com/renjie-liang/Adaptive-RAG-for-3DCT-Report-Generation.
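
To make the "2 of 512 effective dimensions" claim concrete, here is a minimal sketch of one common way to estimate effective dimensionality: the participation ratio of the covariance eigenvalue spectrum. This summary does not say which estimator the authors use, so the metric, the function name `effective_dimensionality`, and the synthetic data are all assumptions for illustration only.

```python
import numpy as np


def effective_dimensionality(embeddings: np.ndarray) -> float:
    """Participation ratio (sum of eigenvalues)^2 / sum of squared eigenvalues.

    Assumption: this is one standard estimator, not necessarily the paper's.
    `embeddings` has shape (num_samples, embedding_dim).
    """
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Squared singular values of the centered data are proportional to the
    # covariance eigenvalues; the ratio below is scale-invariant.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    eigenvalues = singular_values ** 2
    total = eigenvalues.sum()
    if total == 0:
        return 0.0
    return float(total ** 2 / (eigenvalues ** 2).sum())


rng = np.random.default_rng(0)
num_samples, dim = 2000, 512

# Synthetic "concentrated" embeddings: two high-variance directions,
# with tiny noise in all 512 coordinates.
concentrated = rng.normal(size=(num_samples, dim)) * 0.01
concentrated[:, :2] += rng.normal(size=(num_samples, 2)) * 10.0

# Synthetic isotropic embeddings for comparison.
isotropic = rng.normal(size=(num_samples, dim))

concentrated_dim = effective_dimensionality(concentrated)
isotropic_dim = effective_dimensionality(isotropic)
print(f"concentrated: {concentrated_dim:.2f}, isotropic: {isotropic_dim:.2f}")
```

On the synthetic concentrated data the estimate lands near 2 even though the vectors nominally have 512 coordinates, while the isotropic data yields a value in the hundreds. Real CT embeddings would replace the synthetic arrays.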

Abstract

Automated radiology report generation from 3D CT volumes often suffers from incomplete pathology coverage. We provide empirical evidence that this limitation stems from a representational bottleneck: contrastive 3D CT embeddings encode discriminative pathology signals, yet exhibit severe dimensional concentration, with as few as 2 effective dimensions out of 512. Corroborating this, scaling the language model yields no measurable improvement, suggesting that the bottleneck lies in the visual representation rather than the generator. This bottleneck limits both generation and retrieval; naive static retrieval fails to improve clinical efficacy and can even degrade performance. We propose AdaRAG-CT, an adaptive augmentation framework that compensates for this visual bottleneck by introducing supplementary textual information through controlled retrieval and selectively integrating it during generation. On the CT-RATE benchmark, AdaRAG-CT achieves state-of-the-art clinical efficacy, improving Clinical F1 from 0.420 (CT-Agent) to 0.480 (+6 points); ablation studies confirm that both the retrieval and generation components contribute to the improvement. Code is available at https://github.com/renjie-liang/Adaptive-RAG-for-3DCT-Report-Generation.
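
The abstract contrasts naive static retrieval with "controlled" retrieval but does not describe the control mechanism. The sketch below shows one generic way such control could work: retrieve the top-k neighbours by cosine similarity, then drop any whose similarity falls below a threshold, so that weak matches are never passed to the generator. The function `gated_retrieve`, the parameters `k` and `min_sim`, and the toy data are hypothetical and are not taken from the AdaRAG-CT code.

```python
import numpy as np


def gated_retrieve(query, bank, reports, k=3, min_sim=0.5):
    """Return up to k (report, similarity) pairs with cosine similarity >= min_sim.

    Hypothetical illustration of similarity-gated retrieval; the threshold
    rule is an assumption, not the authors' documented method.
    """
    query = np.asarray(query, dtype=float)
    bank = np.asarray(bank, dtype=float)
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    sims = b @ q                      # cosine similarity to each stored item
    top = np.argsort(-sims)[:k]       # indices of the k most similar items
    return [(reports[i], float(sims[i])) for i in top if sims[i] >= min_sim]


# Toy retrieval bank: three stored embeddings with placeholder report texts.
bank = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
reports = ["report A", "report B", "report C"]

close_hits = gated_retrieve([1.0, 0.0], bank, reports)
no_hits = gated_retrieve([-1.0, -1.0], bank, reports)
print([text for text, _ in close_hits], no_hits)
```

A static top-k retriever would always return three reports here. The gated variant returns two for the first query and none for the second, leaving the generator to rely on the image alone when nothing sufficiently similar exists.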