Learning from Compressed CT: Feature Attention Style Transfer and Structured Factorized Projections for Resource-Efficient Medical Image Analysis

arXiv cs.CV / 5/4/2026


Key Points

  • The paper tackles the high computational cost of running AI on uncompressed chest CT volumes by using JPEG-compressed CT data to enable low-resource deployment and faster transfer.
  • It proposes Feature Attention Style Transfer (FAST), a knowledge-distillation framework that transfers both activation/attention patterns and structural relationships from high-fidelity CT models to encoders trained on compressed inputs.
  • It introduces Structured Factorized Projection (SFP), a parameter-efficient projection-head approach using Block Tensor Train decomposition that cuts projection-head parameters by nearly half.
  • The authors combine FAST and SFP into a contrastive learning pipeline called CT-Lite with a SigLIP-based multimodal alignment objective, achieving AUROC within 5–7% of an uncompressed-input baseline on multiple CT datasets.
  • Overall, the results suggest that compressed CT can support accurate medical image analysis with substantially fewer parameters, improving feasibility for clinical settings with resource constraints.
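The Gram-matrix style term mentioned above can be sketched as follows. This is an illustrative sketch, not the paper's exact FAST loss: the function names, normalization, and the plain mean-squared-error matching of teacher and student Gram matrices are assumptions.

```python
import numpy as np

def gram_matrix(feat):
    # feat: (C, N) feature map with C channels flattened over N spatial positions
    g = feat @ feat.T          # (C, C) channel co-activation ("style") matrix
    return g / feat.shape[1]   # normalize by the number of spatial positions

def attention_style_loss(teacher_feat, student_feat):
    # Hypothetical sketch of a Gram-matrix style-preservation term:
    # push the compressed-input (student) features to reproduce the
    # channel-correlation structure of the high-fidelity (teacher) features.
    gt = gram_matrix(teacher_feat)
    gs = gram_matrix(student_feat)
    return np.mean((gt - gs) ** 2)
```

In a full FAST-style objective this term would be combined with the dual-attention feature-alignment loss and the task loss; the weighting between them is not specified here.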

Abstract

The deployment of artificial intelligence in medical imaging is hindered by high computational complexity and resource-intensive processing of volumetric data. Although chest computed tomography (CT) volumes offer richer diagnostic information than projection radiography, their use in AI-based diagnosis remains limited due to the computational burden of processing uncompressed volumetric images (typically stored in NIfTI or DICOM format). Addressing the growing need for low-resource deployment and efficient electronic data transfer, we investigate the utilization of JPEG-compressed chest CT volumes for thoracic abnormality detection. We propose Feature Attention Style Transfer (FAST), a novel distillation framework that transfers both activation patterns and structural relationships from high-fidelity CT representations to a spatiotemporal visual encoder operating on compressed inputs. By combining Gram-matrix-based attention style preservation with dual-attention feature alignment, FAST enables robust feature extraction from degraded volumes. Furthermore, we introduce Structured Factorized Projection (SFP), leveraging Block Tensor Train decomposition as a parameter-efficient alternative to dense projection layers, reducing projection-head parameters by almost half. Our contrastive learning pipeline, CT-Lite, integrates these components with a SigLIP-based multimodal alignment objective. Experiments on CT-RATE, NIDCH, and Rad-ChestCT demonstrate that CT-Lite achieves AUROC within 5–7% of the uncompressed-input baseline across all three datasets, despite operating on compressed inputs with significantly fewer parameters, paving the way for AI-based clinical evaluation under resource constraints.
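To make the factorized-projection idea concrete, the sketch below replaces a dense projection layer with a tensor-train-style contraction. The dimensions, ranks, and plain TT layout are illustrative assumptions (the paper uses a Block Tensor Train variant, whose exact structure is not reproduced here); the point is only that a chain of small cores can stand in for one large weight matrix.

```python
import numpy as np

# Illustrative dims: a dense 512 -> 512 head would need 512 * 512 = 262,144 params.
d_in, d_out = 512, 512
in_modes, out_modes = (8, 8, 8), (8, 8, 8)   # factor 512 = 8 * 8 * 8 on each side
ranks = (1, 4, 4, 1)                         # TT-ranks (boundary ranks are 1)

rng = np.random.default_rng(0)
# One 4-way core per mode: shape (r_k, in_k, out_k, r_{k+1})
cores = [
    rng.normal(scale=0.1, size=(ranks[k], in_modes[k], out_modes[k], ranks[k + 1]))
    for k in range(3)
]

def tt_project(x):
    # Contract the TT cores against x one input mode at a time.
    t = x.reshape(1, *in_modes)              # (r0=1, i1, i2, i3)
    for core in cores:
        # Sum out the current rank and input mode; append this core's
        # output mode at the back and carry the next rank to the front.
        t = np.einsum('rijs,ri...->s...j', core, t)
    return t.reshape(-1)                     # (d_out,) with final rank r3 = 1
```

With these (made-up) settings the cores hold 256 + 1,024 + 256 = 1,536 parameters versus 262,144 for the dense layer; the paper's reported saving for its projection head is the more modest "almost half", so the ranks there are presumably much larger.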