CAFlow: Adaptive-Depth Single-Step Flow Matching for Efficient Histopathology Super-Resolution

arXiv cs.CV / 3/20/2026


Key Points

CAFlow is an adaptive-depth, single-step flow-matching framework for efficient histopathology super-resolution that routes each image tile to the shallowest network exit that still preserves reconstruction quality.
It operates in a pixel-unshuffled rearranged space to cut spatial computation by 16x and enables direct, faster inference on whole-slide images.
The FlowResNet backbone has 1.90M parameters and four early exits; a lightweight exit classifier (about 6K parameters) saves roughly 33% of compute at a cost of only 0.12 dB PSNR.
On multi-organ histopathology x4 SR, adaptive routing achieves 31.72 dB PSNR (vs 31.84 dB at full depth), and the shallowest exit exceeds bicubic by +1.9 dB at 2.8x less compute than SwinIR-light. The method generalizes to held-out colon tissue with minimal quality loss (-0.02 dB).
At x8 upscaling it outperforms all comparable-compute baselines and remains competitive with the much larger SwinIR-Medium. Downstream nuclei segmentation confirms that clinically relevant structure is preserved. Training completes in under 5 hours on a single GPU, and adaptive routing can reduce whole-slide inference from minutes to seconds.
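
The 16x figure quoted for the pixel-unshuffled space can be illustrated with a small NumPy sketch. It assumes an unshuffle factor of 4 (inferred here from 16 = 4 x 4; the summary does not state the factor) and a hypothetical 3 x 256 x 256 tile. The function names are illustrative and are not taken from the CAFlow code.

```python
import numpy as np

def pixel_unshuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange (C, H, W) into (C*r*r, H//r, W//r) without dropping any values."""
    c, h, w = x.shape
    if h % r or w % r:
        raise ValueError("height and width must be divisible by r")
    x = x.reshape(c, h // r, r, w // r, r)   # split each spatial axis into (block, offset)
    x = x.transpose(0, 2, 4, 1, 3)           # move the two offsets next to the channel axis
    return x.reshape(c * r * r, h // r, w // r)

def pixel_shuffle(y: np.ndarray, r: int) -> np.ndarray:
    """Inverse of pixel_unshuffle: (C*r*r, H, W) back to (C, H*r, W*r)."""
    c_rr, h, w = y.shape
    c = c_rr // (r * r)
    y = y.reshape(c, r, r, h, w)
    y = y.transpose(0, 3, 1, 4, 2)           # interleave offsets back into the spatial axes
    return y.reshape(c, h * r, w * r)

# Hypothetical tile size and factor, chosen only for illustration.
tile = np.random.default_rng(0).random((3, 256, 256))
packed = pixel_unshuffle(tile, r=4)

# Number of spatial positions a convolution or attention layer has to visit.
positions_before = tile.shape[1] * tile.shape[2]
positions_after = packed.shape[1] * packed.shape[2]
reduction = positions_before // positions_after
```

The rearrangement is lossless: the same values are moved from the spatial axes into channels, so layers run over 16x fewer positions while `pixel_shuffle` recovers the original layout exactly.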

Abstract

In digital pathology, whole-slide images routinely exceed gigapixel resolution, making computationally intensive generative super-resolution (SR) impractical for routine deployment. We introduce CAFlow, an adaptive-depth single-step flow-matching framework that routes each image tile to the shallowest network exit that preserves reconstruction quality. CAFlow performs flow matching in pixel-unshuffled rearranged space, reducing spatial computation by 16x while enabling direct inference. We show that dedicating half of training to exact t=0 samples is essential for single-step quality (-1.5 dB without it). The backbone, FlowResNet (1.90M parameters), mixes convolution and window self-attention blocks across four early exits spanning 3.1 to 13.3 GFLOPs. A lightweight exit classifier (~6K parameters) achieves 33% compute savings at only 0.12 dB cost. On multi-organ histopathology x4 SR, adaptive routing achieves 31.72 dB PSNR versus 31.84 dB at full depth, while the shallowest exit exceeds bicubic by +1.9 dB at 2.8x less compute than SwinIR-light. The method generalizes to held-out colon tissue with minimal quality loss (-0.02 dB), and at x8 upscaling it outperforms all comparable-compute baselines while remaining competitive with the much larger SwinIR-Medium model. Downstream nuclei segmentation confirms preservation of clinically relevant structure. The model trains in under 5 hours on a single GPU, and adaptive routing can reduce whole-slide inference from minutes to seconds.
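
The abstract's point about exact t=0 samples can be made concrete with a minimal sketch of the training-time sampling. It assumes the common linear flow-matching path x_t = (1 - t) x0 + t x1 with target velocity x1 - x0; the summary does not specify CAFlow's path or what plays the role of x0, so every function and parameter name below is hypothetical.

```python
import numpy as np

def sample_times(batch_size: int, p_zero: float, rng: np.random.Generator) -> np.ndarray:
    """Draw t uniformly in [0, 1), then force each entry to exactly 0 with probability p_zero."""
    t = rng.uniform(0.0, 1.0, size=batch_size)
    t[rng.random(batch_size) < p_zero] = 0.0
    return t

def flow_matching_pair(x0: np.ndarray, x1: np.ndarray, t: np.ndarray):
    """Assumed linear path: returns x_t and the constant target velocity x1 - x0."""
    t = t.reshape(-1, *([1] * (x0.ndim - 1)))  # broadcast t over the non-batch axes
    x_t = (1.0 - t) * x0 + t * x1
    return x_t, x1 - x0

def single_step(x0: np.ndarray, velocity_fn) -> np.ndarray:
    """One Euler step from t=0 to t=1: x1_hat = x0 + v(x0, t=0)."""
    return x0 + velocity_fn(x0, np.zeros(x0.shape[0]))

rng = np.random.default_rng(0)
t = sample_times(10_000, p_zero=0.5, rng=rng)   # half of the samples sit at t = 0
zero_fraction = float(np.mean(t == 0.0))

# Toy batch: random arrays stand in for source and target tiles.
x0 = rng.random((4, 3, 8, 8))
x1 = rng.random((4, 3, 8, 8))
x_t, v_target = flow_matching_pair(x0, x1, np.zeros(4))

# With a perfect velocity at t=0, a single step lands on the target.
x1_hat = single_step(x0, lambda x, t_batch: v_target)
```

Under these assumptions, single-step inference only ever queries the network at t=0, which is one plausible reading of why concentrating half of the training samples there matters for single-step quality.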
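
The routing idea (send each tile to the shallowest exit that preserves quality) can be sketched as a labelling rule plus a compute tally. This is only an illustration under stated assumptions: the tolerance rule, the 0.2 dB threshold, the per-tile PSNR values, and the two intermediate exit costs are all hypothetical. Only the 3.1 and 13.3 GFLOPs endpoints come from the abstract.

```python
import numpy as np

def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))

def shallowest_acceptable_exit(exit_psnrs, tol_db: float) -> int:
    """Index of the first exit whose PSNR is within tol_db of the deepest exit."""
    full_depth = exit_psnrs[-1]
    for k, value in enumerate(exit_psnrs):
        if value >= full_depth - tol_db:
            return k
    return len(exit_psnrs) - 1

def mean_cost(routes, exit_costs) -> float:
    """Average per-tile cost given the exit index chosen for each tile."""
    return float(np.mean(np.asarray(exit_costs)[np.asarray(routes)]))

# Endpoints are from the abstract; the two middle values are a linear
# interpolation used as a placeholder, NOT reported numbers.
exit_costs_gflops = np.linspace(3.1, 13.3, 4)

# Hypothetical per-exit PSNRs for an easy tile and a hard tile.
tile_psnrs = [
    [31.00, 31.10, 31.10, 31.15],
    [28.00, 29.50, 30.40, 30.90],
]
routes = [shallowest_acceptable_exit(p, tol_db=0.2) for p in tile_psnrs]
avg_cost = mean_cost(routes, exit_costs_gflops)
savings = 1.0 - avg_cost / exit_costs_gflops[-1]
```

A rule like this needs the ground-truth tile, so it could only produce training labels. At inference time a small classifier, such as the ~6K-parameter one the abstract describes, would have to predict the exit from the input alone.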