HFS-TriNet: A Three-Branch Collaborative Feature Learning Network for Prostate Cancer Classification from TRUS Videos

arXiv cs.CV / 4/27/2026

📰 News · Models & Research

Key Points

  • The paper introduces HFS-TriNet, a three-branch collaborative feature learning network designed to classify prostate cancer using transrectal ultrasound (TRUS) videos.
  • To tackle redundancy and computational cost in video inputs, it uses a heuristic frame selection (HFS) strategy that samples training clips at intervals and dynamically sets clip start points so the clips cover the full sequence.
  • For robust feature extraction despite high intra-/inter-class similarity and noisy signals, the model combines a regular ResNet50 branch with a SAM-based large-model branch (plus a normalization-based attention mechanism for temporal consistency).
  • It further adds a WTCR branch that leverages wavelet transform convolutional residual learning to capture high-frequency lesion-edge cues while performing denoising in the low-frequency domain.
  • Overall, the approach targets key TRUS video challenges—redundancy, similarity, and low signal-to-noise—by fusing spatial, semantic, temporal, and frequency-domain information.
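The summary does not give the paper's exact HFS rule, but the idea of sampling interval-spaced clips whose start points shift dynamically so that training covers the whole sequence can be sketched as follows (the epoch-cycled start with random jitter is an illustrative assumption, not the authors' formula):

```python
import numpy as np

def sample_clip(num_frames, clip_len, interval, epoch, rng=None):
    """Sample `clip_len` frame indices spaced `interval` frames apart.

    The start point is chosen dynamically (here: cycled by epoch plus a
    small random jitter) so that, across training iterations, the sampled
    clips collectively span the entire video. Illustrative rule only --
    not the paper's exact HFS strategy.
    """
    span = clip_len * interval                    # frames one clip covers
    max_start = max(num_frames - span, 0)         # last valid start index
    rng = rng or np.random.default_rng()
    if max_start == 0:
        start = 0
    else:
        start = (epoch * interval + int(rng.integers(0, interval))) % (max_start + 1)
    idx = start + interval * np.arange(clip_len)
    return np.clip(idx, 0, num_frames - 1)        # guard against overrun

clip = sample_clip(num_frames=300, clip_len=16, interval=4, epoch=0,
                   rng=np.random.default_rng(0))
print(clip)  # 16 indices spaced 4 frames apart
```

Sampling at an interval reduces per-clip redundancy, while varying the start point prevents the model from only ever seeing the same subset of frames.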

Abstract

Transrectal ultrasound (TRUS) imaging is a cost-effective and non-invasive modality widely used in the diagnosis of prostate cancer. Computer-aided diagnosis (CAD) based on TRUS images has been extensively investigated in recent years. Compared to static images, TRUS video provides richer spatial-temporal information, which makes it a promising alternative for improving the accuracy and robustness of CAD systems. However, TRUS video analysis also introduces new challenges. These include information redundancy, which increases computational costs; high intra- and inter-class similarity, which complicates feature extraction; and a low signal-to-noise ratio, which hinders the identification of clinically relevant information. To address these problems, we propose a heuristic frame selection (HFS) strategy and a three-branch collaborative feature learning network (HFS-TriNet) for prostate cancer classification from TRUS videos. Specifically, sampling a clip of video frames at intervals for training can mitigate redundancy. The HFS strategy dynamically initializes the starting point of each training clip, which ensures that the sampled clips span the entire video sequence. For better feature extraction, besides a regular ResNet50 branch, we also utilize 1) a large-model branch based on a pre-trained medical Segment Anything Model (SAM) to extract deep features of each frame, together with a normalization-based attention module to exploit temporal consistency; and 2) a wavelet transform convolutional residual (WTCR) branch that extracts lesion edge information in the high-frequency domain and performs denoising in the low-frequency domain.
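The WTCR branch's split into low- and high-frequency subbands can be illustrated with a single level of a 2-D Haar wavelet transform. The decomposition below is standard; how the paper's module then processes each subband is only hedged in the comments, since the exact WTCR design is not given in the summary:

```python
import numpy as np

def haar_dwt2(x):
    """One level of a 2-D Haar wavelet transform on a 2-D array.

    Returns the low-frequency approximation (LL) and the three
    high-frequency detail subbands (LH, HL, HH). In a WTCR-style branch,
    convolutions on LH/HL/HH would target lesion-edge cues, while
    smoothing or thresholding on LL would act as denoising -- a minimal
    illustration, not the paper's exact WTCR module.
    """
    # pairwise averages/differences along rows (low-/high-pass) ...
    a = (x[0::2, :] + x[1::2, :]) / 2.0
    d = (x[0::2, :] - x[1::2, :]) / 2.0
    # ... then along columns, yielding the four half-resolution subbands
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

img = np.arange(16, dtype=float).reshape(4, 4)
ll, lh, hl, hh = haar_dwt2(img)
print(ll.shape)  # (2, 2): each subband is half-resolution
```

LL carries the smooth image content (where speckle noise can be suppressed), while LH/HL/HH carry horizontal, vertical, and diagonal edge detail where lesion boundaries show up.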