Classification of systolic murmurs in heart sounds using multiresolution complex Gabor dictionary and vision transformer

arXiv cs.CV / 4/21/2026


Key Points

  • The study introduces an automatic system to classify systolic heart murmurs by leveraging multiresolution time–frequency representations and transformer-based learning.
  • Feature extraction uses a redundant dictionary of multiresolution complex Gabor basis functions, with complex orthogonal matching pursuit projecting one or multiple murmur segments onto that dictionary.
  • For multiple segments from a single recording, the method shares the same dictionary and constrains segment-specific weights to align with the same basis functions, producing consistent variable-resolution time–frequency feature matrices.
  • The classification stage employs a vision transformer that ingests multiple input matrices of different resolutions via CNN patch tokenization, concatenates embedding tokens, and applies multihead attention with residual connections.
  • Experiments on four systolic murmur types from the CirCor DigiScope dataset reach a classification accuracy of 95.96%.
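
The feature-extraction idea in the points above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the window widths, hop sizes, frequency grids, the greedy selection rule (summed correlation energy across segments), the use of weight magnitudes as matrix entries, and the helper names `gabor_dictionary`, `shared_support_omp`, and `weights_to_matrices` are all assumptions made for illustration. The synthetic input is complex-valued for simplicity, whereas real heart-sound segments are real-valued.

```python
import numpy as np

def gabor_dictionary(n, scales=(16, 64), n_freqs=(8, 16)):
    """Build a redundant dictionary of unit-norm complex Gabor atoms.

    Each resolution level pairs a window length with a frequency grid
    (illustrative values). Returns D with shape (n, K) and one
    (level, time_index, freq_index) label per atom.
    """
    t = np.arange(n)
    atoms, labels = [], []
    for level, (s, nf) in enumerate(zip(scales, n_freqs)):
        hop = s // 2                                   # half-window hop
        for shift in range(0, n, hop):
            window = np.exp(-0.5 * ((t - shift) / (s / 4)) ** 2)
            for k in range(nf):
                # normalized frequency 0.5 * k / nf lies in [0, 0.5)
                atom = window * np.exp(2j * np.pi * (0.5 * k / nf) * t)
                atoms.append(atom / np.linalg.norm(atom))
                labels.append((level, shift // hop, k))
    return np.stack(atoms, axis=1), labels

def shared_support_omp(D, X, n_atoms):
    """Greedy complex OMP in which all segments (columns of X) share atoms.

    Each step adds the atom whose correlation energy, summed over segments,
    is largest; the weights are then refit per segment by least squares.
    """
    X = np.asarray(X).reshape(len(X), -1).astype(complex)  # 1-D input -> one column
    residual, support = X.copy(), []
    for _ in range(n_atoms):
        score = (np.abs(D.conj().T @ residual) ** 2).sum(axis=1)
        if support:
            score[support] = 0.0                       # never pick an atom twice
        support.append(int(np.argmax(score)))
        W_s, *_ = np.linalg.lstsq(D[:, support], X, rcond=None)
        residual = X - D[:, support] @ W_s
    W = np.zeros((D.shape[1], X.shape[1]), dtype=complex)
    W[support] = W_s                                   # same rows are nonzero for every segment
    return W, support

def weights_to_matrices(w, labels):
    """Split one segment's weights into one (freq x time) matrix per level.

    Assumption: entries are weight magnitudes.
    """
    mats = []
    for lv in sorted({lab[0] for lab in labels}):
        idx = [i for i, lab in enumerate(labels) if lab[0] == lv]
        n_t = 1 + max(labels[i][1] for i in idx)
        n_f = 1 + max(labels[i][2] for i in idx)
        M = np.zeros((n_f, n_t))
        for i in idx:
            M[labels[i][2], labels[i][1]] = np.abs(w[i])
        mats.append(M)
    return mats

# Synthetic check: three "segments" built from the same two atoms plus noise.
rng = np.random.default_rng(0)
D, labels = gabor_dictionary(256)
coef = np.array([[1.0, 1.0, 1.0], [0.5, -1.0, 0.5]])
X = D[:, [40, 300]] @ coef + 0.01 * rng.normal(size=(256, 3))
W, support = shared_support_omp(D, X, n_atoms=2)
mats = weights_to_matrices(W[:, 0], labels)
print(sorted(support), [m.shape for m in mats])
```

Because every segment is refit on the same selected atoms, the nonzero rows of `W` coincide across columns, which is what makes the per-segment time–frequency matrices directly comparable. Passing a single 1-D segment works as the one-column special case.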

Abstract

Systolic murmurs are extra heart sounds that occur during the contraction phase of the cardiac cycle, often indicating heart abnormalities caused by turbulent blood flow. Their intensity, pitch, and quality vary, requiring precise identification for the accurate diagnosis of cardiac disorders. This study presents an automatic classification system for systolic murmurs using a feature extraction module, followed by a classification model. The feature extraction module employs complex orthogonal matching pursuit to project single or multiple murmur segments onto a redundant dictionary composed of multiresolution complex Gabor basis functions (GBFs). The resulting projection weights are split and reshaped into variable-resolution time–frequency feature matrices. Processing multiple segments of a single recording using a shared dictionary mitigates murmur variability. This is achieved by learning the weights for each segment while enforcing that they correspond to the same set of basis functions in the dictionary, promoting consistent time–frequency feature matrices. The classification model is built based on a vision transformer to process multiple input matrices of different resolutions by passing each through a convolutional neural network for patch tokenization. All embedding tokens are then concatenated to form a matrix and forwarded to an encoder layer that includes multihead attention, residual connections, and a convolutional network with a kernel size of one. This integration of multiresolution feature extraction with transformer-based feature classification enhances the accuracy and reliability of heart murmur identification. An experimental analysis of four types of systolic murmurs from the CirCor DigiScope dataset demonstrates the effectiveness of the system, achieving a classification accuracy of 95.96%.
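
The data flow of the classification stage described in the abstract can be sketched in plain NumPy. This is a shape-level illustration under stated assumptions, not the paper's architecture: the tokenizer is reduced to a single strided convolution (kernel equal to stride, written here as a reshape plus a linear projection), the kernel-size-one convolutional block is taken to be two per-token linear maps with a ReLU between them, the weights are random, and all sizes (`d_model`, `n_heads`, the 4×4 patches, the input shapes) are hypothetical. Normalization, position embeddings, and the classification head are omitted because the abstract does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_heads = 32, 4                      # hypothetical sizes

def patch_tokens(M, patch, proj):
    """Embed non-overlapping patches; same as a conv with kernel = stride = patch."""
    ph, pw = patch
    H, Wd = M.shape
    M = M[: H - H % ph, : Wd - Wd % pw]       # drop any ragged border
    blocks = M.reshape(H // ph, ph, Wd // pw, pw).transpose(0, 2, 1, 3)
    return blocks.reshape(-1, ph * pw) @ proj  # (n_patches, d_model)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)     # subtract the max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multihead_attention(T, Wq, Wk, Wv, Wo, n_heads):
    n, d = T.shape
    dh = d // n_heads
    # project, then split the feature axis into heads: (n_heads, n, dh)
    q, k, v = ((T @ Wm).reshape(n, n_heads, dh).transpose(1, 0, 2)
               for Wm in (Wq, Wk, Wv))
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))   # (n_heads, n, n)
    heads = (attn @ v).transpose(1, 0, 2).reshape(n, d)      # merge heads
    return heads @ Wo, attn

def encoder_layer(T, p, n_heads):
    a, attn = multihead_attention(T, p["Wq"], p["Wk"], p["Wv"], p["Wo"], n_heads)
    T = T + a                                  # residual around attention
    # A kernel-size-one convolution acts on each token independently,
    # so it is a per-token linear map (two layers with ReLU: an assumption).
    h = np.maximum(T @ p["W1"], 0.0) @ p["W2"]
    return T + h, attn                         # residual around the pointwise block

def init(*shape):
    return rng.normal(scale=0.1, size=shape)

# Two feature matrices of different resolutions (freq x time), random stand-ins.
fine, coarse = rng.random((8, 32)), rng.random((16, 8))
patch = (4, 4)
proj_fine, proj_coarse = init(16, d_model), init(16, d_model)  # one tokenizer per input

# Tokenize each matrix separately, then concatenate along the token axis.
tokens = np.concatenate([patch_tokens(fine, patch, proj_fine),
                         patch_tokens(coarse, patch, proj_coarse)], axis=0)

params = {name: init(d_model, d_model) for name in ("Wq", "Wk", "Wv", "Wo")}
params["W1"], params["W2"] = init(d_model, 2 * d_model), init(2 * d_model, d_model)
out, attn = encoder_layer(tokens, params, n_heads)
print(tokens.shape, out.shape, attn.shape)
```

Concatenating along the token axis lets attention relate patches from the fine and coarse matrices to each other, even though the two inputs have different shapes and yield different token counts.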