Beyond the Baseband: Adaptive Multi-Band Encoding for Full-Spectrum Bioacoustics Classification

arXiv cs.LG / 5/1/2026

Key Points

  • The paper addresses a key limitation in computational bioacoustics: many systems rely on audio models pre-trained at 16 kHz, which restricts usable bandwidth to the 0–8 kHz baseband and discards the ultrasonic-range information present in many animal recordings.
  • It proposes an adaptive multi-band encoding framework that splits the full call spectrum into frequency-band features and fuses them into a single representation for classification.
  • Classification experiments with eight pre-trained models and five fusion strategies on three bioacoustic datasets show that fused representations consistently outperform the baseband and time-expansion baselines on two of the three datasets.
  • The authors’ analyses indicate that some encoders generate decorrelated band embeddings, which helps class separation after the fusion step.
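
The split-and-fuse idea in the points above can be illustrated with a minimal sketch. It is not the authors' implementation. The FFT-based band shifting, the `toy_encoder` stand-in for a pre-trained 16 kHz model, and the two fusion options (concatenation and averaging) are all assumptions chosen for illustration; the summary does not name the paper's actual encoders or its five fusion strategies.

```python
import numpy as np

BASEBAND_HZ = 8_000  # bandwidth usable by a model pre-trained at 16 kHz


def band_edges(sample_rate, band_hz=BASEBAND_HZ):
    """Return (low, high) edges in Hz that tile 0..Nyquist in band_hz-wide bands."""
    nyquist = sample_rate / 2
    n_bands = int(np.ceil(nyquist / band_hz))
    return [(i * band_hz, min((i + 1) * band_hz, nyquist)) for i in range(n_bands)]


def split_into_basebands(signal, sample_rate, band_hz=BASEBAND_HZ):
    """Shift every band down to 0..band_hz and return it as a waveform
    sampled at 2 * band_hz (16 kHz by default)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)
    out_len = int(round(len(signal) / sample_rate * 2 * band_hz))
    n_bins = out_len // 2 + 1
    bands = []
    for low, high in band_edges(sample_rate, band_hz):
        mask = (freqs >= low) & (freqs < high)
        shifted = np.zeros(n_bins, dtype=complex)
        chunk = spectrum[mask][:n_bins]
        shifted[: len(chunk)] = chunk  # move the band's bins down to start at 0 Hz
        # Rescale so the amplitude survives the change in signal length.
        bands.append(np.fft.irfft(shifted, n=out_len) * (out_len / len(signal)))
    return bands


def toy_encoder(waveform, n_features=16):
    """Hypothetical stand-in for a pre-trained encoder:
    log energy in n_features equal-width sub-bands."""
    power = np.abs(np.fft.rfft(waveform)) ** 2
    return np.log1p(np.array([c.sum() for c in np.array_split(power, n_features)]))


def fuse(embeddings, strategy="concat"):
    """Fuse per-band embeddings; 'concat' and 'mean' are illustrative choices only."""
    stacked = np.stack(embeddings)
    if strategy == "concat":
        return stacked.reshape(-1)
    if strategy == "mean":
        return stacked.mean(axis=0)
    raise ValueError(f"unknown fusion strategy: {strategy}")


# A synthetic 20 kHz "call" recorded at 64 kHz, i.e. well above the 8 kHz baseband.
sample_rate = 64_000
t = np.arange(sample_rate) / sample_rate
call = np.sin(2 * np.pi * 20_000 * t)

bands = split_into_basebands(call, sample_rate)
band_energy = [float(np.sum(b ** 2)) for b in bands]
fused = fuse([toy_encoder(b) for b in bands], strategy="concat")
```

In this toy case the 20 kHz tone falls entirely in the third band (16–24 kHz) and reappears there at 4 kHz, so a baseband-only pipeline would see nothing while the fused vector still carries the signal.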

Abstract

Animals hear and vocalize across frequency ranges that differ substantially from humans, often extending into the ultrasonic domain. Yet most computational bioacoustics systems rely on audio models pre-trained at 16 kHz, restricting their usable bandwidth to the 0-8 kHz baseband and discarding higher-frequency information present in many bioacoustic recordings. We investigate a multi-band encoding framework that decomposes the full spectrum of animal calls into band features and fuses them into a unified representation. Similarity analyses on models show that certain encoders produce decorrelated band embeddings that improve class separation after fusion. Classification experiments on three bioacoustic datasets using eight pre-trained models and five fusion strategies show that fused representations consistently outperform the baseband and time-expansion baselines on two datasets, showing the potential of multi-band methods for full-spectrum encoding of animal calls.
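
For readers unfamiliar with the time-expansion baseline mentioned in the abstract, the short sketch below shows the arithmetic behind it. It assumes the usual bat-detector meaning of the term: the same samples are reinterpreted at a lower sample rate, which divides every frequency by the expansion factor and stretches the duration by the same factor. The paper's exact baseline configuration is not given in this summary, and the helper name `dominant_frequency` is hypothetical.

```python
import numpy as np


def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the strongest spectral peak."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)
    return float(freqs[np.argmax(spectrum)])


recorded_rate = 64_000   # rate at which the call was recorded
model_rate = 16_000      # rate the pre-trained model expects
factor = recorded_rate / model_rate  # time-expansion factor

# One second of a synthetic 20 kHz tone, above the 0-8 kHz baseband.
t = np.arange(recorded_rate) / recorded_rate
call = np.sin(2 * np.pi * 20_000 * t)

# Time expansion: keep the samples, change only the rate they are read at.
original_peak = dominant_frequency(call, recorded_rate)
expanded_peak = dominant_frequency(call, model_rate)
expanded_duration = len(call) / model_rate  # seconds, as seen by the model
```

Under these assumptions the 20 kHz tone moves to 5 kHz, inside the baseband, but the one-second clip now lasts four seconds from the model's point of view. That trade-off between bandwidth and temporal scale is what separates this baseline from a multi-band decomposition.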