How Class Ontology and Data Scale Affect Audio Transfer Learning

arXiv cs.LG / 3/27/2026


Key Points

  • The paper conducts a rigorous study of audio-to-audio transfer learning by pre-training various model states on ontology-based subsets of AudioSet and then fine-tuning them on three downstream computer audition tasks: acoustic scene recognition, bird activity recognition, and speech command recognition (one way to build such subsets is sketched after this list).
  • It finds that scaling the pre-training data, both by increasing the number of samples and by expanding the number of classes, improves transfer learning performance.
  • The study also reports that this scaling benefit is generally outweighed by the similarity between the pre-training data and the downstream task: similar pre-training data leads the model to learn comparable features.
  • The work frames transfer learning as still having open mechanistic questions and aims to clarify when and why it works in the audio domain specifically.

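To make the subset construction in the first key point concrete, the sketch below shows one plausible way to do it; it is not taken from the paper. It walks the public AudioSet ontology.json to collect a class and all of its descendants, then keeps only the clips in a segments CSV whose labels fall inside that subtree. The file names, the root class ID, and the helpers descendant_ids and filter_segments are illustrative assumptions.

```python
import csv
import json

def descendant_ids(ontology_path: str, root_id: str) -> set:
    """Collect an AudioSet ontology class and all of its descendants."""
    with open(ontology_path) as f:
        nodes = {node["id"]: node for node in json.load(f)}
    ids, stack = set(), [root_id]
    while stack:
        current = stack.pop()
        if current in ids:
            continue
        ids.add(current)
        stack.extend(nodes[current].get("child_ids", []))
    return ids

def filter_segments(csv_path: str, keep_ids: set) -> list:
    """Keep only clips whose positive labels intersect the chosen subtree."""
    clips = []
    with open(csv_path) as f:
        # Skip the '#' header comments; skipinitialspace handles the
        # quoted, comma-separated label field in the public CSV format.
        rows = csv.reader((line for line in f if not line.startswith("#")),
                          skipinitialspace=True)
        for ytid, start, end, labels in rows:
            if keep_ids & set(labels.split(",")):
                clips.append((ytid, float(start), float(end)))
    return clips

# Example (hypothetical paths; "/m/0jbk" is "Animal" in the public ontology):
animal_ids = descendant_ids("ontology.json", "/m/0jbk")
subset = filter_segments("balanced_train_segments.csv", animal_ids)
```
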
Abstract

Transfer learning is a crucial concept within deep learning that allows artificial neural networks to benefit from large pre-training datasets when confronted with a task of limited data. Despite its ubiquitous use and clear benefits, there are still many open questions regarding the inner workings of transfer learning and, in particular, regarding the understanding of when and how well it works. To that end, we perform a rigorous study focusing on audio-to-audio transfer learning, in which we pre-train various model states on (ontology-based) subsets of AudioSet and fine-tune them on three computer audition tasks, namely acoustic scene recognition, bird activity recognition, and speech command recognition. We report that increasing the number of samples and increasing the number of classes in the pre-training data each have a positive impact on transfer learning. These effects are, however, generally surpassed by the similarity between the pre-training data and the downstream task, which can lead the model to learn comparable features.
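
To make the two-stage setup described in the abstract concrete, here is a minimal PyTorch sketch, assuming a toy CNN backbone standing in for whatever architecture the paper actually uses: the model is first pre-trained as a multi-label AudioSet classifier, then its classification head is swapped out and the network is fine-tuned on a single-label downstream task such as acoustic scene recognition. All class counts, layer sizes, and the AudioCNN module are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class AudioCNN(nn.Module):
    """Toy mel-spectrogram classifier standing in for the paper's backbone."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))

# Stage 1: pre-train on an ontology-based AudioSet subset.
model = AudioCNN(num_classes=200)          # assumed subset with 200 classes
pretrain_loss = nn.BCEWithLogitsLoss()     # AudioSet clips are multi-label
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# ... training loop over the AudioSet subset goes here ...

# Stage 2: keep the pre-trained features, replace the head, fine-tune.
model.head = nn.Linear(32, 10)             # e.g. 10 acoustic scene classes
finetune_loss = nn.CrossEntropyLoss()      # downstream tasks are single-label
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# ... fine-tuning loop over the downstream dataset goes here ...
```

The same pattern extends to the other two downstream tasks by changing only the final head's output size and the data loader.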