AI Navigate

EDMFormer: Genre-Specific Self-Supervised Learning for Music Structure Segmentation

arXiv cs.AI / 3/11/2026

Models & Research

Key Points

  • EDMFormer is a transformer-based model for music structure segmentation tailored to Electronic Dance Music (EDM), addressing weaknesses in existing models that rely on lyrical or harmonic similarity.
  • The model uses self-supervised audio embeddings trained on a new genre-specific dataset, EDM-98, which contains 98 professionally annotated EDM tracks reflecting the unique structural elements of EDM like buildup, drop, and breakdown.
  • EDMFormer significantly improves boundary detection and section labeling in EDM tracks, particularly for sections like drops and buildups, compared to previous approaches.
  • The approach demonstrates that combining learned representations with genre-specific data and structural priors enhances performance, suggesting potential applicability to other specialized music genres or broader audio analysis tasks.
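The paper's own pipeline is not reproduced here, but the boundary-detection task it improves on has a classic baseline worth seeing: Foote-style novelty, which slides a checkerboard kernel along the diagonal of a self-similarity matrix built from per-frame audio embeddings, so that peaks mark section changes (e.g. a buildup-to-drop transition). A minimal numpy sketch, with all names and the toy data hypothetical:

```python
import numpy as np

def novelty_curve(emb, kernel_size=8):
    """Foote-style novelty over per-frame embeddings.

    Peaks in the returned curve suggest section boundaries:
    frames before and after a boundary are similar within their
    own section but dissimilar across it."""
    # Cosine self-similarity matrix between per-frame embeddings
    norm = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-9)
    ssm = norm @ norm.T
    k = kernel_size
    # Checkerboard kernel: +1 on within-section blocks, -1 on cross-section blocks
    sign = np.kron(np.array([[1, -1], [-1, 1]]), np.ones((k, k)))
    n = len(emb)
    nov = np.zeros(n)
    for t in range(k, n - k):
        nov[t] = np.sum(sign * ssm[t - k:t + k, t - k:t + k])
    return nov

# Toy example: two 30-frame "sections" with distinct embedding directions
rng = np.random.default_rng(0)
a = rng.normal(0, 0.1, (30, 16)) + np.eye(16)[0]
b = rng.normal(0, 0.1, (30, 16)) + np.eye(16)[1]
nov = novelty_curve(np.vstack([a, b]))
boundary = int(np.argmax(nov))  # peaks near frame 30, the true boundary
```

EDMFormer's contribution, per the abstract, is learning the embeddings and the boundary predictor jointly rather than relying on a fixed kernel like this one.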


arXiv:2603.08759 (cs)
[Submitted on 8 Mar 2026]

Title: EDMFormer: Genre-Specific Self-Supervised Learning for Music Structure Segmentation

Abstract: Music structure segmentation is a key task in audio analysis, but existing models perform poorly on Electronic Dance Music (EDM). This problem exists because most approaches rely on lyrical or harmonic similarity, which works well for pop music but not for EDM. EDM structure is instead defined by changes in energy, rhythm, and timbre, with different sections such as buildup, drop, and breakdown. We introduce EDMFormer, a transformer model that combines self-supervised audio embeddings with an EDM-specific dataset and taxonomy. We release this dataset as EDM-98: a group of 98 professionally annotated EDM tracks. EDMFormer improves boundary detection and section labelling compared to existing models, particularly for drops and buildups. The results suggest that combining learned representations with genre-specific data and structural priors is effective for EDM and could be applied to other specialized music genres or broader audio domains.
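The abstract pairs boundary detection with section labelling. Whatever the model's internals, frame-level label probabilities still have to be decoded into contiguous sections; a common post-processing step (not necessarily the paper's) is argmax-per-frame followed by run merging, sketched below with an illustrative EDM taxonomy and hypothetical names throughout:

```python
import numpy as np

LABELS = ["intro", "buildup", "drop", "breakdown"]  # illustrative taxonomy

def decode_sections(frame_probs, frame_sec=0.5, min_len=2):
    """Turn per-frame label probabilities (frames x classes) into
    (start_s, end_s, label) sections: argmax per frame, merge
    consecutive runs, and absorb runs shorter than `min_len`
    frames into the preceding section to suppress jitter."""
    ids = frame_probs.argmax(axis=1)
    sections = []
    start = 0
    for t in range(1, len(ids) + 1):
        if t == len(ids) or ids[t] != ids[start]:
            if sections and (t - start) < min_len:
                # too-short run: extend the previous section over it
                s0, _, lab = sections.pop()
                sections.append((s0, t * frame_sec, lab))
            else:
                sections.append((start * frame_sec, t * frame_sec,
                                 LABELS[ids[start]]))
            start = t
    return sections

# Toy input: 10 buildup frames, 1 spurious frame, 9 drop frames
probs = np.zeros((20, 4))
probs[:10, 1] = 1.0   # buildup
probs[10, 3] = 1.0    # single noisy "breakdown" frame
probs[11:, 2] = 1.0   # drop
secs = decode_sections(probs)
# → [(0.0, 5.5, 'buildup'), (5.5, 10.0, 'drop')]
```

The spurious one-frame run is folded into the preceding buildup, which mirrors the structural prior the paper argues for: EDM sections are long and few, so very short runs are almost always noise.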
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.08759 [cs.SD]
  (or arXiv:2603.08759v1 [cs.SD] for this version)
  https://doi.org/10.48550/arXiv.2603.08759

Submission history

From: Oscar Chung [view email]
[v1] Sun, 8 Mar 2026 15:56:37 UTC (522 KB)