AI Navigate

EDMFormer: Genre-Specific Self-Supervised Learning for Music Structure Segmentation

arXiv cs.AI / 3/11/2026

Models & Research

Key Points

  • EDMFormer is a transformer-based model for music structure segmentation tailored to Electronic Dance Music (EDM), addressing weaknesses in existing models that rely on lyrical or harmonic similarity.
  • The model uses self-supervised audio embeddings trained on a new genre-specific dataset, EDM-98, which contains 98 professionally annotated EDM tracks reflecting the unique structural elements of EDM like buildup, drop, and breakdown.
  • EDMFormer significantly improves boundary detection and section labeling in EDM tracks, particularly for sections like drops and buildups, compared to previous approaches.
  • The approach demonstrates that combining learned representations with genre-specific data and structural priors enhances performance, suggesting potential applicability to other specialized music genres or broader audio analysis tasks.
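The paper's own pipeline is not reproduced here, but the boundary-detection task it improves on has a classic baseline worth seeing: Foote-style novelty, which slides a checkerboard kernel along the diagonal of a self-similarity matrix built from per-frame audio embeddings, so that peaks mark section changes (e.g. a buildup-to-drop transition). A minimal numpy sketch, with all names and the toy data hypothetical:

```python
import numpy as np

def novelty_curve(emb, kernel_size=8):
    """Foote-style novelty over per-frame embeddings.

    Peaks in the returned curve suggest section boundaries:
    frames before and after a boundary are similar within their
    own section but dissimilar across it."""
    # Cosine self-similarity matrix between per-frame embeddings
    norm = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-9)
    ssm = norm @ norm.T
    k = kernel_size
    # Checkerboard kernel: +1 on within-section blocks, -1 on cross-section blocks
    sign = np.kron(np.array([[1, -1], [-1, 1]]), np.ones((k, k)))
    n = len(emb)
    nov = np.zeros(n)
    for t in range(k, n - k):
        nov[t] = np.sum(sign * ssm[t - k:t + k, t - k:t + k])
    return nov

# Toy example: two 30-frame "sections" with distinct embedding directions
rng = np.random.default_rng(0)
a = rng.normal(0, 0.1, (30, 16)) + np.eye(16)[0]
b = rng.normal(0, 0.1, (30, 16)) + np.eye(16)[1]
nov = novelty_curve(np.vstack([a, b]))
boundary = int(np.argmax(nov))  # peaks near frame 30, the true boundary
```

EDMFormer's contribution, per the abstract, is learning the embeddings and the boundary predictor jointly rather than relying on a fixed kernel like this one.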


arXiv:2603.08759 (cs)
[Submitted on 8 Mar 2026]

Title: EDMFormer: Genre-Specific Self-Supervised Learning for Music Structure Segmentation

Abstract: Music structure segmentation is a key task in audio analysis, but existing models perform poorly on Electronic Dance Music (EDM). This problem exists because most approaches rely on lyrical or harmonic similarity, which works well for pop music but not for EDM. EDM structure is instead defined by changes in energy, rhythm, and timbre, with different sections such as buildup, drop, and breakdown. We introduce EDMFormer, a transformer model that combines self-supervised audio embeddings with an EDM-specific dataset and taxonomy. We release this dataset as EDM-98: a group of 98 professionally annotated EDM tracks. EDMFormer improves boundary detection and section labelling compared to existing models, particularly for drops and buildups. The results suggest that combining learned representations with genre-specific data and structural priors is effective for EDM and could be applied to other specialized music genres or broader audio domains.
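The abstract pairs boundary detection with section labelling. Whatever the model's internals, frame-level label probabilities still have to be decoded into contiguous sections; a common post-processing step (not necessarily the paper's) is argmax-per-frame followed by run merging, sketched below with an illustrative EDM taxonomy and hypothetical names throughout:

```python
import numpy as np

LABELS = ["intro", "buildup", "drop", "breakdown"]  # illustrative taxonomy

def decode_sections(frame_probs, frame_sec=0.5, min_len=2):
    """Turn per-frame label probabilities (frames x classes) into
    (start_s, end_s, label) sections: argmax per frame, merge
    consecutive runs, and absorb runs shorter than `min_len`
    frames into the preceding section to suppress jitter."""
    ids = frame_probs.argmax(axis=1)
    sections = []
    start = 0
    for t in range(1, len(ids) + 1):
        if t == len(ids) or ids[t] != ids[start]:
            if sections and (t - start) < min_len:
                # too-short run: extend the previous section over it
                s0, _, lab = sections.pop()
                sections.append((s0, t * frame_sec, lab))
            else:
                sections.append((start * frame_sec, t * frame_sec,
                                 LABELS[ids[start]]))
            start = t
    return sections

# Toy input: 10 buildup frames, 1 spurious frame, 9 drop frames
probs = np.zeros((20, 4))
probs[:10, 1] = 1.0   # buildup
probs[10, 3] = 1.0    # single noisy "breakdown" frame
probs[11:, 2] = 1.0   # drop
secs = decode_sections(probs)
# → [(0.0, 5.5, 'buildup'), (5.5, 10.0, 'drop')]
```

The spurious one-frame run is folded into the preceding buildup, which mirrors the structural prior the paper argues for: EDM sections are long and few, so very short runs are almost always noise.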
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.08759 [cs.SD]
  (or arXiv:2603.08759v1 [cs.SD] for this version)
  https://doi.org/10.48550/arXiv.2603.08759

Submission history

From: Oscar Chung [view email]
[v1] Sun, 8 Mar 2026 15:56:37 UTC (522 KB)