SongBench: A Fine-Grained Multi-Aspect Benchmark for Song Quality Assessment

arXiv cs.AI / 4/30/2026

Opinion | Ideas & Deep Analysis | Models & Research

Key Points

  • The paper introduces SongBench, a specialized benchmark framework to evaluate text-to-song outputs with professional-level, fine-grained detail across seven aesthetic dimensions.
  • SongBench covers Vocal, Instrument, Melody, Structure, Arrangement, Mixing, and Musicality, aiming to capture multi-dimensional nuances that existing benchmarks miss.
  • The authors built an expert-annotated dataset of 11,717 samples produced by state-of-the-art text-to-song models, with labels provided by music professionals.
  • Experiments reported in the paper show that SongBench achieves high correlation with expert ratings, which the authors take as evidence that it can serve as a reliable diagnostic tool.
  • The benchmark exposes fine-grained performance gaps in current state-of-the-art systems, which can guide future development toward more professional and musically coherent song generation.

Abstract

Recent advancements in Text-to-Song generation have enabled realistic musical content production, yet existing evaluation benchmarks lack the professional granularity to capture multi-dimensional aesthetic nuances. In this paper, we propose SongBench, a specialized framework for fine-grained song assessment across seven key dimensions: Vocal, Instrument, Melody, Structure, Arrangement, Mixing, and Musicality. Utilizing this framework, we construct an expert-annotated database comprising 11,717 samples from state-of-the-art models, labeled by music professionals. Extensive experimental results demonstrate that SongBench achieves high correlation with expert ratings. By revealing fine-grained performance gaps in current state-of-the-art models, SongBench serves as a diagnostic benchmark to steer the development toward more professional and musically coherent song generation.
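
The abstract reports high correlation with expert ratings but does not say here which correlation statistic is used. As a minimal sketch, assuming a Spearman rank correlation computed separately for each of the seven dimensions, the check could look like the following. The function names, the data layout (one list of scores per dimension), and the toy scores are all hypothetical and are not taken from the paper.

```python
from math import sqrt

# The seven dimensions named in the abstract.
DIMENSIONS = ["Vocal", "Instrument", "Melody", "Structure",
              "Arrangement", "Mixing", "Musicality"]


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def per_dimension_correlation(predicted, expert):
    """Map each dimension present in both inputs to its Spearman correlation."""
    return {
        dim: spearman(predicted[dim], expert[dim])
        for dim in DIMENSIONS
        if dim in predicted and dim in expert
    }


# Toy scores for five songs on two dimensions (illustrative only, not real data).
predicted_scores = {"Vocal": [3.1, 4.0, 2.2, 4.8, 3.5],
                    "Mixing": [2.0, 3.9, 3.0, 4.1, 2.5]}
expert_scores = {"Vocal": [3, 4, 2, 5, 4],
                 "Mixing": [2, 4, 3, 4, 3]}

for dim, rho in per_dimension_correlation(predicted_scores, expert_scores).items():
    print(f"{dim}: {rho:.3f}")
```

Reporting one coefficient per dimension, rather than a single pooled number, matches the fine-grained framing of the benchmark: a scorer could track expert judgments well on Vocal while diverging on Mixing, and a pooled coefficient would hide that.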