Music segmentation algorithms identify the structure of a music recording by automatically dividing it into sections and determining which sections repeat and when.
In this talk, I give an overview of this music information retrieval problem and present a novel music segmentation method that leverages deep audio embeddings learned on other tasks.
This approach builds on an existing segmentation algorithm, replacing its manually engineered features with deep embeddings learned on audio classification tasks for which data are abundant. Additionally, I present a novel section fusion algorithm that leverages the multiple hierarchical levels of the segmentation to consolidate short segments at each level in a way that is consistent with the segmentations at lower levels.
Through a series of experiments and audio examples, I show that this method yields state-of-the-art results on most metrics across the most popular publicly available datasets.