The MUSCIMA++ Dataset for Handwritten Optical Music Recognition

Publication at Faculty of Mathematics and Physics | 2017

Abstract

Optical Music Recognition (OMR) promises to make accessible the content of large amounts of musical documents, an important component of cultural heritage. However, the field does not have an adequate dataset and ground truth for benchmarking OMR systems, which has been a major obstacle to measurable progress.

Furthermore, machine learning methods for OMR require training data. We design and collect MUSCIMA++, a new dataset for OMR.

Ground truth in MUSCIMA++ is a notation graph, which our analysis shows to be a necessary and sufficient representation of music notation. Building on the CVC-MUSCIMA dataset for staffline removal, the MUSCIMA++ dataset v1.0 consists of 140 pages of handwritten music, with 91254 manually annotated notation symbols and 82247 explicitly marked relationships between symbol pairs.
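
To make the idea of a notation graph concrete, the following is a minimal sketch in Python: symbols are nodes and each marked relationship between a symbol pair is a directed edge. The names `Symbol`, `build_graph`, and `children_of`, the bounding-box layout, and the class labels in the example are assumptions made for illustration; they are not the dataset's actual file format or API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Symbol:
    """One annotated notation symbol, i.e. a node of the graph (hypothetical layout)."""
    symbol_id: int
    class_name: str                      # illustrative label, e.g. "notehead" or "stem"
    bbox: Tuple[int, int, int, int]      # assumed (top, left, bottom, right) in pixels
    outlinks: List[int] = field(default_factory=list)


def build_graph(symbols: Iterable[Symbol],
                relationships: Iterable[Tuple[int, int]]) -> Dict[int, Symbol]:
    """Index symbols by id and attach each (source, target) pair as a directed edge."""
    graph = {s.symbol_id: s for s in symbols}
    for source, target in relationships:
        if source not in graph or target not in graph:
            raise KeyError(f"relationship ({source}, {target}) refers to an unknown symbol")
        graph[source].outlinks.append(target)
    return graph


def children_of(graph: Dict[int, Symbol], symbol_id: int,
                class_name: Optional[str] = None) -> List[Symbol]:
    """Return the symbols a node points to, optionally filtered by class."""
    children = [graph[i] for i in graph[symbol_id].outlinks]
    if class_name is not None:
        children = [c for c in children if c.class_name == class_name]
    return children


# Toy example: one notehead attached to a stem and a beam.
toy_symbols = [
    Symbol(0, "notehead", (100, 50, 112, 64)),
    Symbol(1, "stem", (60, 62, 106, 64)),
    Symbol(2, "beam", (58, 62, 66, 120)),
]
toy_graph = build_graph(toy_symbols, [(0, 1), (0, 2)])
stems = children_of(toy_graph, 0, class_name="stem")
```

In this sketch the symbol list alone would support symbol classification and localization, while the edges are what a notation graph assembly model would have to predict.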

The dataset allows training and directly evaluating models for symbol classification, symbol localization, notation graph assembly, and musical content extraction, both in isolation and jointly.