Optical Music Recognition is one of the fields in which synthetic data is used effectively to train deep learning recognition models. Because manually annotated data is scarce, training data is generated by an automatic procedure that produces realistic-looking images of music scores in large quantities.
Mashcima, a system for synthesizing training data for handwritten music recognition, generates complete music scores, but the individual symbols are not synthetic; they are sampled from real symbol datasets. In this paper, we explore the impact of incorporating an adversarial autoencoder into the symbol synthesis pipeline.
We show that in some cases the use of an autoencoder can be motivated not only by the creation of latent-space symbol embeddings but also by improved recognition accuracy.
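To make the mentioned technique concrete, the following is a minimal, hedged sketch of an adversarial autoencoder for small grayscale symbol images, written in PyTorch. The image size, latent dimensionality, architecture, and Gaussian prior are illustrative assumptions, not the configuration used in the paper's pipeline.

```python
# Minimal adversarial autoencoder (AAE) sketch; all sizes and the Gaussian
# latent prior are assumptions for illustration only.
import torch
import torch.nn as nn

IMG = 32 * 32   # assumed flattened symbol image size
LATENT = 16     # assumed latent-space dimensionality

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(IMG, 256), nn.ReLU(), nn.Linear(256, LATENT))
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT, 256), nn.ReLU(), nn.Linear(256, IMG), nn.Sigmoid())
    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Distinguishes samples of the latent prior from encoder outputs."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, z):
        return self.net(z)

enc, dec, disc = Encoder(), Decoder(), Discriminator()
opt_ae = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def train_step(x):
    # 1) Reconstruction: encoder + decoder minimize pixel-wise error.
    recon_loss = nn.functional.mse_loss(dec(enc(x)), x)
    opt_ae.zero_grad(); recon_loss.backward(); opt_ae.step()

    # 2) Discriminator: tell prior samples (real) from encoded symbols (fake).
    z_fake = enc(x).detach()
    z_real = torch.randn_like(z_fake)            # assumed Gaussian prior
    d_loss = bce(disc(z_real), torch.ones(len(x), 1)) + \
             bce(disc(z_fake), torch.zeros(len(x), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 3) Regularization: encoder tries to make its codes look like the prior.
    g_loss = bce(disc(enc(x)), torch.ones(len(x), 1))
    opt_ae.zero_grad(); g_loss.backward(); opt_ae.step()
    return recon_loss.item(), d_loss.item(), g_loss.item()

# Example usage with a random batch standing in for flattened symbol crops.
if __name__ == "__main__":
    batch = torch.rand(8, IMG)
    print(train_step(batch))
```

In such a setup, synthetic symbols could be obtained by decoding samples drawn from the latent prior, while the adversarial regularization keeps the latent space smooth enough for interpolation between real symbol embeddings; how these pieces map onto Mashcima's actual pipeline is described in the body of the paper.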