DNA methylation classifiers ("episignatures") help to determine the pathogenicity of variants of uncertain significance (VUS). However, their sensitivity is limited because they are trained on unambiguous cases with strong-effect variants, so classification may fail for variants with reduced effect size or in a mosaic state.
Moreover, no approach has so far been developed to evaluate the episignatures of mosaics as a function of their degree of mosaicism. We improved episignatures in three respects.
Applying (i) minimum-redundancy-maximum-relevance feature selection, we reduced their length by up to one order of magnitude without loss of accuracy. Performing (ii) repeated re-training of a support vector machine classifier, with step-wise inclusion in the training set of cases that reached probability scores larger than 0.5, we increased the sensitivity of the episignature classifiers by 30%.
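The two steps above can be illustrated with a minimal Python sketch. It is not the study's implementation: the function names (`mrmr_select`, `fit_svm`, `retrain_stepwise`) are hypothetical, relevance and redundancy are approximated by absolute Pearson correlation rather than by any criterion stated in the text, and the linear kernel, the Platt-scaled probabilities obtained through scikit-learn's `CalibratedClassifierCV`, and the cap on rounds are all assumptions. Only the SVM classifier and the 0.5 inclusion threshold come from the text.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import SVC


def mrmr_select(X, y, k):
    """Greedy mRMR: relevance to y minus mean redundancy with chosen features.

    Assumption of this sketch: both terms use absolute Pearson correlation.
    """
    Xz = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    yz = (y - y.mean()) / (y.std() + 1e-12)
    n = len(y)
    relevance = np.abs(Xz.T @ yz) / n
    selected = [int(np.argmax(relevance))]
    redundancy_sum = np.zeros(X.shape[1])
    while len(selected) < k:
        # Add the correlation of every feature with the most recent pick.
        redundancy_sum += np.abs(Xz.T @ Xz[:, selected[-1]]) / n
        score = relevance - redundancy_sum / len(selected)
        score[selected] = -np.inf  # never pick a feature twice
        selected.append(int(np.argmax(score)))
    return selected


def fit_svm(X, y):
    """Linear SVM with sigmoid-calibrated probability outputs (assumed setup)."""
    model = CalibratedClassifierCV(SVC(kernel="linear"), method="sigmoid", cv=3)
    return model.fit(X, y)


def retrain_stepwise(X_train, y_train, X_unsolved, threshold=0.5, max_rounds=10):
    """Repeatedly add unsolved cases scoring above `threshold` as positives (label 1)."""
    X_cur, y_cur = X_train.copy(), y_train.copy()
    pending = np.arange(len(X_unsolved))  # indices not yet included
    clf = fit_svm(X_cur, y_cur)
    for _ in range(max_rounds):
        if pending.size == 0:
            break
        pos_col = list(clf.classes_).index(1)
        prob = clf.predict_proba(X_unsolved[pending])[:, pos_col]
        hit = prob > threshold
        if not hit.any():
            break  # no further case crosses the threshold
        X_cur = np.vstack([X_cur, X_unsolved[pending[hit]]])
        y_cur = np.concatenate([y_cur, np.ones(hit.sum(), dtype=y_cur.dtype)])
        pending = pending[~hit]
        clf = fit_svm(X_cur, y_cur)
    included = np.setdiff1d(np.arange(len(X_unsolved)), pending)
    return clf, included
```

In this sketch, one would first call `mrmr_select` on the labelled training matrix, keep only the returned columns, and then pass the reduced matrices to `retrain_stepwise`; the returned `included` array lists the unsolved cases that entered the training set.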
In the newly diagnosed patients, we confirmed the association between DNA methylation aberration and age at onset of KMT2B-deficient dystonia. Moreover, we found evidence for an allelic series, including KMT2B variants with moderate effects and comparatively mild phenotypes such as late-onset focal dystonia.
Retrained classifiers can also detect mosaics that previously remained below the 0.5 threshold, as we showed for KMT2D-associated Kabuki syndrome. Conversely, episignature classifiers can overturn erroneous exome calls of mosaicism, as we demonstrated by (iii) comparing presumed mosaic cases with a distribution of artificial in silico mosaics that represented the possible variation in degree of mosaicism, variant read sampling and methylation analysis.
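The comparison with in silico mosaics can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: the function names are hypothetical, read-sampling uncertainty is modelled with a Beta posterior under a flat prior, the variant is assumed heterozygous (affected cell fraction = 2 × variant allele fraction), measurement variation is modelled as Gaussian noise on beta values, and a simple projection onto the control-to-case axis stands in for the classifier score.

```python
import numpy as np


def simulate_mosaics(case_beta, control_beta, variant_reads, depth,
                     n_sim=1000, noise_sd=0.02, seed=0):
    """Draw in silico mosaic methylation profiles compatible with an exome call."""
    rng = np.random.default_rng(seed)
    # Read-sampling uncertainty: with a flat prior, the posterior of the
    # variant allele fraction is Beta(k + 1, n - k + 1).
    vaf = rng.beta(variant_reads + 1, depth - variant_reads + 1, size=n_sim)
    # Assumption: heterozygous variant, so affected cell fraction = 2 * VAF.
    cell_fraction = np.clip(2.0 * vaf, 0.0, 1.0)
    w = cell_fraction[:, None]
    # Mix a full-effect case profile with a control profile per simulated sample.
    mixed = w * case_beta + (1.0 - w) * control_beta
    # Assumption: methylation measurement variation as Gaussian noise.
    noisy = mixed + rng.normal(0.0, noise_sd, size=mixed.shape)
    return np.clip(noisy, 0.0, 1.0), cell_fraction


def projection_score(profiles, case_beta, control_beta):
    """Stand-in score: position along the control-to-case axis (0 = control, 1 = case)."""
    d = case_beta - control_beta
    return (profiles - control_beta) @ d / (d @ d)


def lower_tail_fraction(sim_scores, observed_score):
    """Fraction of simulated mosaics scoring at or below the observed sample."""
    return float(np.mean(sim_scores <= observed_score))
```

A presumed mosaic whose observed score lies far in the lower tail of the simulated distribution is poorly compatible with the exome-derived degree of mosaicism, which is the kind of evidence that could argue against the call. In practice the stand-in score would be replaced by the trained classifier's output.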