Multiple sclerosis is a leading cause of neurological disability in adults. Heterogeneity in multiple sclerosis clinical presentation has posed a major challenge for identifying genetic variants associated with disease outcomes.
To overcome this challenge, we used prospectively ascertained clinical outcomes data from the largest international multiple sclerosis registry, MSBase. We assembled a cohort of deeply phenotyped individuals of European ancestry with relapse-onset multiple sclerosis.
We used unbiased genome-wide association study and machine learning approaches to assess the genetic contribution to longitudinally defined multiple sclerosis severity phenotypes in 1813 individuals. Our primary analyses did not identify any genetic variants of moderate to large effect sizes that met genome-wide significance thresholds.
The strongest signal was associated with rs7289446 (beta = -0.4882, P = 2.73 × 10⁻⁷), intronic to SEZ6L on chromosome 22. However, we demonstrate that clinical outcomes in relapse-onset multiple sclerosis are associated with multiple genetic loci of small effect sizes.
Using a machine learning approach incorporating over 62 000 variants together with clinical and demographic variables available at multiple sclerosis disease onset, we could predict severity with an area under the receiver operating characteristic curve of 0.84 (95% CI 0.79-0.88). Our machine learning algorithm achieved a positive predictive value for outcome assignment of 80% and a negative predictive value of 88%.
This outperformed a machine learning algorithm that contained clinical and demographic variables alone (area under the receiver operating characteristic curve 0.54, 95% CI 0.48-0.60). Secondary, sex-stratified analyses identified two genetic loci that met genome-wide significance thresholds.
One was identified in females (rs10967273; beta_female = 0.8289, P = 3.52 × 10⁻⁸) and the other in males (rs698805; beta_male = -1.5395, P = 4.35 × 10⁻⁸), providing some evidence for sexual dimorphism in multiple sclerosis severity. Tissue enrichment and pathway analyses identified an overrepresentation of genes expressed in CNS compartments generally, and specifically in the cerebellum (P = 0.023).
These involved mitochondrial function, synaptic plasticity, oligodendroglial biology, cellular senescence, calcium and G-protein receptor signalling pathways. We further identified six variants with strong evidence for regulating clinical outcomes, the strongest signal again intronic to SEZ6L (adjusted hazard ratio 0.72, P = 4.85 × 10⁻⁴).
Here we report a milestone in our progress towards understanding the clinical heterogeneity of multiple sclerosis outcomes, implicating functionally distinct mechanisms to multiple sclerosis risk. Importantly, we demonstrate that machine learning using common single nucleotide variant clusters, together with clinical variables readily available at diagnosis, can improve prognostic capability and, with further validation, has the potential to translate into meaningful change in clinical practice.
Jokubaitis et al. report a milestone in understanding the heterogeneity of multiple sclerosis clinical outcomes, implicating functionally distinct mechanisms in risk. Using a machine learning approach, they demonstrate the potential of common genetic variants to serve as prognostic biomarkers when combined with demographic data.
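For readers less familiar with the performance metrics quoted above (area under the receiver operating characteristic curve, positive and negative predictive value), the following is a minimal sketch of how such quantities can be computed from binary outcome labels and model scores. It is not the authors' pipeline: the function names `auc_roc` and `predictive_values`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made purely for illustration and are unrelated to the study data.

```python
from typing import Sequence, Tuple


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def predictive_values(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (PPV, NPV) when cases with score >= threshold are called positive.
    The threshold is a hypothetical choice, not one taken from the study."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


if __name__ == "__main__":
    # Toy example only: 1 = severe outcome, 0 = mild outcome (invented values).
    toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
    toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
    print("AUC:", auc_roc(toy_labels, toy_scores))
    print("PPV, NPV:", predictive_values(toy_labels, toy_scores))
```

The pairwise definition of the area under the curve is used here because it is easy to verify by hand on small samples; for cohorts of realistic size, a rank-based implementation from an established statistics library would be the more practical choice, and confidence intervals such as those reported above would typically require resampling or an analytic variance estimate.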