Purpose: Neuroimaging pipelines have long been known to generate mildly differing results depending on various factors, including software version. While these differences are generally considered acceptable and within the margin of reasonable error, little is known about their effect in common research scenarios such as inter-group comparisons between healthy controls and patients with various pathological conditions.
The aim of the present study was to explore differences in inferences and statistical significance in a model situation comparing volumetric parameters between healthy controls and type 1 diabetes patients using various FreeSurfer versions. Methods: T1- and T2-weighted structural scans of healthy controls and type 1 diabetes patients were processed with FreeSurfer 5.3, FreeSurfer 5.3 HCP, FreeSurfer 6.0 and FreeSurfer 7.1, followed by inter-group statistical comparison using the outputs of each FreeSurfer version.
Results: Worryingly, FreeSurfer 5.3 detected both cortical and subcortical volume differences among the preselected regions of interest, whereas newer versions such as FreeSurfer 5.3 HCP and FreeSurfer 6.0 reported only subcortical differences of lower magnitude, and FreeSurfer 7.1 failed to find any statistically significant inter-group differences. Conclusion: Since the group averages produced by the individual FreeSurfer versions closely matched, in keeping with previous literature, the main origin of this disparity seemed to lie in substantially higher within-group variability in the model pathological condition.
Ergo, until validation in common research scenarios such as case–control comparison studies is incorporated into the development process of new software suites, confirmatory analyses utilising similar software based on analogous, but not fully equivalent, principles might be considered as a supplement to careful quality control.
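
The reasoning in the conclusion, that closely matching group averages can still yield different inferences when within-group variability differs, can be illustrated with a minimal sketch. All numbers below are synthetic and chosen purely for illustration; they are not study data, and the helper `welch_t` is a hypothetical name. Welch's t statistic is used here only as a generic stand-in for an inter-group test and is not claimed to be the test used in the study.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Synthetic regional volumes in arbitrary units (NOT study data).
controls = [10.0, 10.2, 9.8, 10.1, 9.9]

# Two hypothetical sets of patient estimates with the same group mean
# but different within-group spread.
patients_low_spread = [9.5, 9.7, 9.3, 9.6, 9.4]
patients_high_spread = [9.5, 10.5, 8.5, 10.0, 9.0]

t_low = welch_t(controls, patients_low_spread)
t_high = welch_t(controls, patients_high_spread)

# The mean difference is the same in both cases; only the spread differs,
# so the test statistic shrinks when within-group variability grows.
print(f"t with low within-group spread:  {t_low:.2f}")
print(f"t with high within-group spread: {t_high:.2f}")
```

In this toy setting the group means are the same in both scenarios, so any change in the test statistic is attributable to within-group variability alone.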