Abstract
Neurodegenerative diseases such as Alzheimer's disease are difficult to diagnose because of the brain's complexity and the variability of imaging data. Volumetric analysis tools that rely on reference curves can help detect abnormal brain atrophy and thereby support diagnosis and monitoring. This study evaluates the robustness of three segmentation algorithms, AssemblyNet, FastSurfer and FreeSurfer, for constructing brain volume reference curves and detecting hippocampal atrophy. Using data from 3,730 cognitively normal subjects, we built reference curves and assessed their robustness to magnetic field strength (1.5T vs. 3T) with four error metrics (sMAPE, sMSPE, wMAPE and sMdAPE) and bootstrap validation. We evaluated classification performance using hippocampal atrophy rates and HAVAs (Hippocampal-Amygdalo-Ventricular Atrophy scores). AssemblyNet yields the lowest errors on all robustness metrics, whereas FastSurfer and FreeSurfer show larger deviations, indicating greater sensitivity to field strength variability. AssemblyNet also produces consistent hippocampal atrophy rates across all reference models, despite slightly lower sensitivity, while FastSurfer and FreeSurfer vary more. Specificity ranges from 0.87 to 0.91 for AssemblyNet, compared with 0.76 to 0.93 for FastSurfer and 0.86 to 0.93 for FreeSurfer. With the HAVAs score, all methods detect high atrophy rates in Alzheimer's patients: FastSurfer achieves the highest sensitivity (0.98), while AssemblyNet reaches the best specificity (0.95) and the highest balanced accuracy (0.91). This study underscores the importance of algorithm choice for reliable brain volumetric analysis in heterogeneous imaging environments. Among the methods tested, AssemblyNet stands out as both sensitive to Alzheimer's-related atrophy and robust to acquisition variability, making it a strong candidate for analyzing hippocampal volumes in large, multi-site datasets.
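The four robustness metrics are named but not defined here. The sketch below shows one plausible way to compare a 1.5T reference curve with a 3T reference curve evaluated at the same ages. It assumes commonly used definitions: symmetric percentage errors with denominator (|a| + |b|) / 2, and a wMAPE that is weighted by the 3T curve. It also assumes a simple percentile bootstrap over curve points. The function names, the choice of reference curve for wMAPE, the exact form of sMSPE, and the resampling unit are all hypothetical and may differ from the study's actual procedure.

```python
import numpy as np

def _ratios(a, b):
    # Per-point symmetric relative error |a - b| / ((|a| + |b|) / 2).
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.abs(a - b) / ((np.abs(a) + np.abs(b)) / 2)

def smape(a, b):
    # Symmetric mean absolute percentage error, in percent (assumed form).
    return 100 * np.mean(_ratios(a, b))

def smspe(a, b):
    # Symmetric mean squared percentage error (assumed form: mean of squared ratios, x100).
    return 100 * np.mean(_ratios(a, b) ** 2)

def smdape(a, b):
    # Symmetric median absolute percentage error, in percent (assumed form).
    return 100 * np.median(_ratios(a, b))

def wmape(ref, other):
    # Weighted MAPE: total absolute deviation relative to the total of the reference curve.
    ref, other = np.asarray(ref, dtype=float), np.asarray(other, dtype=float)
    return 100 * np.sum(np.abs(ref - other)) / np.sum(np.abs(ref))

def bootstrap_ci(metric, a, b, n_boot=1000, alpha=0.05, seed=0):
    # Percentile bootstrap: resample paired curve points with replacement.
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n = len(a)
    stats = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)
        stats[i] = metric(a[idx], b[idx])
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return metric(a, b), lo, hi

if __name__ == "__main__":
    # Hypothetical hippocampal-volume curves (mm^3) on an age grid; not study data.
    ages = np.arange(20, 91)
    curve_3t = 4000.0 - 15.0 * (ages - 20)
    curve_15t = curve_3t * 1.02  # a uniform 2% field-strength bias, for illustration
    for name, f in [("sMAPE", smape), ("sMSPE", smspe), ("sMdAPE", smdape), ("wMAPE", wmape)]:
        est, lo, hi = bootstrap_ci(f, curve_3t, curve_15t, n_boot=500)
        print(f"{name}: {est:.4f} [{lo:.4f}, {hi:.4f}]")
```

A uniform 2% bias gives an sMAPE of about 1.98% and a wMAPE of exactly 2%. The symmetric denominator makes sMAPE slightly smaller than the reference-weighted wMAPE.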