Abstract
Cross-site quantitative MRI (qMRI) studies are hindered by site-related variability, particularly when they integrate cohorts that differ in acquisition protocols and in biological variables such as age. Such batch effects can obscure true biological signals, distorting normative age modeling and the clinical interpretation of pathology. This study aimed to (i) compare the harmonization performance of multiple strategies across multi-site cohorts with matched or unmatched qMRI protocols; (ii) quantify the impact of harmonization on normative age modeling across cohorts with comparable or partially overlapping age ranges; and (iii) examine whether clinically relevant deviations are preserved after harmonization. Quantitative MRI data from three healthy control (HC) cohorts (n = 530) and one cohort of people with multiple sclerosis (pwMS) (n = 98) were included. Batch effects in the raw data were assessed using intra-class correlation coefficients (ICC) and analysis of variance (ANOVA), before and after adjustment for age and sex. Data harmonization was performed with two separate approaches: Empirical Bayes-based generalized additive models (GAM) and Hierarchical Bayesian Regression (HBR) with B-spline fitting. Regional age-related effects in cortical grey matter (cGM), superficial white matter (sWM), and white matter (WM) bundles were modeled using second-degree polynomial regression. The turning point of each fitted curve was defined as the peak age, at which the curve reaches its maximum or minimum. It was used to assess potential shifts in age trajectories between raw and harmonized datasets. For clinical validation, group-level differences in qMRI metrics between pwMS and controls were evaluated using Cohen's d. Regional Z-scores derived from the HBR framework were then tested for associations with clinical disability, as measured by the Expanded Disability Status Scale (EDSS).
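The batch-effect and aging metrics named above can be illustrated with a minimal Python sketch. It assumes one regional qMRI value per subject, stored in NumPy arrays, and it is not the authors' pipeline. The function names (`eta_squared`, `peak_age`, `cohens_d`) are hypothetical. The sketch computes η² as the between-site share of the total sum of squares in a one-way ANOVA. It takes the peak age as the vertex −b/(2a) of the quadratic fit y = a·age² + b·age + c, and it computes Cohen's d with a pooled standard deviation.

```python
import numpy as np


def eta_squared(values, sites):
    """One-way ANOVA effect size: between-site sum of squares / total sum of squares."""
    values = np.asarray(values, dtype=float)
    sites = np.asarray(sites)
    grand_mean = values.mean()
    ss_total = np.sum((values - grand_mean) ** 2)
    ss_between = sum(
        np.sum(sites == s) * (values[sites == s].mean() - grand_mean) ** 2
        for s in np.unique(sites)
    )
    return ss_between / ss_total


def peak_age(ages, values):
    """Turning point of a second-degree polynomial fit y = a*age^2 + b*age + c."""
    a, b, _ = np.polyfit(ages, values, deg=2)  # coefficients, highest degree first
    if a == 0:
        raise ValueError("fit is linear; no turning point")
    return -b / (2.0 * a)  # maximum if a < 0, minimum if a > 0


def cohens_d(patients, controls):
    """Standardized mean difference using the pooled (ddof=1) standard deviation."""
    x = np.asarray(patients, dtype=float)
    y = np.asarray(controls, dtype=float)
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)


if __name__ == "__main__":
    ages = np.arange(20, 81)
    r1 = -0.0001 * (ages - 45) ** 2 + 0.6  # synthetic curve peaking at 45 years
    print(round(peak_age(ages, r1), 2))  # 45.0
    print(eta_squared([1, 2, 1, 2], ["A", "A", "B", "B"]))  # 0.0 (identical site means)
    print(cohens_d([2, 3, 4], [1, 2, 3]))  # 1.0
```

A shift in peak age between raw and harmonized data would then simply be the difference between two `peak_age` calls on the same region. An η² near zero after harmonization indicates that site membership no longer explains the variance.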
Both harmonization methods reduced site-related variance, yielding post-harmonization ICC values below 0.001 and η² values below 0.01 in all regions. For R1, harmonization produced small, tissue-dependent differences from the raw data in estimated regional peak ages (mean difference, 1.1 years; maximum, 4.0 years in WM bundles for HBR). It also reduced the root-mean-square error (RMSE) in all tissues, with the greatest reduction in cGM (approximately 12%). For R2*, despite protocol discrepancies, the harmonized data and the normalized raw data yielded comparable aging patterns (peak-age difference < 1 year), and harmonization reduced the RMSE, most notably in cGM (approximately 15–17%). Harmonized Z-scores preserved disease-related deviations. In pwMS, the extent of significant regional differences increased progressively from cGM (most regions) to sWM (nearly all regions) and WM bundles (all regions). Furthermore, R2* Z-scores in specific cGM regions (Brodmann areas 1, 9, 10, 37, 43, and 46) correlated positively with EDSS scores in pwMS. These results suggest that both the GAM and the B-spline-based HBR models effectively reduce site effects and maintain consistency in normative age modeling across multi-site qMRI datasets. HBR-based Z-scores correlated with clinical disability and preserved biologically meaningful deviations in pwMS, supporting their use in distinguishing pathological aging patterns. These findings highlight the potential value of cross-site qMRI harmonization for both normative age modeling and clinical applications in disease contexts.
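The Z-score analysis summarized above can be sketched in Python, assuming the normative model (for example, HBR) supplies a Gaussian predicted mean and standard deviation for each pwMS subject in a region. Under that assumption, a subject's deviation score is z = (y − μ̂)/σ̂. The abstract does not name the correlation coefficient used against EDSS. The rank-based Spearman correlation below is therefore an assumed choice, suited to an ordinal scale with many ties. The helper names (`z_scores`, `average_ranks`, `spearman_rho`) are hypothetical.

```python
import numpy as np


def z_scores(observed, predicted_mean, predicted_sd):
    """Deviation of each subject from the normative prediction, in SD units."""
    observed = np.asarray(observed, dtype=float)
    mu = np.asarray(predicted_mean, dtype=float)
    sd = np.asarray(predicted_sd, dtype=float)
    return (observed - mu) / sd


def average_ranks(x):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1  # extend over a run of tied values
        ranks[order[i : j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_rho(a, b):
    """Spearman correlation = Pearson correlation of the tie-averaged ranks."""
    return np.corrcoef(average_ranks(a), average_ranks(b))[0, 1]


if __name__ == "__main__":
    # Synthetic values for four hypothetical pwMS in a single region
    z = z_scores([12.0, 14.0, 13.0, 18.0], [10.0] * 4, [2.0] * 4)
    edss = [1.0, 2.0, 2.0, 3.0]
    print(z)
    print(round(spearman_rho(z, edss), 3))  # 0.949
```

In a regional analysis, this correlation would be repeated for each region, and the resulting p-values would typically need a multiple-comparison correction. The abstract does not say which correction, if any, was applied.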