Abstract
Quantification using the Centiloid (CL) scale has become valuable information to consider when interpreting amyloid-PET images and is now implemented in several software packages. This work aims to assess the comparability of CL values from [18F]flutemetamol scans derived using several research and commercial quantification pipelines.
METHODS: This analysis relies on three datasets: a test-retest cohort, a clinically relevant group of patients with amnestic mild cognitive impairment (aMCI), and a subgroup from the BioFINDER-1 cohort enriched with scans with amyloid loads around potential clinical decision thresholds (0-50 CL). Images from the Test-Retest and aMCI cohorts were processed with seven quantification pipelines: three commercial software platforms and four research tools, including the standard SPM8 workflow. The statistical analysis comprised three steps: 1) a repeatability analysis using the test-retest data; 2) a reproducibility analysis across all pipelines using the aMCI cohort; 3) an inter-software reliability analysis around three clinically relevant thresholds (11, 25 and 37 CL) using the aMCI and BioFINDER-1 data.
RESULTS: In the Test-Retest dataset, composed of 10 Alzheimer's Disease (AD) patients, high test-retest repeatability and reliability were observed, with an absolute bias of less than 5 CL. Within-individual coefficients of variation ranged from 2.6 to 4.4% and repeatability coefficients from ∼8 to ∼16 CL. CL quantification was generally reproducible across pipelines in a dataset of 80 aMCI individuals (R² in [0.94-0.99], slope in [0.98-1.03], intercept in [-4, 4]), but the 95% limits of agreement (LoAs) ranged between ∼±12 and ∼±21 CL. Agreement between software around the three clinically relevant thresholds was 92-100% (kappa 0.83-1) in the aMCI data (N = 80) and 75-99% (kappa 0.48-0.96) in the BioFINDER-1 subgroup (N = 110).
CONCLUSION: In this study, CL quantification was shown to be robust across a range of currently available software platforms. Uncertainty estimates should always be considered when interpreting results. In clinical practice, the choice of quantification software should not impact patient management decisions.
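As an illustration of the agreement metrics reported above, the following is a minimal Python sketch using textbook definitions: Bland-Altman 95% limits of agreement, a repeatability coefficient for paired test-retest measurements, and Cohen's kappa for positivity at a CL threshold. The function names, the use of `>=` to define positivity, and the example values are assumptions made for illustration only. They are not taken from the study, whose exact statistical definitions may differ.

```python
import numpy as np


def limits_of_agreement(cl_a, cl_b):
    """Bland-Altman bias and 95% limits of agreement between two pipelines."""
    diff = np.asarray(cl_a, dtype=float) - np.asarray(cl_b, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def repeatability_coefficient(test, retest):
    """Repeatability coefficient for two measurements per subject.

    Assumed definition: RC = 1.96 * sqrt(2) * within-subject SD.
    """
    diff = np.asarray(test, dtype=float) - np.asarray(retest, dtype=float)
    within_sd = np.sqrt(np.mean(diff ** 2) / 2.0)
    return 1.96 * np.sqrt(2.0) * within_sd


def agreement_at_threshold(cl_a, cl_b, threshold):
    """Percent agreement and Cohen's kappa for positivity at a CL threshold.

    Positivity is assumed to mean CL >= threshold (illustrative choice).
    """
    pos_a = np.asarray(cl_a, dtype=float) >= threshold
    pos_b = np.asarray(cl_b, dtype=float) >= threshold
    observed = np.mean(pos_a == pos_b)
    expected = (pos_a.mean() * pos_b.mean()
                + (1 - pos_a.mean()) * (1 - pos_b.mean()))
    if np.isclose(expected, 1.0):
        return observed, float("nan")  # kappa undefined if no class variation
    return observed, (observed - expected) / (1 - expected)


# Hypothetical CL values from two pipelines (not study data).
pipeline_a = [0, 10, 20, 30, 40, 50]
pipeline_b = [2, 8, 22, 28, 42, 48]

bias, lower, upper = limits_of_agreement(pipeline_a, pipeline_b)
print(f"bias={bias:.2f} CL, LoA=[{lower:.2f}, {upper:.2f}] CL")
for thr in (11, 25, 37):
    agree, kappa = agreement_at_threshold(pipeline_a, pipeline_b, thr)
    print(f"threshold {thr} CL: agreement={agree:.0%}, kappa={kappa:.2f}")
```

Reporting kappa alongside percent agreement matters most in samples concentrated near a threshold, such as the enriched BioFINDER-1 subgroup, where small CL differences between pipelines can flip the classification.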