Abstract
INTRODUCTION: We tested whether spontaneous speech acoustics provide a scalable digital marker of biologically defined Alzheimer's disease (AD) risk.

METHODS: Forty-nine cognitively unimpaired older adults were stratified within APOE genotype into Low-, Moderate-, and High-Risk groups based on log₁₀-transformed plasma p-tau217. Acoustic features were extracted from spontaneous speech and entered into multiclass SVM classifiers evaluated with leave-one-out cross-validation, both with and without genetic-algorithm feature selection and with and without age as an additional feature. Parallel models using neuropsychological measures were evaluated for comparison. Feature contributions were interpreted using SHAP (SHapley Additive exPlanations).

RESULTS: Speech-based models substantially outperformed cognition-only models and exceeded the 33.3% chance level for three-group classification, achieving up to 77% accuracy compared with 47% for neuropsychological models. SHAP analyses identified a compact, stage-dependent acoustic signature dominated by voice-quality, spectral-envelope, and formant-bandwidth features, with age contributing secondary effects.

DISCUSSION: Spontaneous speech acoustics capture p-tau217/APOE-defined AD risk despite preserved cognition, supporting speech as a scalable, biologically grounded biomarker for preclinical AD risk stratification.
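The leave-one-out evaluation scheme described in METHODS can be sketched as follows. This is a minimal illustration only: the data are synthetic, and a simple nearest-centroid classifier stands in for the study's SVM pipeline (the genetic-algorithm feature selection and SHAP steps are omitted); group names mirror the paper's Low-/Moderate-/High-Risk labels but nothing here reproduces the actual analysis.

```python
# Leave-one-out cross-validation sketch (pure Python, synthetic data).
# A nearest-centroid classifier stands in for the paper's multiclass SVM.
import random

random.seed(0)

def make_samples(center, label, n=16, spread=0.5):
    """Draw n synthetic 2-D 'acoustic feature' points around a group center."""
    return [([random.gauss(c, spread) for c in center], label) for _ in range(n)]

# Synthetic stand-ins for the three risk groups.
data = (make_samples((0.0, 0.0), "Low")
        + make_samples((2.0, 0.0), "Moderate")
        + make_samples((1.0, 2.0), "High"))

def centroid(points):
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]

def predict(x, centroids):
    """Assign x to the group with the nearest centroid (squared Euclidean)."""
    return min(centroids,
               key=lambda g: sum((a - b) ** 2 for a, b in zip(x, centroids[g])))

correct = 0
for i, (x, y) in enumerate(data):
    # Hold out one "participant"; fit on the rest.
    train = data[:i] + data[i + 1:]
    cents = {g: centroid([p for p, lab in train if lab == g])
             for g in ("Low", "Moderate", "High")}
    correct += predict(x, cents) == y

accuracy = correct / len(data)
print(f"LOOCV accuracy: {accuracy:.2f} (three-group chance = 0.33)")
```

The point of the sketch is the evaluation loop: with only 49 participants, each model is trained 49 times on n−1 samples and tested on the held-out one, and accuracy is compared against the 33.3% three-group chance level.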