Abstract
INTRODUCTION: Digital voice analysis is an emerging tool for differentiating cognitive states, but it poses privacy risks because automated systems may inadvertently identify speakers.
METHODS: We developed a computational framework to evaluate the trade-off between voice obfuscation and cognitive assessment accuracy, using pitch-shifting as a representative obfuscation method. We applied this framework to voice recordings from the Framingham Heart Study (FHS, n = 128) and the DementiaBank Delaware (DBD, n = 85) corpus, both of which contain responses to neuropsychological tests. Speaker obfuscation was measured via equal error rate (EER), and diagnostic utility was assessed through machine learning models distinguishing cognitive states: normal cognition (NC), mild cognitive impairment (MCI), and dementia (DE).
RESULTS: Using obfuscated speech files and the top 20 acoustic features, our framework achieved a classification accuracy of 62.2% (EER: 0.3335) on the FHS dataset for NC, MCI, and DE differentiation, and 63.7% (EER: 0.1796) on the DBD dataset for NC and MCI differentiation.
DISCUSSION: Our results demonstrate the feasibility of privacy-preserving voice markers, offering a scalable solution for voice-based cognitive assessments.
HIGHLIGHTS:
We developed a computational framework using pitch-shifting and acoustic transformations to balance speaker privacy and diagnostic utility in voice-based cognitive assessments.
We evaluated the framework on two independent datasets, the Framingham Heart Study (FHS, n = 128) and the DementiaBank Delaware (DBD, n = 85) corpus, assessing the trade-off between privacy (measured by equal error rate [EER]) and classification accuracy.
Using obfuscated speech files, our framework achieved classification accuracies of 62.2% (EER: 0.3335) for distinguishing normal cognition (NC), mild cognitive impairment (MCI), and dementia in the FHS dataset and 63.7% (EER: 0.1796) for NC and MCI differentiation in the DBD dataset.
Our framework demonstrates that suitably chosen pitch-shifting levels can preserve diagnostic utility while protecting speaker identity, offering a scalable and privacy-preserving solution.
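As an illustration of the privacy measure named in the abstract, the following minimal sketch computes an equal error rate from speaker-verification similarity scores. It is not the authors' implementation: the function name `equal_error_rate`, the score lists, and the threshold sweep are all assumptions made for this example. The EER is the operating point at which the false acceptance rate (impostor trials accepted) equals the false rejection rate (genuine trials rejected); under this reading, a higher EER on obfuscated speech would mean the verifier is less able to match recordings to speakers.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER from two lists of similarity scores.

    genuine:  scores for same-speaker trials (hypothetical input)
    impostor: scores for different-speaker trials (hypothetical input)
    A trial is accepted when its score is >= the threshold.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score value.
    thresholds = sorted(set(genuine) | set(impostor))

    best_gap = float("inf")
    eer = None
    for t in thresholds:
        # False acceptance rate: impostor trials scoring at or above t.
        far = sum(s >= t for s in impostor) / len(impostor)
        # False rejection rate: genuine trials scoring below t.
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint of FAR and FRR where they are closest.
            best_gap = gap
            eer = (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Toy scores, invented for illustration only.
    genuine_scores = [0.9, 0.8, 0.7, 0.4]
    impostor_scores = [0.6, 0.3, 0.2, 0.1]
    print(equal_error_rate(genuine_scores, impostor_scores))
```

With a finite set of trials the two error rates may never be exactly equal, so the sketch reports the midpoint at the threshold where they are closest; production toolkits typically interpolate along the ROC curve instead.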