Diagnostic classification of mild cognitive impairment in Parkinson's disease using subject-level stratified machine-learning analysis



Abstract

BACKGROUND: The timely identification of mild cognitive impairment (MCI) in Parkinson's disease (PD) is essential for early intervention and clinical management, yet it remains a challenge in practice.
METHODS: We analyzed 3,154 clinical visits from 896 participants in the Parkinson's Progression Markers Initiative (PPMI) cohort. Participants were divided into two groups: cognitively normal (PD-NC, MoCA ≥ 26) and MCI (PD-MCI, 21 ≤ MoCA ≤ 25). To prevent visit-level information leakage, subject-level stratified sampling was used to split the data into training (70%) and hold-out test (30%) sets. From an initial set of 12 routinely assessed clinical features, seven were selected using least absolute shrinkage and selection operator (LASSO) logistic regression: age, sex, years of education, disease duration, UPDRS-I, UPDRS-III, and Geriatric Depression Scale (GDS). Four machine learning models, namely logistic regression (LR), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost), were trained using subject-level stratified 10-fold cross-validation with Bayesian optimization. Probabilistic outputs were dichotomized using three thresholding strategies: default 0.5, F1-score maximization, and Youden index maximization.
RESULTS: On the independent test set, SVM achieved the highest overall performance, with an AUC-ROC of 0.7252 and an AUC-PR of 0.5008. LR also performed competitively despite its simplicity. RF achieved the highest sensitivity, reaching 0.8150. Feature importance analysis consistently highlighted age, years of education, and disease duration as the most informative predictors for distinguishing PD-MCI. Additionally, the more stringent site-level split validation yielded slightly lower overall performance, although LR showed an improved AUC-PR. Importantly, the core feature importance ranking remained largely consistent across validation strategies.
CONCLUSION: This study developed and validated robust machine learning models for PD-MCI classification using standard clinical assessments alone. Through subject-level or site-level stratified cross-validation combined with Bayesian optimization, we achieved rigorous model evaluation while minimizing overfitting risk. These findings demonstrate the potential for implementing data-driven, interpretable diagnostic tools to enhance early cognitive impairment screening in routine PD care.
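
The abstract does not give implementation details, so the following is only a minimal sketch of two steps it names: a subject-level stratified split (all visits of one participant stay on the same side) and Youden index threshold selection. The function names, the toy data, and the rule that a subject counts as positive if any of their visits is positive are assumptions made for illustration, not the authors' procedure.

```python
import random
from collections import defaultdict


def subject_level_split(visits, test_fraction=0.3, seed=0):
    """Split visit records so that no subject appears in both sets.

    visits: list of (subject_id, label) tuples, one per visit, label in {0, 1}.
    Assumption: a subject is treated as positive for stratification if any
    of their visits is positive; the source does not state the rule used.
    """
    subject_label = defaultdict(int)
    for subject_id, label in visits:
        subject_label[subject_id] = max(subject_label[subject_id], label)

    rng = random.Random(seed)
    test_subjects = set()
    for stratum in (0, 1):
        # Sample the test subjects separately within each class stratum.
        subjects = sorted(s for s, y in subject_label.items() if y == stratum)
        rng.shuffle(subjects)
        n_test = round(len(subjects) * test_fraction)
        test_subjects.update(subjects[:n_test])

    train = [v for v in visits if v[0] not in test_subjects]
    test = [v for v in visits if v[0] in test_subjects]
    return train, test


def youden_threshold(y_true, y_prob):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A sample is predicted positive when its probability is >= threshold.
    Ties keep the lowest threshold.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    best_t, best_j = None, float("-inf")
    for t in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Toy data (hypothetical): 20 subjects with 3 visits each.
visits = [(f"S{i:02d}", i % 2) for i in range(20) for _ in range(3)]
train, test = subject_level_split(visits, test_fraction=0.3, seed=0)
threshold, j_stat = youden_threshold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Splitting on subject identifiers rather than on individual visits is what keeps repeated measurements of the same participant out of both sets at once. In practice the threshold would be chosen on training or validation predictions and then applied unchanged to the hold-out set.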
