Abstract
INTRODUCTION: Parkinson's disease (PD)-related cognitive impairment (PD-CI) is a common and impactful complication of PD, yet current predictive models often rely on specialized resources, lack interpretability, or have limited cross-population validation. This study aimed to develop an interpretable machine learning framework for PD-CI detection using only routine clinical data, addressing the unmet need for accessible and generalizable PD care. METHODS: We analyzed 1,279 participants from the Parkinson's Progression Markers Initiative (PPMI) as the discovery cohort and 197 patients from an independent validation cohort. PD-CI was defined as a Montreal Cognitive Assessment (MoCA) score ≤26 together with a Unified Parkinson's Disease Rating Scale Part I (UPDRS-I) score ≥1. Twenty-one clinical features, encompassing hematological parameters, metabolic markers, and demographics, were preprocessed, with class imbalance addressed by synthetic minority over-sampling. Four machine learning models were trained and optimized via nested 5-fold cross-validation. RESULTS: Random Forest achieved the best performance in the discovery cohort (AUC = 0.83), outperforming CatBoost (AUC = 0.82), XGBoost (AUC = 0.79), and neural networks (AUC = 0.66). In external validation, the framework retained an accuracy of 71.57%. SHAP interpretability analysis identified age, neutrophil-to-lymphocyte ratio (NLR), and serum uric acid as critical predictors, and revealed synergistic risk effects between elevated inflammation markers and reduced antioxidant levels. DISCUSSION: This framework demonstrates diagnostic accuracy comparable to advanced neuroimaging while using readily available clinical data, improving accessibility in resource-limited settings. It highlights neuroinflammation and oxidative stress as key mechanistic drivers of PD-CI, advancing pathophysiological understanding. Multicenter validation confirms the model's robustness across ethnic populations, supporting its utility as a clinically actionable tool for PD-CI screening and monitoring.
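The outcome definition and the NLR predictor described in the METHODS can be illustrated with a minimal sketch. The thresholds (MoCA ≤26, UPDRS-I ≥1) come from the text above; the `Visit` record, its field names, and the helper functions are hypothetical and are not taken from the study's code.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """One participant record; class and field names are hypothetical."""
    moca: int            # Montreal Cognitive Assessment total score
    updrs_i: int         # UPDRS Part I total score
    neutrophils: float   # absolute neutrophil count
    lymphocytes: float   # absolute lymphocyte count, same units


def is_pd_ci(v: Visit) -> bool:
    # Label rule as stated in the abstract: MoCA <= 26 AND UPDRS-I >= 1.
    return v.moca <= 26 and v.updrs_i >= 1


def nlr(v: Visit) -> float:
    # Neutrophil-to-lymphocyte ratio; undefined for a non-positive denominator.
    if v.lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return v.neutrophils / v.lymphocytes


example = Visit(moca=24, updrs_i=2, neutrophils=4.0, lymphocytes=2.0)
label = is_pd_ci(example)   # True: both criteria are met
ratio = nlr(example)        # 4.0 / 2.0 = 2.0
```

Both criteria must hold for a positive label, so a participant with a low MoCA score but a UPDRS-I score of 0 would be labeled negative under this reading of the definition.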