Abstract
People with Parkinson's disease (PD) frequently develop cognitive impairment, and early, accurate classification of cognitive status is critical for clinical intervention. In this study, we used data from the Parkinson's Progression Markers Initiative (PPMI) to develop a two-stage machine-learning framework that distinguishes among three cognitive states: PD with normal cognition (PD-NC), PD with mild cognitive impairment (PD-MCI), and PD dementia (PDD). Our approach combined an ensemble of XGBoost and multilayer perceptron (MLP) classifiers with SHapley Additive exPlanations (SHAP) for model interpretability, and addressed class imbalance with the SMOTE-Tomek method. Model development and validation followed a strict hold-out design, with the test set entirely excluded from feature selection, model training, and threshold optimization. On the held-out test set, the model showed strong and balanced classification performance across all cognitive subgroups, with particularly effective identification of cognitively impaired individuals requiring clinical attention. The area under the receiver operating characteristic curve (AUC) for three-class discrimination exceeded 0.85. SHAP analysis identified clinically meaningful key predictors, including Montreal Cognitive Assessment (MoCA) scores and activities of daily living assessments. The model's ability to identify people at high risk for dementia highlights its potential utility in clinical workflows, particularly as a scalable tool for early cognitive stratification and decision support in routine neurology practice. However, external validation on diverse real-world cohorts is warranted before clinical implementation.
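
As an illustration of the hold-out discipline and the two-stage decision logic summarized above, the following is a minimal sketch in plain Python, not the study's implementation. It assumes that stage 1 separates normal from impaired cognition and stage 2 separates PD-MCI from PDD; the abstract does not specify how the stages are divided. The function names (`stratified_holdout`, `cascade_predict`, `tune_threshold`), the 20% test fraction, the default thresholds, and the balanced-accuracy criterion are all hypothetical choices made for illustration.

```python
import random


def stratified_holdout(labels, test_frac=0.2, seed=0):
    """Split indices into train/test while keeping class proportions.

    The test indices are set aside once and must not be used for feature
    selection, model fitting, resampling, or threshold tuning.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_frac))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def cascade_predict(p_impaired, p_dementia, t1=0.5, t2=0.5):
    """Two-stage decision rule (assumed stage definitions).

    Stage 1: normal vs. impaired, using p_impaired and threshold t1.
    Stage 2: PD-MCI vs. PDD, using p_dementia and threshold t2.
    """
    if p_impaired < t1:
        return "PD-NC"
    return "PDD" if p_dementia >= t2 else "PD-MCI"


def tune_threshold(scores, is_positive, grid=None):
    """Pick the threshold that maximizes balanced accuracy.

    Intended for a validation split carved out of the training data only.
    Both classes must be present in is_positive.
    """
    grid = grid or [i / 20 for i in range(1, 20)]
    pos = sum(is_positive)
    neg = len(is_positive) - pos

    def balanced_accuracy(t):
        tp = sum(1 for s, y in zip(scores, is_positive) if y and s >= t)
        tn = sum(1 for s, y in zip(scores, is_positive) if not y and s < t)
        return 0.5 * (tp / pos + tn / neg)

    return max(grid, key=balanced_accuracy)
```

In a sketch like this, resampling steps such as SMOTE-Tomek would be applied only to the training indices returned by `stratified_holdout`, and the thresholds passed to `cascade_predict` would come from `tune_threshold` run on validation data, so that the test set is touched only once for the final evaluation.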