Abstract
This study aims to develop effective screening tools for cognitive impairment by combining optimised speech features with several machine learning models. A total of 65 patients diagnosed with early-stage Mild Cognitive Impairment (MCI) and 55 healthy controls (HCs) were included. Speech was recorded during a picture description task, and acoustic features were extracted with the Python library Librosa. After parameter tuning, Librosa extracted 41 speech features for classification from every participant. Three machine learning models were then built: Random Forest (RF) and Support Vector Machine (SVM) models trained on feature subsets selected with the Sequential Forward Selection (SFS) algorithm, and an Extreme Gradient Boosting (XGBoost) model trained on the preprocessed speech data. Both SFS feature selection and the use of preprocessed data significantly improved identification accuracy. The SVM model achieved an accuracy of 0.825 (AUC: 0.91), the RF model 0.88 (AUC: 0.86), and the XGBoost model 0.92 (AUC: 0.91). These results suggest that speech-based machine learning models can distinguish MCI patients from healthy older adults with high accuracy and provide reliable support for the early identification of cognitive deficits.
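The SFS step used for the RF and SVM models is a greedy wrapper method: it starts from an empty feature set and, at each round, adds the single feature whose inclusion gives the best model score. The sketch below is a minimal, dependency-free illustration of that idea, not the study's implementation. The function name `sequential_forward_selection`, the fixed target size `k`, and the `score` callback are assumptions; in a setting like this study's, `score` would plausibly be the cross-validated accuracy of an SVM or RF trained on the candidate subset of the 41 Librosa features. scikit-learn offers the same technique as `sklearn.feature_selection.SequentialFeatureSelector(estimator, n_features_to_select=..., direction="forward")`.

```python
from typing import Callable, Sequence


def sequential_forward_selection(
    n_features: int,
    score: Callable[[Sequence[int]], float],
    k: int,
) -> tuple[list[int], float]:
    """Greedy SFS: start with no features and repeatedly add the one whose
    inclusion maximises `score`, until `k` features have been selected.

    `score` is a hypothetical callback mapping a list of feature indices to a
    quality measure (e.g. cross-validated accuracy); higher is better.
    """
    if not 0 < k <= n_features:
        raise ValueError("k must be between 1 and n_features")
    selected: list[int] = []
    remaining = set(range(n_features))
    best_score = float("-inf")
    while len(selected) < k:
        # Try each remaining feature added to the current subset; on ties,
        # max() keeps the first candidate, i.e. the lowest feature index.
        candidate, candidate_score = max(
            ((f, score(selected + [f])) for f in sorted(remaining)),
            key=lambda pair: pair[1],
        )
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score


# Hypothetical toy scorer: each feature contributes a fixed "relevance" weight.
weights = [1, 5, 3, 0]
features, s = sequential_forward_selection(
    len(weights), lambda subset: sum(weights[i] for i in subset), k=2
)
print(features, s)  # [1, 2] 8
```

Because each round re-evaluates the model on the enlarged subset, SFS accounts for interactions between features that a per-feature ranking would miss, at the cost of roughly `k * n_features` model fits.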