Predicting the risk of mental disorders using complete blood count indicators: a machine learning approach

Abstract

BACKGROUND: This study explores whether readily available complete blood count (CBC) indicators, combined with machine learning algorithms, can be used to build a predictive model for mental disorders.

METHODS: In September 2024, the study recruited 1,379 university volunteers and collected data on age, gender, and 22 CBC variables. The dependent variable was a binary outcome produced by the university's mental health evaluation system, which is based on the SCL-90 scale: a positive group with mental disorders and a negative group without. SMOTETomek hybrid sampling was applied to address class imbalance, and Random Forest (RF) was used for feature selection. Four machine learning models were then constructed and compared: eXtreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), Random Forest (RF), and Gradient Boosting Decision Tree (GBDT). Model performance was evaluated using AUC, F1-score, accuracy, sensitivity, and specificity. The Shapley Additive exPlanations (SHAP) method was used to interpret the best-performing model, and a logistic regression (LR) algorithm was used to build a nomogram.

RESULTS: Of the 1,379 volunteers, 1,023 tested negative and 356 tested positive. Fifteen volunteers had missing data for four indicators. RF-based feature selection identified 14 optimal variables for model construction. Among the algorithms tested, XGBoost performed best, with the highest AUC: 0.860 on the training set and 0.827 on the testing set. Both the SHAP analysis of the XGBoost model and the nomogram indicated that the top three contributing features were Basophil Percentage (BASO%), Basophil Count (BASO#), and Mean Corpuscular Hemoglobin (MCH).

CONCLUSION: This study developed a prediction model for mental disorders based on the XGBoost algorithm and complete blood count data. The model gives clinicians objective risk assessment indicators that can assist diagnosis and improve both its efficiency and accuracy.
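
The evaluation metrics named in the abstract (AUC, F1-score, accuracy, sensitivity, and specificity) can be illustrated with a minimal Python sketch. This is not the study's code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made purely for illustration, and the AUC is computed here with the standard rank-based (Mann-Whitney) formulation rather than any particular library.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels where 1 = positive group."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, sensitivity, specificity and F1 from hard 0/1 predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate (recall)
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, NOT from the study): 3 positives, 5 negatives.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 threshold

print(binary_metrics(toy_labels, toy_preds))
print(roc_auc(toy_labels, toy_scores))
```

Sensitivity and specificity are reported separately because, with roughly one positive for every three negatives (356 versus 1,023), accuracy alone can look good for a model that rarely flags the positive group. AUC, in contrast, does not depend on the chosen threshold.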
