Abstract
BACKGROUND: The diagnosis of occupational pneumoconiosis requires more accurate predictive models. This study aimed to identify blood markers associated with early pneumoconiosis development among the routine blood indicators in physical examination data, and to develop a highly sensitive and accurate clinical prediction model using machine learning (ML) algorithms to promote early diagnosis and timely intervention.
METHODS: Age and blood test results were collected from physical examination records. Predictors were selected using the Least Absolute Shrinkage and Selection Operator (LASSO) and multiple logistic regression. Nine ML models were evaluated: Logistic Regression (LR), eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Random Forest (RF), Adaptive Boosting (AdaBoost), Gaussian Naïve Bayes (GNB), Multilayer Perceptron (MLP), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN). Model performance was compared using receiver operating characteristic (ROC) curves, accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F1 score, decision curve analysis (DCA), calibration curves, and precision-recall (PR) curves. Shapley Additive exPlanations (SHAP) were used to interpret the model for personalized risk assessment.
RESULTS: Six risk variables associated with the development of pneumoconiosis were identified: White Blood Cell (WBC) count, Platelet Distribution Width (PDW), Total Bilirubin (TB), Absolute Neutrophil Count (ANC), Alanine Aminotransferase (ALT), and Aspartate Aminotransferase (AST). SVM was considered the optimal model and showed good clinical applicability. SHAP analysis was used to quantify the contributions of the six variables to the progression of pneumoconiosis.
CONCLUSION: The indicators ultimately found to be associated with pneumoconiosis progression were WBC, PDW, TB, ANC, ALT, and AST. ML algorithms combined with blood biochemical indicators identified risk factors associated with the occurrence of pneumoconiosis. The SVM model performed well and has the potential to improve early detection and diagnosis in clinical practice.
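
For readers unfamiliar with the threshold-based criteria listed in the methods, the following minimal Python sketch shows how accuracy, sensitivity, specificity, PPV, NPV, and F1 follow from a binary confusion matrix, and how the net benefit plotted in a decision curve is computed at a single threshold probability. It is an illustration under stated assumptions, not the study's code: the names `ConfusionCounts`, `confusion_counts`, `binary_metrics`, and `net_benefit` are hypothetical, labels are assumed to be coded 1 (pneumoconiosis) and 0 (no pneumoconiosis), and the net-benefit expression is the standard DCA definition rather than anything specified in this abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # predicted positive, truly positive
    fp: int  # predicted positive, truly negative
    tn: int  # predicted negative, truly negative
    fn: int  # predicted negative, truly positive


def confusion_counts(y_true, y_pred):
    """Tally the four cells; assumes labels are coded 1 (case) and 0 (non-case)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(c):
    """Threshold-based criteria named in the abstract, from confusion counts."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # recall, true positive rate
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true negative rate
        "ppv": _ratio(c.tp, c.tp + c.fp),          # precision
        "npv": _ratio(c.tn, c.tn + c.fn),
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


def net_benefit(c, threshold):
    """Standard DCA net benefit at one threshold probability (0 < threshold < 1)."""
    total = c.tp + c.fp + c.tn + c.fn
    odds = threshold / (1.0 - threshold)  # weight of a false positive
    return c.tp / total - (c.fp / total) * odds
```

Calling `binary_metrics(confusion_counts(y_true, y_pred))` on the held-out predictions of each candidate model gives one row of a comparison table. Evaluating `net_benefit` over a grid of thresholds, with the confusion counts recomputed at each threshold, traces that model's decision curve.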