Abstract
BACKGROUND: Predicting tumor spread through air spaces (STAS) in patients with lung adenocarcinoma (LUAD) is important for timely preventive intervention, selection of appropriate treatment, and improvement in quality of life. It is therefore critical to develop a machine learning (ML) model that can evaluate STAS in LUAD based on longitudinal study data. This study aimed to develop and validate a computed tomography (CT)-based deep learning model for preoperative prediction of STAS in LUAD.

METHODS: In total, 689 patients diagnosed with LUAD after CT and surgery at four hospitals from January 2019 to December 2023 were included and divided into training, internal validation, external validation I, and external validation II cohorts. A deep learning (DL) radiomics score (radscore) was developed from CT images of LUAD using the ResNet-101 framework. Seven ML algorithms [logistic regression (LR), extreme gradient boosting (XGBoost), support vector machine (SVM), random forest (RF), decision tree (DT), k-nearest neighbor (KNN), and artificial neural networks (ANN)] were used to distinguish STAS from non-STAS. SHapley Additive exPlanations (SHAP) was used to provide individualized, visual interpretations and thereby address the opacity of the ML models.

RESULTS: Among the seven ML models compared in the training cohort, the optimal combined model performed best across the evaluation measures, with area under the receiver operating characteristic (ROC) curve (AUC) values of 0.906 [95% confidence interval (CI): 0.873-0.936] and 0.903 (95% CI: 0.855-0.944) in the training and internal validation cohorts, respectively. According to the SHAP feature importance ranking, the most important features were the DL-score, lobulation, vacuole sign, and microvascular sign. SHAP summary plots illustrated the positive and negative effects of these features on the output of the combined model, and SHAP dependence plots showed how individual features affected the model output. The model generalized satisfactorily, with AUCs of 0.841 (95% CI: 0.757-0.923) and 0.882 (95% CI: 0.806-0.953) in the two external validation cohorts, respectively.

CONCLUSIONS: A combined model based on the DL-score and clinically independent risk factors can accurately predict STAS in patients with LUAD. In addition, an interpretable framework can increase the transparency of the model, provide clear explanations for personalized risk prediction, and offer a more intuitive understanding of the effects of the key features in the model.
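
The AUC values above are reported with 95% confidence intervals, but the abstract does not state how those intervals were computed. As a hedged illustration only, the sketch below shows one common approach: the AUC computed as the Mann-Whitney probability that a STAS-positive case scores higher than a STAS-negative case, with a percentile bootstrap interval. The function names (`roc_auc`, `bootstrap_auc_ci`), the number of resamples, and the synthetic labels and scores are all assumptions made for this example. They are not the authors' code or data.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case outscores
    a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method,
    not necessarily the one used in the study)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class: AUC is undefined there.
        if 0 < sum(ys) < n:
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Synthetic stand-in for model outputs: 1 = STAS, 0 = non-STAS.
_rng = random.Random(42)
toy_labels = [1 if _rng.random() < 0.4 else 0 for _ in range(200)]
toy_scores = [_rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in toy_labels]

toy_auc = roc_auc(toy_labels, toy_scores)
toy_ci = bootstrap_auc_ci(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.3f}, 95% CI: {toy_ci[0]:.3f}-{toy_ci[1]:.3f}")
```

Resampling patients with replacement keeps the example free of distributional assumptions. Other interval methods, such as DeLong's, would give somewhat different bounds on the same data.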