Prediction of STAS in lung adenocarcinoma with nodules ≤ 2 cm using machine learning: a multicenter retrospective study

Authors: Zhang, Zhan; Zhao, Yue; Ma, Yi-Jun; Chen, Chuan-Qi; Li, Zhen-Yi; Wang, Yv-Kai; Zhang, Si-Jie; Li, Hai-Ming; Li, Yongmeng; Tian, Yu; Tian, Hui

Journal:	BMC Cancer	Impact factor:	3.400
Year:	2025	Volume/pages:	2025 Mar 7;25(1):417
DOI:	10.1186/s12885-025-13783-z	Research area:	Oncology

Abstract

BACKGROUND AND OBJECTIVE: Spread through air spaces (STAS) is an important factor in determining the aggressiveness and recurrence risk of lung cancer, especially in early-stage adenocarcinoma. Preoperative identification of STAS is crucial for optimizing surgical strategies. This study aimed to develop and validate machine learning models to predict the presence of STAS using preoperative clinical, radiological, and pathological data in lung cancer patients.

PATIENTS AND METHODS: A retrospective analysis was conducted on 1,290 lung cancer patients from two hospitals: Qilu Hospital of Shandong University and Qianfoshan Hospital. Data from 1,174 patients from Qilu Hospital were used for model training and internal validation, while 116 patients from Qianfoshan Hospital were used for external validation. Thirteen key variables, identified using least absolute shrinkage and selection operator (LASSO) regression, were included in the construction of eight machine learning models: decision tree (DT), random forest (RF), regularized support vector machine (RSVM), logistic regression (LR), extreme gradient boosting (XGBoost), multilayer perceptron (MLP), light gradient boosting machine (LightGBM), and K-nearest neighbors (KNN). Model performance was evaluated using receiver operating characteristic (ROC) curves, area under the curve (AUC), calibration curves, decision curve analysis (DCA), and SHapley additive explanations (SHAP) plots.

RESULTS: The XGBoost model achieved the best performance with an AUC of 0.931 (95% CI: 0.897-0.964) in the internal validation cohort and 0.904 (95% CI: 0.835-0.973) in the external validation cohort, outperforming other models. DCA demonstrated the clinical utility of XGBoost, LightGBM, and RF models, which provided superior net benefit across various threshold probabilities. SHAP analysis revealed that the most influential factors in predicting STAS were carcinoembryonic antigen (CEA), forced expiratory volume in one second (FEV1), consolidation-to-tumor ratio (CTR), maximal voluntary ventilation (MVV), and CT value.

CONCLUSION: The XGBoost model demonstrated robust predictive performance for preoperative identification of STAS in lung cancer patients, showing high generalizability in external validation. These findings suggest that machine learning-based predictions could guide clinical decision-making and improve surgical outcomes by identifying high-risk patients for more aggressive treatment strategies.
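
The abstract reports discrimination as AUC and clinical utility as net benefit from decision curve analysis. The sketch below illustrates how these two quantities are commonly computed for a binary outcome, using the pairwise (Mann-Whitney) form of the AUC and the standard net-benefit formula. It is not the authors' code: the function names and the toy labels and probabilities are hypothetical and are not taken from the study.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(labels, scores, threshold):
    """Net benefit of treating every case with score >= threshold.

    NB = TP/n - FP/n * pt / (1 - pt), where pt is the threshold probability.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


def net_benefit_treat_all(labels, threshold):
    """Reference curve: net benefit when every patient is treated as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (hypothetical): 1 = STAS present, 0 = STAS absent.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.3, 0.7]

print("AUC:", auc(y_true, y_prob))
for pt in (0.3, 0.5):
    print(pt, net_benefit(y_true, y_prob, pt), net_benefit_treat_all(y_true, pt))
```

A decision curve plots `net_benefit` against a range of threshold probabilities alongside the treat-all and treat-none (net benefit 0) references; a model is considered useful at thresholds where its curve lies above both.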
