Machine learning-driven prognostic prediction model for composite small cell lung cancer: identifying risk factors with network tools and validation using SEER data and external cohorts

Abstract

BACKGROUND: Lung cancer remains the leading cause of cancer-related mortality worldwide, and combined small cell lung carcinoma (C-SCLC) is a relatively uncommon yet highly aggressive subset of the disease. Despite its clinical significance, few survival prediction models have been tailored to the clinical characteristics of C-SCLC patients, and the interpretability of existing models remains limited.

METHODS: This study aimed to develop and validate an interpretable machine learning model for predicting survival outcomes in C-SCLC patients, using clinical data from the SEER database for model development and Chinese patient cohorts for external validation. We first used the Cox proportional hazards model for variable selection. We then compared four candidate models, tuning each with 10-fold cross-validation and a grid search over hyperparameters, and selected XGBoost as the best-performing model. To improve interpretability, we applied the SHapley Additive exPlanations (SHAP) method to quantify the contribution of each variable to the model's predictions.

RESULTS: We constructed the predictive model using data from 1,230 SEER patients and validated it externally using data from 154 Chinese patients. The XGBoost model performed well in predicting survival at 1, 3, and 5 years, with AUC values in the external validation cohort of 0.849, 0.830, and 0.811, respectively. SHAP analysis identified N stage, T stage, radiotherapy, surgery, and gender as the key factors influencing the model's predictions. To enhance clinical utility, we developed an interpretable web-based tool that predicts a patient's 1-year survival probability.

CONCLUSION: The XGBoost model, which integrates demographic and clinical factors of C-SCLC patients, demonstrated excellent predictive performance. Our web-based prediction tool may support the development of personalized treatment strategies and help optimize clinical decision-making.
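To make the reported 1-, 3-, and 5-year AUC values concrete, the following is a minimal sketch of how an AUC at a fixed time horizon can be computed from follow-up data and model risk scores. It is not the authors' evaluation code: the function names, the toy patients, and the risk scores are all hypothetical, and the handling of censoring (simply excluding patients censored before the horizon) is a simplifying assumption. The abstract does not state which time-dependent AUC estimator was used.

```python
from typing import List, Optional, Tuple


def label_at_horizon(time_months: float, died: bool, horizon: float) -> Optional[int]:
    """Binary outcome at a fixed horizon (hypothetical helper).

    Returns 1 if the patient died on or before the horizon, 0 if the patient
    was still under follow-up after the horizon, and None if the patient was
    censored at or before the horizon (status at the horizon is unknown).
    """
    if died and time_months <= horizon:
        return 1
    if time_months > horizon:
        return 0
    return None


def auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def horizon_auc(patients: List[Tuple[float, bool]],
                risk_scores: List[float],
                horizon: float) -> float:
    """AUC at one horizon, dropping patients whose status is unknown.

    Assumption: excluding early-censored patients is the simplest possible
    treatment of censoring and can bias the estimate.
    """
    labels, scores = [], []
    for (time_months, died), score in zip(patients, risk_scores):
        y = label_at_horizon(time_months, died, horizon)
        if y is not None:
            labels.append(y)
            scores.append(score)
    return auc(labels, scores)


# Toy data (invented for illustration): (follow-up in months, died?)
patients = [(4, True), (9, True), (10, False), (20, True),
            (30, False), (40, True), (70, False)]
# Hypothetical predicted risks, higher means higher predicted risk of death
risks = [0.9, 0.7, 0.8, 0.8, 0.2, 0.4, 0.1]

for months in (12, 36):
    print(f"{months}-month AUC: {horizon_auc(patients, risks, months):.3f}")
```

In this toy cohort the patient censored at 10 months is excluded from the 12-month calculation because their status at 12 months is unknown. A more careful analysis would typically weight patients by the inverse probability of censoring rather than dropping them.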
