Development and validation of a machine learning model for predicting 3-year overall survival in metastatic nasopharyngeal carcinoma: a SEER database and web visualization study



Abstract

BACKGROUND: Current prognostic models for metastatic nasopharyngeal carcinoma (M1-NPC) often employ oversimplified "catch-all" classifications that fail to account for the substantial heterogeneity in metastatic patterns and treatment responses. Furthermore, existing tools lack interactive visualization capabilities to support clinical decision-making. In this study, we aimed to develop and validate a visual prognostic model for 3-year overall survival (OS) in M1-NPC patients.

METHODS: We retrospectively analyzed clinical and pathological data from M1-NPC patients in the Surveillance, Epidemiology, and End Results (SEER) database who were diagnosed between 2010 and 2021 and had complete follow-up. Patients with missing surgical resection data or undocumented metastatic sites were excluded. Patients were randomly allocated to training (70%) and testing (30%) cohorts. Using univariate Cox regression, we identified significant prognostic variables among 19 clinical factors. Five machine learning (ML) algorithms, namely support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost), gradient boosting decision tree (GBDT), and k-nearest neighbor (KNN), were developed and evaluated using the area under the receiver operating characteristic (ROC) curve (AUC), sensitivity, specificity, and accuracy. The top two algorithms were combined to form an ensemble model. All six models were tested, with SHapley Additive exPlanations (SHAP) analysis applied for interpretability. The optimal model was implemented in an online calculator for 3-year OS prediction.

RESULTS: Among 19 candidate variables, univariate Cox regression identified 11 significant prognostic factors (P < 0.05): age, race, time interval from diagnosis to treatment initiation, chemotherapy, radiotherapy, lymph node size, liver metastasis, lung metastasis, number of organs involved in metastasis (bone, liver, brain, lung), tumor stage (T stage), and node stage (N stage). In the training cohort (n = 482), all five ML algorithms demonstrated excellent performance (AUC = 0.9839-0.9998), with RF and KNN achieving near-perfect discrimination (AUC = 0.9998). The subsequent ensemble model combining RF and KNN maintained high accuracy (AUC = 0.998). In the test cohort (n = 207), the RF model showed the best predictive performance [AUC = 0.72, accuracy = 0.94, sensitivity = 0.19, specificity = 0.99, positive predictive value (PPV) = 0.60, negative predictive value (NPV) = 0.94], followed by GBDT and KNN (AUC = 0.67 each). Based on these results, we selected the RF model to develop an online calculator for 3-year OS prediction in M1-NPC patients.

CONCLUSIONS: Our validated RF-based model addresses a critical gap in M1-NPC prognostication, offering clinicians an interpretable tool for survival prediction. Although limited by database constraints, this represents the first SEER-derived online calculator for M1-NPC with immediate clinical applicability.
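
The evaluation metrics named in the abstract (sensitivity, specificity, PPV, NPV, accuracy, and AUC) can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the names `ConfusionCounts`, `classification_metrics`, and `roc_auc` are hypothetical, the confusion-matrix counts are illustrative numbers chosen only to sum to the test cohort size (n = 207), and the abstract does not state which outcome was treated as the positive class.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from a binary confusion matrix (hypothetical container)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return the threshold-dependent metrics listed in the abstract."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # recall on the positive class
        "specificity": c.tn / (c.tn + c.fp),  # recall on the negative class
        "ppv": c.tp / (c.tp + c.fp),          # positive predictive value
        "npv": c.tn / (c.tn + c.fn),          # negative predictive value
        "accuracy": (c.tp + c.tn) / total,
    }


def roc_auc(labels, scores) -> float:
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative counts only (assumption): 16 positives and 191 negatives, 207 in total.
example = ConfusionCounts(tp=3, fp=2, tn=189, fn=13)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, sensitivity is 3/16 (about 0.19) while accuracy is 192/207 (about 0.93), which shows how a classifier can reach high accuracy and specificity on an imbalanced cohort while identifying few positive cases. AUC is computed from the predicted scores rather than from a single confusion matrix, so it is reported separately.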
