Early detection of nasopharyngeal carcinoma through machine-learning-driven prediction model in a population-based healthcare record database

Abstract

OBJECTIVE: Early diagnosis and treatment of nasopharyngeal carcinoma (NPC) are vital for a better prognosis. However, because the tumor arises in an obscure anatomical site and its symptoms are insidious, nearly 80% of patients with NPC are diagnosed at a late stage. This study aimed to validate a machine learning (ML) model that uses symptom-related diagnoses and procedures in medical records to predict NPC occurrence and shorten the prediagnostic period.

MATERIALS AND METHODS: Data from a population-based health insurance database (2001-2008) were analyzed, comparing adults with and without newly diagnosed NPC. Medical records from 90 to 360 days before diagnosis were examined. Five ML algorithms (Light Gradient Boosting Machine [LGB], eXtreme Gradient Boosting [XGB], Multivariate Adaptive Regression Splines [MARS], Random Forest [RF], and Logistic Regression [LG]) were evaluated for optimal early NPC detection. The final model was then tested on real-world data from 1 million randomly selected individuals. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC). Shapley values were used to identify the variables that contributed most to the predictions.

RESULTS: LGB showed the highest predictive power, using 14 features and data from 90 days before diagnosis. On the test dataset, the LGB model achieved an AUROC of 0.83, a specificity of 0.81, and a sensitivity of 0.64. The LGB-driven NPC predictive tool effectively separated patients into high-risk and low-risk groups (hazard ratio: 5.85; 95% CI: 4.75-7.21), indicating that the model's risk stratification (layering) effect is valid.

CONCLUSIONS: ML approaches using electronic medical records accurately predicted NPC occurrence. The risk prediction model can serve as a low-cost digital screening tool that offers rapid medical decision support to shorten the prediagnostic period. Timely referral is crucial for high-risk patients identified by the model.
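The three performance measures reported in the abstract (AUROC, sensitivity, and specificity) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy labels and scores, and the 0.5 threshold are all hypothetical and chosen only to show how such metrics are typically computed from a model's predicted risk scores.

```python
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the pairwise (Mann-Whitney U) identity.

    Equals the fraction of (case, control) pairs in which the case
    receives the higher score; ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are flagged high-risk."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 1 = later diagnosed with NPC, 0 = not diagnosed.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

area = auroc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
print(f"AUROC={area:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is one reason a screening model is usually reported with all three.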
