OWL: an optimized and independently validated machine learning prediction model for lung cancer screening based on the UK Biobank, PLCO, and NLST populations

Abstract

BACKGROUND: A reliable risk prediction model is critically important for identifying individuals at high risk of developing lung cancer as candidates for low-dose chest computed tomography (LDCT) screening. Leveraging a machine learning technique that accommodates a wide range of questionnaire-based predictors, we sought to optimize and validate a lung cancer prediction model.

METHODS: We developed an Optimized early Warning model for Lung cancer risk (OWL) using the XGBoost algorithm with 323,344 UK Biobank (UKB) participants from England (training set), and independently validated it with 93,227 UKB participants from Scotland and Wales (validation set 1); 70,605 and 66,231 participants in the control and intervention subpopulations, respectively, of the Prostate, Lung, Colorectal, and Ovarian cancer screening trial (PLCO) (validation sets 2 & 3); and 23,138 and 18,669 participants in the control and intervention subpopulations, respectively, of the United States National Lung Screening Trial (NLST) (validation sets 4 & 5). We compared OWL with three competing prediction models: PLCO modified 2012 (PLCO(m2012)), PLCO modified 2014 (PLCO(all2014)), and the Liverpool Lung cancer Project risk model version 3 (LLPv3). We assessed discrimination by the area under the receiver operating characteristic curve (AUC) at the designed time point, evaluated calibration using the relative improvement in the ratio of expected to observed lung cancer cases (RI(EO)), and illustrated clinical utility by decision curve analysis.

FINDINGS: For the general population, in validation set 1, OWL (AUC = 0.855, 95% CI: 0.829-0.880) showed better discrimination than PLCO(all2014) (AUC = 0.821, 95% CI: 0.794-0.848) (p < 0.001); in validation sets 2 & 3, the AUC of OWL was comparable to that of PLCO(all2014) (AUC(PLCOall2014)-AUC(OWL) < 1%). Among ever-smokers, OWL outperformed PLCO(m2012) and PLCO(all2014) in validation set 1 (AUC(OWL) = 0.842, 95% CI: 0.814-0.871; AUC(PLCOm2012) = 0.792, 95% CI: 0.760-0.823; AUC(PLCOall2014) = 0.791, 95% CI: 0.760-0.822, all p < 0.001), and remained comparable to both models in discrimination (AUC difference from -0.014 to 0.008) in validation sets 2 to 5. In all validation sets, OWL outperformed LLPv3 among both the general population and ever-smokers. Of note, OWL showed significantly better calibration than PLCO(m2012) and PLCO(all2014) (RI(EO) from 43.1% to 92.3%, all p < 0.001) and than LLPv3 (RI(EO) from 41.4% to 98.7%, all p < 0.001) in most cases. For clinical utility, OWL showed a significant improvement in average net benefit (NB) over PLCO(all2014) in validation set 1 (NB improvement: 32, p < 0.001); among ever-smokers in validation set 1, OWL (average NB = 289) retained a significant improvement over PLCO(m2012) (average NB = 213) (p < 0.001). OWL had NBs equivalent to those of PLCO(m2012) and PLCO(all2014) in the PLCO and NLST populations, while outperforming LLPv3 in all three populations.

INTERPRETATION: OWL, with a high degree of predictive accuracy and robustness, is a general framework with scientific justification and clinical utility that can aid in screening individuals at high risk of lung cancer.

FUNDING: National Natural Science Foundation of China; the US NIH.
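
The evaluation quantities named in the abstract (AUC for discrimination, the expected-to-observed ratio for calibration, and net benefit for decision curve analysis) can be illustrated on toy data. The following is a minimal sketch, not the authors' code: the function names and the toy risks and outcomes are hypothetical, outcomes are treated as simple binary labels rather than time-to-event data, and RI(EO) is not computed because the abstract does not give its formula.

```python
from typing import Sequence


def auc(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Probability that a random case has a higher predicted risk than a
    random non-case (ties count as 0.5)."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


def expected_observed_ratio(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Expected cases (sum of predicted risks) divided by observed cases.
    A value of 1.0 indicates perfect calibration-in-the-large."""
    observed = sum(outcomes)
    if observed == 0:
        raise ValueError("E/O ratio is undefined with no observed cases")
    return sum(risks) / observed


def net_benefit(risks: Sequence[float], outcomes: Sequence[int], threshold: float) -> float:
    """Net benefit at one risk threshold: TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(risks)
    flagged = [(r, y) for r, y in zip(risks, outcomes) if r >= threshold]
    true_pos = sum(1 for _, y in flagged if y == 1)
    false_pos = len(flagged) - true_pos
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


# Hypothetical toy data: six people, two of whom develop lung cancer.
toy_risks = [0.9, 0.8, 0.3, 0.2, 0.1, 0.05]
toy_outcomes = [1, 0, 1, 0, 0, 0]

print(auc(toy_risks, toy_outcomes))
print(expected_observed_ratio(toy_risks, toy_outcomes))
print(net_benefit(toy_risks, toy_outcomes, threshold=0.25))
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing models at each one; averaging over that range gives a single summary in the spirit of the average NB reported above, although the scale and threshold range used in the paper are not stated in the abstract.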
