INTRODUCTION: Biases in cancer incidence characteristics have led to significant imbalances in databases constructed by prospective cohort studies. Because they are trained on these imbalanced databases, many traditional algorithms for building cancer risk prediction models perform poorly. METHODS: To improve prediction performance, we incorporated a Bagging ensemble framework into an absolute risk model, yielding a procedure based on ensemble penalized Cox regression (EPCR). We then tested whether the EPCR model outperformed other traditional regression models by varying the censoring rate of the simulated data. RESULTS: Six different simulation studies were performed with 100 replicates. To assess model performance, we calculated the mean false discovery rate (FDR), false omission rate, true positive rate (TPR), true negative rate, and area under the receiver operating characteristic curve (AUC). We found that the EPCR procedure could reduce the FDR for important variables at the same TPR, thereby achieving more accurate variable screening. In addition, we used the EPCR procedure to build a breast cancer risk prediction model based on the Breast Cancer Cohort Study in Chinese Women database. AUCs for 3- and 5-year predictions were 0.691 and 0.642, representing improvements of 0.189 and 0.117 over the classical Gail model, respectively. DISCUSSION: We conclude that the EPCR procedure can overcome challenges posed by imbalanced data and improve the performance of cancer risk assessment tools.
An Improved Training Algorithm Based on Ensemble Penalized Cox Regression for Predicting Absolute Cancer Risk.
Authors: Liu Liyuan, Yang Fu, Fan Yeye, Kao Chunyu, Wang Fei, Yu Lixiang, He Yong, Ji Jiadong, Yu Zhigang
| Journal: | China CDC Weekly | Impact factor: | 2.900 |
| Year: | 2023 | Citation: | 2023 Mar 3; 5(9):206-212 |
| doi: | 10.46234/ccdcw2023.037 | ||
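
The variable-screening metrics named in the abstract (FDR, false omission rate, TPR, true negative rate) can be computed from a selected-variable mask and a ground-truth mask, as is possible in a simulation where the truly important variables are known. The sketch below also shows one way selections from several bootstrap fits might be combined. The majority-vote rule, the 0.5 threshold, the function names, and the toy data are all assumptions made for illustration; the abstract does not state how EPCR aggregates its base models, and no penalized Cox model is fitted here.

```python
import numpy as np


def screening_metrics(selected, truth):
    """Compare a selected-variable mask with the true important-variable mask."""
    selected = np.asarray(selected, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(selected & truth))    # important and selected
    fp = int(np.sum(selected & ~truth))   # unimportant but selected
    fn = int(np.sum(~selected & truth))   # important but missed
    tn = int(np.sum(~selected & ~truth))  # unimportant and not selected

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den > 0 else float("nan")

    return {
        "FDR": ratio(fp, tp + fp),  # false discovery rate
        "FOR": ratio(fn, fn + tn),  # false omission rate
        "TPR": ratio(tp, tp + fn),  # true positive rate
        "TNR": ratio(tn, tn + fp),  # true negative rate
    }


def aggregate_votes(selection_matrix, threshold=0.5):
    """Keep variables selected in at least `threshold` of the bootstrap fits.

    Rows are bootstrap fits, columns are candidate variables.
    ASSUMPTION: majority voting is an illustrative aggregation rule only.
    """
    freq = np.asarray(selection_matrix, dtype=bool).mean(axis=0)
    return freq >= threshold


# Hypothetical toy data: 8 candidate variables, the first 3 truly important.
truth = np.array([1, 1, 1, 0, 0, 0, 0, 0], dtype=bool)

# Hypothetical selections from 4 bootstrap fits (1 = variable kept by the fit).
votes = np.array([
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 1, 0, 0],
], dtype=bool)

metrics_single = screening_metrics(votes[0], truth)
metrics_bagged = screening_metrics(aggregate_votes(votes), truth)
print("single fit:", metrics_single)
print("bagged    :", metrics_bagged)
```

In this toy case the first fit alone and the voted selection both recover every important variable, but the voted selection drops the spurious variables that appear in only one fit each. That is the kind of FDR reduction at equal TPR the abstract reports, shown here on made-up inputs rather than on the study's data.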
