A novel dynamic Bayesian network approach for data mining and survival data analysis

Abstract

BACKGROUND: Censoring is the primary challenge in survival modeling, especially in human health studies. Classical methods are limited either in their applications, as with the Kaplan-Meier estimator, or by restrictive assumptions, as with the Cox regression model. Machine learning algorithms, on the other hand, commonly rely on high-dimensional data and ignore censoring. In addition, these algorithms are more difficult to understand and use. We propose a novel approach based on the Bayesian network to address these issues.
METHODS: We proposed a two-slice temporal Bayesian network model for survival data, introducing the survival and censoring status at each observed time as the dynamic states. The structure of the directed acyclic graph was learned with a score-based algorithm, and the parameters were learned with a likelihood approach. We conducted a simulation study to assess the performance of our model in comparison with Kaplan-Meier and Cox proportional hazards regression. We defined various scenarios according to sample size, censoring rate, and the shapes of the survival and censoring distributions across time. Finally, we fit the model to a real-world dataset of 760 post-gastrectomy cases of gastric cancer. The model was validated using the hold-out technique based on the posterior classification error. The performance of our survival model was compared with that of the Kaplan-Meier and Cox proportional hazards models.
RESULTS: The simulation study shows that the dynamic Bayesian network (DBN) reduces bias relative to Cox regression and Kaplan-Meier in many scenarios, especially at late survival times. In the real-world data, the structure of the DBN model was consistent with the findings of the classical Kaplan-Meier and Cox regression approaches. The posterior classification error from the validation did not exceed 0.04, indicating that our network predicted the state variables with at least 96% accuracy.
CONCLUSIONS: Our proposed dynamic Bayesian network model could be used as a data mining technique in the context of survival data analysis. The advantages of this approach are feature selection ability, straightforward interpretation, handling of high-dimensional data, and few assumptions.
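
For readers unfamiliar with the classical baseline, the following is a minimal sketch of how right-censored survival data can be simulated and summarized with the Kaplan-Meier estimator. It is not the authors' simulation design: the exponential event and censoring distributions, the function names `simulate_censored` and `kaplan_meier`, and all parameter values are assumptions chosen only for illustration.

```python
import numpy as np


def simulate_censored(n, event_scale, censor_scale, seed=0):
    """Draw right-censored data (illustrative assumption: exponential times)."""
    rng = np.random.default_rng(seed)
    event_time = rng.exponential(event_scale, size=n)
    censor_time = rng.exponential(censor_scale, size=n)
    observed = np.minimum(event_time, censor_time)  # follow-up ends at whichever comes first
    event = event_time <= censor_time               # True = event seen, False = censored
    return observed, event


def kaplan_meier(observed, event):
    """Return distinct event times and the Kaplan-Meier survival estimate at each."""
    observed = np.asarray(observed, dtype=float)
    event = np.asarray(event, dtype=bool)
    times = np.unique(observed[event])              # sorted distinct event times
    surv = np.empty(len(times))
    s = 1.0
    for i, t in enumerate(times):
        at_risk = np.sum(observed >= t)             # subjects still under observation at t
        deaths = np.sum((observed == t) & event)    # events occurring exactly at t
        s *= 1.0 - deaths / at_risk                 # product-limit update
        surv[i] = s
    return times, surv


# Hypothetical scenario: 200 subjects, mean event time 10, mean censoring time 20.
obs, ev = simulate_censored(200, event_scale=10.0, censor_scale=20.0, seed=1)
t, s = kaplan_meier(obs, ev)
print(f"censoring rate: {1 - ev.mean():.2f}")
print(f"estimated survival at last event time: {s[-1]:.3f}")
```

Censored subjects stay in the risk set until their censoring time but never count as events. Few subjects remain at risk late in follow-up, so the estimate there becomes less stable, which is the regime the abstract highlights.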
