Ontology development and use for cholangiocarcinoma risk factors and predictions: a term enrichment data analysis and machine learning classification

Abstract

BACKGROUND: Cholangiocarcinoma (CCA) is a critical public health problem in Thailand, where its prevalence is much higher than in other areas of the world. Data about CCA are stored in different sources and under different standards, in both research data sets and electronic health records (EHRs).

OBJECTIVE: This study aims to integrate and analyze CCA data from various sources in order to investigate risk factors and develop prediction models using the Cholangiocarcinoma Ontology (CCAO).

METHODS: Datasets from Thailand were annotated with CCAO and analyzed using ontology-based term enrichment methods. We applied ontology term enrichment analysis, similar to that used with the Gene Ontology, to identify significant risk factors for patients with suspected CCA and patients with CCA. Our program produced a list of significant terms associated with CCA and a visualization of the ontology hierarchy with the significant terms highlighted. The outputs of the term enrichment analyses were then used as inputs to machine learning classification tasks.

RESULTS: The results confirmed that indicators for CCA include dilated bile ducts, periductal fibrosis, and hepatic mass, based on ultrasound findings from several years prior. Our analysis also revealed demographic and lifestyle risk factors such as male gender, having no education, alcohol consumption, smoking, being a farmer, and having diabetes. We seeded a random forest classifier with the term enrichment results and predicted CCA patients with an average precision-recall curve score of 0.92 (standard deviation 0.023). The five most important features were age, dilated bile ducts, periductal fibrosis, suspected CCA, and hepatic mass.

CONCLUSIONS: These findings can be used to focus on and monitor populations at risk for CCA. Expanding CCAO with molecular data related to CCA, and applying ontology-driven term enrichment analysis and machine learning to it, will help us generate new hypotheses for decreasing the morbidity and mortality of CCA in Thailand.
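The term enrichment step described in the methods can be illustrated with a minimal sketch. The abstract does not say which statistical test the authors used; the one-sided hypergeometric test below is an assumption, chosen because it is a common choice in Gene Ontology enrichment tools. The record IDs, term labels, the `enrich` and `hypergeom_sf` helpers, and the Bonferroni correction are all hypothetical and are not taken from CCAO or from the authors' program.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) when n records are drawn from N, of which K carry the term."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enrich(annotations, cases):
    """Test every term for over-representation among the case records.

    annotations: dict mapping record id -> set of ontology term labels
    cases: set of record ids in the study group (e.g. CCA patients)
    Returns (term, k, K, adjusted_p) tuples sorted by adjusted p-value.
    """
    N = len(annotations)
    n = len(cases)
    terms = set().union(*annotations.values())
    raw = []
    for term in terms:
        K = sum(1 for t in annotations.values() if term in t)  # records with term
        k = sum(1 for rid in cases if term in annotations[rid])  # cases with term
        raw.append((term, k, K, hypergeom_sf(k, N, K, n)))
    m = len(raw)  # number of tests, used for Bonferroni correction (an assumption)
    adjusted = [(t, k, K, min(1.0, p * m)) for t, k, K, p in raw]
    return sorted(adjusted, key=lambda row: (row[3], row[0]))


# Toy data only: ten hypothetical records with made-up term labels.
records = {
    "r01": {"dilated_bile_duct", "farmer"},
    "r02": {"dilated_bile_duct", "farmer"},
    "r03": {"dilated_bile_duct"},
    "r04": {"dilated_bile_duct"},
    "r05": {"dilated_bile_duct"},
    "r06": {"farmer"},
    "r07": {"farmer"},
    "r08": {"farmer"},
    "r09": set(),
    "r10": set(),
}
cca_cases = {"r01", "r02", "r03", "r04"}

for term, k, K, p_adj in enrich(records, cca_cases):
    print(f"{term}: {k}/{len(cca_cases)} cases, {K}/{len(records)} overall, adj. p = {p_adj:.4f}")
```

In this toy data the `dilated_bile_duct` label appears in all four cases but in only five of the ten records, so it passes a 0.05 threshold after correction, while `farmer` does not. A real analysis would also propagate each annotation to its ancestor terms in the ontology hierarchy before testing, which this sketch omits.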
