Ontology-driven association rule mining for biomedical entity relationships: integrating hierarchical knowledge to improve gene-disease discovery



Abstract

Reliable gene-disease links are central to biomedical research, yet many computational methods overlook the semantic and hierarchical structure of ontologies; as a result, they miss indirect relationships and produce shallow association scores. We propose an ontology-driven framework for gene-disease association mining that integrates hierarchical knowledge from the Gene Ontology and Disease Ontology. Our text-mining pipeline cleans and annotates PubMed text and extracts sentence-level co-occurrences of biomarker-related terms. We evaluated and compared three well-known association rule mining algorithms (Apriori, FP-Growth, and Eclat) and applied a tie-aware rank-based transformation to correct for non-normal distributions of association scores. The resulting Athar Semantic Enriched Association (ASEA) score combines entity-specific associations with Hierarchical Ontology Associations; an enhanced Apriori variant performed best at capturing both direct and indirect associations. In benchmarking against the Comparative Toxicogenomics Database, ASEA detected 17 high-grade associations (30.4% more than Apriori and Eclat, 88.9% more than FP-Growth). In total, ASEA produced 185 associations, compared with 217 for Apriori, 166 for Eclat, and 71 for FP-Growth. Among these, 21 appear in high-confidence databases (Case 1), 28 are supported by substantial literature but are not yet high-confidence (Case 2), 39 have low or intermediate database support with no strong literature (Case 3), and 22 are purely speculative (Case 4), including 12 particularly novel associations absent from the curated resources. Overall, this framework provides a transparent and extensible pipeline for biomedical knowledge discovery, combining statistical co-occurrence with ontology-driven enrichment to retrieve established knowledge and generate reliable predictions for precision medicine and hypothesis generation.
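To make two of the steps named above concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes sentences have already been annotated with gene and disease mentions, computes the standard association rule measures (support, confidence, lift) for each gene-disease pair, and then applies one common tie-aware rank transform (average ranks mapped through the inverse normal CDF). The function names, the toy sentences, and the `(rank - 0.5) / n` offset are illustrative assumptions; the abstract does not specify the exact transformation or how the ASEA score is assembled.

```python
from collections import Counter
from itertools import product
from statistics import NormalDist


def cooccurrence_rules(sentences):
    """Score gene -> disease rules from sentence-level co-occurrence.

    `sentences` is a list of (genes, diseases) pairs, one per sentence,
    where each element is a set of annotated entity names (hypothetical
    input format).
    """
    n = len(sentences)
    gene_n, disease_n, pair_n = Counter(), Counter(), Counter()
    for genes, diseases in sentences:
        gene_n.update(set(genes))
        disease_n.update(set(diseases))
        pair_n.update(product(set(genes), set(diseases)))

    rules = {}
    for (gene, disease), count in pair_n.items():
        support = count / n                       # P(gene and disease)
        confidence = count / gene_n[gene]         # P(disease | gene)
        lift = confidence / (disease_n[disease] / n)
        rules[(gene, disease)] = {
            "support": support,
            "confidence": confidence,
            "lift": lift,
        }
    return rules


def tie_aware_rank_transform(scores):
    """Map scores to normal quantiles; tied scores share an average rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with scores[order[i]].
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1            # 1-based average rank
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    normal = NormalDist()
    # Assumed offset: (rank - 0.5) / n keeps quantiles strictly in (0, 1).
    return [normal.inv_cdf((r - 0.5) / n) for r in ranks]


# Toy, made-up annotations used only to exercise the functions.
toy_sentences = [
    ({"BRCA1"}, {"breast cancer"}),
    ({"BRCA1"}, {"breast cancer"}),
    ({"TP53"}, {"breast cancer"}),
    ({"BRCA1"}, set()),
]
toy_rules = cooccurrence_rules(toy_sentences)
toy_z = tie_aware_rank_transform([0.5, 0.5, 0.1, 0.9])
```

Averaging ranks within a tied run means that pairs with identical raw scores receive identical transformed scores, so they are not ordered arbitrarily by input position.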
