Ontology-driven association rule mining for biomedical entity relationships: integrating hierarchical knowledge to improve gene-disease discovery

Authors: Naqash, Mian Athar; Amin, Muhammad; Uddin, Jamal; Hussein, Hany S; Raza, Ali; Alghamdi, Wajdi; Mostafa, Hala AbdelHameed; Alkahtani, Hend Khalid

Journal:	Scientific Reports	Impact factor:	3.900
Year:	2026	Volume/issue:	2026 Mar 11;16(1)
DOI:	10.1038/s41598-026-42584-y

Abstract

Reliable links between genes and diseases are central to biomedical research; however, many computational methods overlook the semantic and hierarchical layers of ontologies, missing indirect relationships and producing shallow association scores. We propose an ontology-driven framework for gene-disease association mining that integrates hierarchical knowledge from the Gene Ontology and Disease Ontology. Our text-mining pipeline processes PubMed text by cleaning, annotating, and extracting sentence-level co-occurrences of biomarker-related terms. We evaluated and compared well-known association rule mining algorithms, namely Apriori, FP-Growth, and Eclat, and applied a tie-aware rank-based transformation to correct for non-normal distributions of association scores. The resulting Athar Semantic Enriched Association (ASEA) score combines entity-specific associations with Hierarchical Ontology Associations. An enhanced Apriori variant showed superior performance in capturing both direct and indirect associations. Benchmarked against the Comparative Toxicogenomics Database (CTD), ASEA detected 17 high-grade associations (30.4% more than Apriori and Eclat, and 88.9% more than FP-Growth). In total, ASEA produced 185 associations, compared with 217 for Apriori, 166 for Eclat, and 71 for FP-Growth. Among these, 21 are recorded in high-confidence databases (Case 1). Another 28 are supported by substantial literature but are not yet high-confidence (Case 2). A further 39 have low or intermediate database support and no strong literature support (Case 3). The remaining 22 are purely speculative (Case 4), including 12 particularly novel associations absent from curated resources. Overall, this framework provides a transparent and extensible pipeline for biomedical knowledge discovery. It combines statistical co-occurrence with ontology-driven enrichment to retrieve established knowledge and to generate reliable predictions for precision medicine and hypothesis generation.
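The minimal Python sketch below illustrates two things: the sentence-level co-occurrence step, and the rule metrics that Apriori, FP-Growth, and Eclat have in common. Each sentence becomes one transaction, and gene-to-disease rules are scored by support, confidence, and lift. Several pieces are hypothetical stand-ins rather than the authors' pipeline, which annotates against the Gene Ontology and Disease Ontology:

* the tiny `GENES` and `DISEASES` lexicons;
* the regex sentence splitter;
* the function names.

Only gene-disease pairs (2-itemsets) are counted. The sketch therefore shows the scoring, not the candidate-generation strategy that distinguishes the three algorithms.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical toy lexicons; the real pipeline annotates against
# Gene Ontology / Disease Ontology terms, which are not reproduced here.
GENES = {"BRCA1", "TP53", "APOE"}
DISEASES = {"breast cancer", "alzheimer disease"}


def sentence_transactions(text):
    """Return one transaction (set of matched terms) per sentence.

    Sentences with no matches are kept as empty transactions so that
    support is measured over all sentences.
    """
    transactions = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        lowered = sentence.lower()
        genes = {g for g in GENES
                 if re.search(rf"\b{re.escape(g.lower())}\b", lowered)}
        diseases = {d for d in DISEASES if d in lowered}
        transactions.append(frozenset(genes | diseases))
    return transactions


def gene_disease_rules(transactions):
    """Score every co-occurring gene -> disease rule by support, confidence, lift."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter()
    for t in transactions:
        for gene, disease in product(t & GENES, t & DISEASES):
            pair_counts[(gene, disease)] += 1

    rules = {}
    for (gene, disease), count in pair_counts.items():
        support = count / n                              # P(gene and disease)
        confidence = count / item_counts[gene]           # P(disease | gene)
        lift = confidence / (item_counts[disease] / n)   # confidence / P(disease)
        rules[(gene, disease)] = (support, confidence, lift)
    return rules


text = ("BRCA1 mutations raise breast cancer risk. TP53 is altered in breast cancer. "
        "APOE is linked to Alzheimer disease. BRCA1 repairs DNA.")
rules = gene_disease_rules(sentence_transactions(text))
for pair, (s, c, l) in sorted(rules.items()):
    print(pair, f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

On this four-sentence example, APOE and Alzheimer disease reach a lift of 4.0 because each appears only in the one sentence they share. BRCA1 reaches a lift of only 1.0 because it also appears in a sentence without breast cancer.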
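The abstract mentions a tie-aware, rank-based transformation for non-normal association scores but does not specify it. The sketch below assumes one common choice, a rank-based inverse normal transform; this is not taken from the paper. It works in three steps:

1. Replace each score by its rank.
2. Give tied scores the average of the ranks they occupy.
3. Map the ranks onto standard-normal quantiles using Blom's offset c = 3/8.

Only Python's standard library (`statistics.NormalDist`) is used. The function names are illustrative.

```python
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks in which tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_inverse_normal(values, c=3 / 8):
    """Map scores to N(0, 1) quantiles: z = Phi^-1((r - c) / (n - 2c + 1))."""
    n = len(values)
    std_normal = NormalDist()  # mu=0, sigma=1
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1))
            for r in average_ranks(values)]


# Skewed toy scores with a tie: the two 0.1 values receive identical z-scores.
scores = [0.9, 0.1, 0.1, 0.5, 12.0]
print(average_ranks(scores))
print([round(z, 3) for z in rank_inverse_normal(scores)])
```

Only ranks enter the formula, so the outlier 12.0 receives the same z-score that any largest value among five scores would. This insensitivity to magnitude is why such transforms are used on heavily skewed score distributions.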
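The abstract describes the Hierarchical Ontology Associations component only at a high level. The sketch below shows one possible way to derive indirect associations. It assumes the ontology's is_a links are available as a child-to-parents map. Each direct gene-disease count is then propagated to every ancestor of the disease term, mirroring the "true path" convention used with GO annotations. The following are hypothetical illustrations, not the paper's ASEA formula:

* the `PARENTS` fragment;
* the counts;
* the `blended_score` weighting with `alpha`.

```python
from collections import defaultdict

# Hypothetical fragment of a disease is_a hierarchy: child -> list of parents.
PARENTS = {
    "breast carcinoma": ["breast cancer"],
    "breast cancer": ["cancer"],
    "lung adenocarcinoma": ["lung cancer"],
    "lung cancer": ["cancer"],
    "cancer": [],
}


def ancestors(term, parents):
    """All terms reachable from `term` via is_a links (excluding `term` itself)."""
    seen, stack = set(), list(parents.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, []))
    return seen


def propagate(direct_counts, parents):
    """Credit each direct (gene, disease) count to every ancestor of the disease."""
    indirect = defaultdict(int)
    for (gene, disease), count in direct_counts.items():
        for ancestor in ancestors(disease, parents):
            indirect[(gene, ancestor)] += count
    return dict(indirect)


def blended_score(direct, indirect, pair, alpha=0.5):
    """Hypothetical linear blend of direct and inherited evidence (not ASEA)."""
    return alpha * direct.get(pair, 0) + (1 - alpha) * indirect.get(pair, 0)


direct = {
    ("BRCA1", "breast carcinoma"): 3,
    ("TP53", "breast cancer"): 2,
    ("TP53", "lung adenocarcinoma"): 1,
}
indirect = propagate(direct, PARENTS)
print(indirect)
print(blended_score(direct, indirect, ("TP53", "cancer")))
```

TP53 never co-occurs with the term "cancer" directly. It nevertheless inherits 3 units of evidence from its breast cancer and lung adenocarcinoma mentions. This is the kind of indirect link that flat co-occurrence counting misses.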
