BioMedGraphica: An All-in-One Platform for Joint Textual Biomedical Prior Knowledge and Numeric Graph Generation

Abstract

Multi-omic data analysis is essential for scientific discovery in precision medicine. However, translating the statistical results of omic data analysis into novel scientific hypotheses remains a significant challenge. Human experts must manually review analysis results and generate new hypotheses based on extensive and interconnected biomedical prior knowledge, a process that is subjective and not scalable. While large language models (LLMs) can accelerate discovery, their reasoning improves when grounded in structured, auditable, and comprehensive biomedical prior knowledge. Biomedical knowledge, however, is scattered across heterogeneous databases that use diverse and inconsistent nomenclature systems, making it difficult to integrate these resources into a unified format for scalable analysis. This fragmentation limits the ability of AI systems to fully leverage biomedical data for scientific discovery. To address these challenges, we developed BioMedGraphica, an all-in-one platform that harmonizes fragmented biomedical resources by integrating 11 entity types and 30 relation types from 43 databases into a unified knowledge graph containing 2,306,921 entities and 27,232,091 relations. To the best of our knowledge, this is also the first work to propose a Textual-Numeric Graph (TNG) data structure for multi-omics data analysis. In a TNG, textual information captures prior biological knowledge (e.g., transcription start sites, functions, mechanisms), numeric values represent quantitative biomedical features, and the integrated relations can help uncover mechanisms. By bridging prior knowledge with user-specific data, the TNG is a novel data structure well suited to the development of graph foundation models, with the potential to improve prediction performance and interpretability, while also augmenting LLMs by supplying graph-structured mechanistic context to strengthen reasoning.
The BioMedGraphica code is available on GitHub at https://github.com/FuhaiLiAiLab/BioMedGraphica, and the BioMedGraphica knowledge graph data can be downloaded from the Hugging Face dataset at https://huggingface.co/datasets/FuhaiLiAiLab/BioMedGraphica.
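
To make the TNG idea concrete, the following is a minimal illustrative sketch of a graph whose nodes carry both a textual description (prior knowledge) and an optional numeric value (user data), connected by typed relations. It is not the BioMedGraphica implementation: the class names, method names, entity IDs ("G1", "P1", "G9"), and the relation label "encodes" are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Entity:
    """One node: textual prior knowledge plus an optional numeric feature."""
    entity_id: str                 # hypothetical unified ID, not a real BioMedGraphica ID
    entity_type: str               # e.g. "gene" or "protein"
    text: str                      # textual prior knowledge
    value: Optional[float] = None  # numeric feature from user data, if any


@dataclass
class TextualNumericGraph:
    """Toy textual-numeric graph (assumed structure, for illustration only)."""
    entities: Dict[str, Entity] = field(default_factory=dict)
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_entity(self, entity: Entity) -> None:
        self.entities[entity.entity_id] = entity

    def add_relation(self, source: str, relation_type: str, target: str) -> None:
        # Reject edges whose endpoints are not in the graph.
        if source not in self.entities or target not in self.entities:
            raise KeyError("both endpoints must be added before the relation")
        self.relations.append((source, relation_type, target))

    def attach_values(self, measurements: Dict[str, float]) -> List[str]:
        """Attach numeric values to known entities; return the unmatched IDs."""
        unmatched = []
        for entity_id, value in measurements.items():
            if entity_id in self.entities:
                self.entities[entity_id].value = value
            else:
                unmatched.append(entity_id)
        return unmatched

    def neighbors(self, entity_id: str) -> List[Tuple[str, str]]:
        """Outgoing (relation_type, target_id) pairs for one entity."""
        return [(rel, tgt) for src, rel, tgt in self.relations if src == entity_id]

    def context(self, entity_id: str) -> str:
        """Render an entity and its outgoing relations as plain text."""
        e = self.entities[entity_id]
        lines = [f"{e.entity_type} {e.entity_id}: {e.text} (value={e.value})"]
        for rel, tgt in self.neighbors(entity_id):
            lines.append(f"  -[{rel}]-> {tgt}: {self.entities[tgt].text}")
        return "\n".join(lines)


# Toy usage with made-up IDs and measurements.
tng = TextualNumericGraph()
tng.add_entity(Entity("G1", "gene", "TP53, a tumor suppressor gene"))
tng.add_entity(Entity("P1", "protein", "p53, a transcription factor"))
tng.add_relation("G1", "encodes", "P1")
unmatched = tng.attach_values({"G1": 2.5, "G9": 0.1})  # "G9" is not in the graph
print(tng.context("G1"))
```

In this sketch, the text rendered by `context` is the kind of graph-structured description that could be passed to an LLM as grounding, while the `value` fields are what a graph model would consume as node features. The list returned by `attach_values` shows which user-supplied IDs did not map to any known entity.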
