Mapping Molecular Pathways of Histone Deacetylase in Alzheimer's Disease with Large Language Model‐Driven Knowledge Discovery



Abstract

BACKGROUND: Beyond the classical pathologies of Alzheimer's disease (AD), novel molecular pathways such as those involving histone deacetylase 6 (HDAC6) have shown promising results. As the literature grows, systematic approaches are needed to study these intricate molecular pathways. Large language models (LLMs) offer a powerful way to summarize and discover knowledge across a whole field by automatically reading and interpreting extensive literature. The results can be represented as knowledge graphs (KGs) that map molecular interactions and pathways in AD.

METHOD: The corpus contained 265 papers retrieved from PubMed on Sep 7, 2024, using the keywords “(Alzheimer's disease OR AD) AND (histone deacetylase OR HDAC)” in the abstract/title and restricted to human studies. We used only the abstract of each paper. Preprocessing involved abbreviation expansion and coreference parsing. We used two LLMs: GPT‐4o and Gemini. The LLMs processed each sentence individually to extract subject‐predicate‐object triplets using fine‐tuned prompts, where the subject and object must both be molecules. To standardize the verbs, we prompted the LLMs to map each verb to one of 34 predefined verb categories. To standardize the subject and object, we queried the UniProt database and used an LLM to select the best candidate standardized name. The triplets from multiple papers were combined to form a KG. Performance was evaluated as the human‐assessed accuracy of triplets from 10 randomly selected paper abstracts.

RESULT: Figure 1 shows the KG built from the 10 paper abstracts. It contained 15 molecules as nodes and 17 interactions as arrows. Tau, compound 15, and HDAC6 were the hub nodes. The table in the lower right corner of Figure 1 shows that GPT‐4o outperformed Gemini, achieving an accuracy of 78.4% compared with Gemini's 58.9%.

CONCLUSION: Leveraging LLMs such as GPT‐4o, we efficiently extracted structured knowledge from the AD and HDAC literature. The KG represents a foundational step toward systematically understanding what is known and where the major gaps lie for novel therapeutic targets. The approach holds high potential for broader scientific fields and for the automation of science.
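The post-extraction steps described in METHOD (verb standardization, merging triplets into a KG, identifying hub nodes, and scoring human-assessed accuracy) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The verb map `VERB_CATEGORIES` is a hypothetical stand-in for the 34 predefined categories, which the abstract does not list. The molecule names `MOL_A` to `MOL_D` are placeholders rather than results from the paper, and the LLM extraction and UniProt lookup are assumed to have already happened.

```python
from collections import defaultdict

# Hypothetical verb-to-category map (assumption: the abstract's 34
# predefined categories are not published in the text).
VERB_CATEGORIES = {
    "inhibits": "inhibit",
    "suppresses": "inhibit",
    "activates": "activate",
    "binds to": "bind",
}


def normalize(triplet):
    """Map the verb of a (subject, verb, object) triplet to a category.

    Returns None when the verb has no known category, so the caller
    can drop the triplet.
    """
    subj, verb, obj = triplet
    category = VERB_CATEGORIES.get(verb.strip().lower())
    if category is None:
        return None
    return (subj.strip(), category, obj.strip())


def build_kg(triplets):
    """Merge triplets into a directed graph.

    The result maps each (subject, object) pair to the set of verb
    categories observed for it, so repeated findings collapse into one edge.
    """
    edges = defaultdict(set)
    for triplet in triplets:
        norm = normalize(triplet)
        if norm is not None:
            subj, category, obj = norm
            edges[(subj, obj)].add(category)
    return dict(edges)


def node_degrees(edges):
    """Count how many edges touch each node (in-degree plus out-degree)."""
    degree = defaultdict(int)
    for subj, obj in edges:
        degree[subj] += 1
        degree[obj] += 1
    return dict(degree)


def hub_nodes(edges):
    """Return the node(s) with the highest degree, sorted by name."""
    degree = node_degrees(edges)
    if not degree:
        return []
    top = max(degree.values())
    return sorted(node for node, d in degree.items() if d == top)


def accuracy(judgements):
    """Fraction of triplets a human reviewer marked as correct."""
    return sum(judgements) / len(judgements)


# Placeholder triplets, as if extracted from several abstracts.
example_triplets = [
    ("MOL_A", "inhibits", "MOL_B"),
    ("MOL_A", "Suppresses", "MOL_B"),      # same edge, same category
    ("MOL_A", "activates", "MOL_C"),
    ("MOL_D", "binds to", "MOL_A"),
    ("MOL_D", "phosphorylates", "MOL_B"),  # unmapped verb, dropped
]
example_kg = build_kg(example_triplets)
example_hubs = hub_nodes(example_kg)
```

Keying edges on the (subject, object) pair is one design choice among several. It keeps the graph compact when many papers report the same interaction, at the cost of discarding how often each interaction was reported.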
