Leveraging knowledge graphs and large language models for integrating molecular variants and clinical insights in COVID-19 research

Authors: Yang, Jiaxin; Zhang, Fushuai; Cao, Ruifang; Chen, Yingying; Chen, Yiping; Chen, Yuxin; Li, Yixue; Zhao, Guoping; Wang, Ying; Ling, Yunchao; Zhang, Guoqing

Journal:	Biosafety and Health	Impact factor:	3.000
Year:	2026	Volume/issue/pages:	2026 Feb;8(1):71-79
doi:	10.1016/j.bsheal.2025.12.003	Research area:	Microbiology
Disease type:	COVID-19

Abstract

The relentless emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants continues to challenge global health, as high mutation rates and complex pathogenicity obscure molecular mechanisms and impede clinical progress. Despite extensive research across viral evolution, structural biology, immunology, diagnostics, and therapeutics, the resulting vast and rapidly outdated literature has widened the gap between fundamental discovery and medical application. Here, we systematically mined 439,724 coronavirus disease 2019 (COVID-19) publications using fine-tuned large language models to extract and distill knowledge across nine domains: antibodies, vaccines, serology, biochemistry, therapeutics, clinical presentation, risk factors, biomarkers, and diagnostics. These insights were integrated into a unified graph of 1,427,596 triples (CoVAR-KG). Covering 90 % of known spike-protein variant sites, our knowledge graph forges molecular-to-clinical links that reveal how specific mutations influence antigenicity, transmissibility, and treatment response. By resolving data fragmentation, this resource accelerates target identification and streamlines hypothesis generation. Building on CoVAR-KG, we developed COVID-19 variant risk watcher (CVRW), an early-warning framework that quantifies the threat of emerging variants for real-time surveillance. Coupling the graph with retrieval-augmented GPT-4o enables rapid and in-depth comparisons of variant functionality and immune escape potential. These integrative tools furnish timely insights for vaccine design, therapeutic optimization, and pandemic preparedness, establishing a versatile platform for combating current and future viral threats.
