A curation system of rice trait ontology with reliable interoperation by LLM and PubAnnotation

Abstract

BACKGROUND: Ontologies are essential for organizing complex biological knowledge, such as genes, phenotypes, and pathways, and for ensuring consistent data annotation and retrieval. In biological research, ontologies such as the Gene Ontology (GO) and crop-specific trait ontologies (TO) for Oryza sativa (rice) standardize terminology across studies, supporting cross-study comparison and hypothesis generation. However, ontology annotation usually relies on expert manual review of the literature, a process that is accurate but time-consuming, labor-intensive, and difficult to scale as biological data grow. Manual approaches are also prone to inconsistencies and errors. The emergence of large language models (LLMs) such as ChatGPT, DeepSeek, and KIMI, together with resources such as the curated Rice-Alterome database and the PubAnnotation platform, offers new opportunities for semi-automated ontology curation. This study explores how these technologies can be integrated into an efficient literature-based curation system for rice trait ontology.

METHODS: We developed a curation system that integrates Rice-Alterome, a comprehensive database of rice genomic variations, mutations, and sentence-level literature evidence linked to GO and TO terms, with PubAnnotation, an open-source platform for collaborative text annotation. LLMs (DeepSeek and KIMI) were connected through their APIs and, guided by prompt engineering, used to automate the extraction, annotation, and validation of trait-related information. The system was evaluated through use cases designed to demonstrate its performance and functionality relative to manual curation.

RESULTS: Compared with manual methods, the proposed system substantially improved the retrieval and organization of literature evidence. The integrated platform, available through a dedicated website, connects Rice-Alterome, PubAnnotation, and LLMs to streamline ontology curation and evidence discovery. This framework reduces the time domain experts need to locate and validate relevant information and provides interactive tools for users to add, merge, or refine trait annotations. LLM-driven, prompt-based querying also improved the identification of implicit or missing information that may be overlooked during manual curation.

CONCLUSIONS: Integrating LLMs with Rice-Alterome and PubAnnotation offers a promising solution for automating rice trait ontology curation. This approach accelerates evidence collection and enhances data consistency and accessibility. Future extensions of this framework will target additional crops such as wheat and maize and will focus on refining LLM-based retrieval and annotation mechanisms for broader agricultural genomics applications.
