Abstract
BACKGROUND: Ontology frameworks are essential for organizing complex biological knowledge, such as genes, phenotypes, and pathways, and for ensuring consistent data annotation and retrieval. In biological research, ontologies like the Gene Ontology (GO) and crop-specific trait ontologies (TO) for Oryza sativa (rice) standardize terminology across studies, supporting cross-study comparison and hypothesis generation. However, ontology annotation usually relies on expert manual review of the literature, a process that is generally accurate but time-consuming, labor-intensive, and difficult to scale as the volume of biological data grows. It is also prone to inconsistencies and errors. The emergence of large language models (LLMs) such as ChatGPT, DeepSeek, and KIMI, along with curated resources like Rice-Alterome and PubAnnotation, offers new opportunities for semi-automated ontology curation. This study explores how these technologies can be integrated to develop an efficient literature-based curation system for the rice trait ontology.
METHODS: We developed a curation system that integrates Rice-Alterome, a comprehensive database of rice genomic variations, mutations, and sentence-level literature evidence linked to GO and TO terms, with PubAnnotation, an open-source platform for collaborative text annotation. LLMs (DeepSeek and KIMI) were integrated through their APIs and, using prompt engineering, automated the extraction, annotation, and validation of trait-related information. The system was evaluated through use cases designed to demonstrate its performance and functionality compared to manual curation.
RESULTS: The proposed system substantially improved the retrieval and organization of literature evidence compared to manual methods. The integrated platform, available through a dedicated website, connects Rice-Alterome, PubAnnotation, and LLMs to streamline ontology curation and evidence discovery. This framework reduces the time that domain experts need to locate and validate relevant information and provides interactive tools for users to add, merge, or refine trait annotations. LLM-driven, prompt-based querying also improved the identification of implicit or missing information that may be overlooked during manual curation.
CONCLUSIONS: Integrating LLMs with Rice-Alterome and PubAnnotation offers a promising solution for automating rice trait ontology curation. This approach accelerates evidence collection and enhances data consistency and accessibility. Future extensions of this framework will target additional crops such as wheat and maize and focus on refining LLM-based retrieval and annotation mechanisms for broader agricultural genomics applications.
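As an illustration of the prompt-based extraction step summarized under METHODS, the following is a minimal sketch and not the system's actual implementation. The prompt wording, the JSON record fields (`gene`, `trait`, `evidence`), and the function names are all assumptions introduced here. The LLM is passed in as a plain callable so that no particular vendor API is implied.

```python
import json
from typing import Callable, Dict, List

# Hypothetical prompt; the wording used by the actual system is not given in the abstract.
PROMPT_TEMPLATE = (
    "You are assisting with rice trait ontology curation.\n"
    "From the sentence below, list every gene-trait association.\n"
    "Reply with a JSON array of objects with the keys "
    '"gene", "trait", and "evidence".\n\n'
    "Sentence: {sentence}"
)

REQUIRED_KEYS = {"gene", "trait", "evidence"}  # assumed record schema


def build_prompt(sentence: str) -> str:
    """Fill the template with one literature sentence."""
    return PROMPT_TEMPLATE.format(sentence=sentence.strip())


def parse_annotations(reply: str) -> List[Dict[str, str]]:
    """Keep only well-formed records; return [] if the reply is not a JSON list."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    return [r for r in data if isinstance(r, dict) and REQUIRED_KEYS.issubset(r)]


def annotate(sentence: str, call_llm: Callable[[str], str]) -> List[Dict[str, str]]:
    """Send one sentence to an LLM (any callable: prompt -> reply text) and parse the reply."""
    return parse_annotations(call_llm(build_prompt(sentence)))


if __name__ == "__main__":
    # Stand-in for a real API client; it returns a canned reply with placeholder values.
    def fake_llm(prompt: str) -> str:
        return '[{"gene": "GENE1", "trait": "TRAIT1", "evidence": "placeholder sentence"}]'

    print(annotate("GENE1 is associated with TRAIT1.", fake_llm))
```

Records that pass this validation would still go to a domain expert for review before being added to the ontology, in keeping with the semi-automated workflow described above.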