Pipeline to explore information on genome editing using large language models and genome editing meta-database

Abstract

Genome editing (GE) is widely recognized as an effective and valuable technology in life sciences research. However, some genes are difficult to edit, depending on factors such as the species, the target sequence, and the GE tool used. Therefore, confirming whether GE has been performed on a gene in previous publications is crucial for effectively designing and establishing research that uses GE. Although the Genome Editing Meta-database (GEM: https://bonohu.hiroshima-u.ac.jp/gem/) aims to provide GE information as comprehensively as possible, it does not indicate how each registered gene is involved in GE. In this study, we developed a systematic method that uses large language models (LLMs) to extract essential GE information from GEM-derived information and GE-related articles. This approach enables a systematic and efficient investigation of GE information that is not possible with the current GEM alone. In addition, by converting the extracted GE information into metrics, we propose a potential application of this method for prioritizing genes in future research. The extracted GE information and the novel GE-related scores are expected to facilitate the efficient selection of target genes for GE and to support the design of GE-based research. Database URLs: https://github.com/szktkyk/extract_geinfo, https://github.com/szktkyk/visualize_geinfo.
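The abstract does not specify how the extracted GE information is converted into metrics. As a minimal sketch, assuming each LLM-extracted record holds a gene symbol, species, GE tool, and article identifier (a hypothetical schema, not necessarily the one used in the extract_geinfo repository), a per-gene score could combine the number of distinct supporting articles and species. The `GERecord` type, the `rank_genes` function, and the 0.5 species weight are illustrative assumptions, not the authors' GE-related scores.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GERecord:
    """One hypothetical LLM-extracted record: a gene edited in one article."""
    gene: str
    species: str
    tool: str   # e.g. "CRISPR-Cas9", "TALEN"
    pmid: str   # article identifier (hypothetical field)


def rank_genes(records: list[GERecord]) -> list[tuple[str, float]]:
    """Rank genes by an illustrative score:
    distinct articles + 0.5 * distinct species (weights are assumptions)."""
    articles: dict[str, set[str]] = defaultdict(set)
    species: dict[str, set[str]] = defaultdict(set)
    for r in records:
        articles[r.gene].add(r.pmid)
        species[r.gene].add(r.species)
    scores = {g: len(articles[g]) + 0.5 * len(species[g]) for g in articles}
    # Highest score first; ties broken alphabetically by gene symbol.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    demo = [
        GERecord("TP53", "Homo sapiens", "CRISPR-Cas9", "1"),
        GERecord("TP53", "Mus musculus", "CRISPR-Cas9", "2"),
        GERecord("MSTN", "Bos taurus", "TALEN", "3"),
    ]
    print(rank_genes(demo))  # [('TP53', 3.0), ('MSTN', 1.5)]
```

Counting distinct articles rather than raw records keeps a gene from being inflated when one article yields several extracted records; whether high or low scores should be prioritized depends on the research goal, which the abstract leaves open.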
