Fine-tuned GPT-based foundation models effectively reconstruct bacterial transcriptional regulatory networks from literature

Abstract

INTRODUCTION: From a single genome (the collection of an organism's DNA molecules), living systems can produce different cell types, and bacteria can deploy mechanisms to adapt to environmental changes. Although regulation can occur at different levels, regulation of transcription initiation (the start of copying DNA into RNA) is the most studied level in bacteria. The set of regulators and the elements they regulate defines a transcriptional regulatory network (TRN), and the study of TRNs has driven relevant areas such as antimicrobial resistance. Analyzing and understanding these networks depends on a few extensively hand-curated databases. The traditional way to reconstruct TRNs is manual curation of the literature, which is accurate but demanding and time-consuming. These limitations have left bacterial TRNs scarce and incomplete.

METHODS: Here, we present a novel ensemble approach that uses two GPT-based foundation models (LLaMA-3 and GPT-4o mini) to reconstruct TRNs from the literature. We applied supervised fine-tuning with sentences from the Escherichia coli literature to train the models to predict the type of regulatory effect between a transcription factor and a regulated element (gene/operon). To evaluate how well the approach reconstructs a curated TRN, we used 264 full-text articles on Salmonella Typhimurium, a pathogen of clinical interest.

RESULTS: On the test data, both models performed well (F1-score > 0.87, Matthews correlation coefficient > 0.82). For the curated TRN reconstruction, the ensemble approach based on agreement between the two models correctly reconstructed 80% of the TRN (recall: 0.80, F1-score: 0.64). We then applied the approach to reconstruct a large Salmonella TRN from the literature on transcriptional regulation of this bacterium available at the time (2,278 articles). This network was characterized with network metrics and over-representation analyses, and compared with existing biological knowledge.

DISCUSSION: Our approach outperformed prior work on predicting the effect of the interaction. Analysis of the TRN reconstructed from the 2,278 articles showed that the network aligns with biological knowledge, supporting the effectiveness of our approach for reconstructing TRNs of diverse bacteria. Thus, our work may support the study of bacteria of biological and clinical interest, especially those without a reconstructed TRN.
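The agreement-based ensemble and its evaluation against a curated TRN can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each model's output is assumed to be already aggregated into one effect label per (transcription factor, regulated gene/operon) pair, the function names, TF/gene names, and labels are hypothetical placeholders, and an interaction is assumed to count as correct only when the pair and its effect both match the curated network.

```python
from typing import Dict, Set, Tuple

# Hypothetical data model: one predicted effect label per
# (transcription factor, regulated gene/operon) pair.
Pair = Tuple[str, str]
Network = Dict[Pair, str]
Triple = Tuple[str, str, str]


def agreement_ensemble(model_a: Network, model_b: Network) -> Network:
    """Keep only pairs for which both models predict the same effect."""
    return {
        pair: effect
        for pair, effect in model_a.items()
        if model_b.get(pair) == effect
    }


def to_triples(network: Network) -> Set[Triple]:
    """Flatten {(tf, target): effect} into {(tf, target, effect)}."""
    return {(tf, target, effect) for (tf, target), effect in network.items()}


def evaluate(predicted: Network, curated: Network) -> Dict[str, float]:
    """Precision, recall and F1 of a predicted TRN against a curated one.

    Assumed matching criterion: a true positive requires the TF, the
    target and the effect label to all match the curated interaction.
    """
    pred, gold = to_triples(predicted), to_triples(curated)
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def precision_from_f1_recall(f1: float, recall: float) -> float:
    """Invert F1 = 2PR / (P + R) to recover precision P = F1*R / (2R - F1)."""
    return f1 * recall / (2 * recall - f1)


if __name__ == "__main__":
    # Toy placeholder networks; names and labels are illustrative only.
    llama = {
        ("TF1", "geneA"): "activator",
        ("TF1", "geneB"): "repressor",
        ("TF2", "geneC"): "activator",
    }
    gpt4o_mini = {
        ("TF1", "geneA"): "activator",
        ("TF1", "geneB"): "activator",
        ("TF2", "geneC"): "activator",
        ("TF3", "geneD"): "repressor",
    }
    curated = {
        ("TF1", "geneA"): "activator",
        ("TF2", "geneC"): "repressor",
        ("TF3", "geneD"): "repressor",
    }
    ensemble = agreement_ensemble(llama, gpt4o_mini)
    print(ensemble)
    print(evaluate(ensemble, curated))
    print(round(precision_from_f1_recall(0.64, 0.80), 3))
```

Rearranging F1 = 2PR/(P + R) gives P = F1·R/(2R − F1), so the reported recall of 0.80 and F1-score of 0.64 correspond to a precision of roughly 0.53 (0.512/0.96), subject to rounding of the reported figures. Requiring agreement discards every pair on which the two models disagree, which generally favors precision over recall compared with using either model alone.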