Evolutionary discovery and characterization of fungal transcriptional activators using active learning

Abstract

Biological discovery and design are increasingly guided by predictive models in place of costly experimentation. However, existing datasets are often biased toward model organisms, which leads to failures in evolutionary studies of non-model species. We present a hybrid framework that combines high-throughput molecular assays with active learning to quantify biological properties across evolutionary space. We focus on transcriptional activators, which contain activation domains (ADs) that promote gene expression. ADs are intrinsically disordered and poorly conserved, which limits their study by comparative genomics. We developed ADhunter, a high-capacity regression model that outperforms state-of-the-art algorithms in identifying transcriptional activators and quantifying their strength. Model uncertainty was used to guide evolutionary sampling across 7.8 million proteins from 2,400 fungal genomes. We functionally characterized 9,836 ADs from 1,071 fungal genomes, a 15.5-fold expansion in genome representation compared to existing datasets. Comprehensive sampling from non-model genomes improved model generalizability and provided the first functional annotation for 3,416 proteins from 670 non-model fungi. Model interpretability analysis aligns with the biophysical model of AD function and reveals novel, underrepresented protein codes, highlighting the importance of sampling from non-model organisms to build evolutionarily robust models for predicting biological properties.
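
The abstract states that model uncertainty guided which proteins were sampled, but it does not say how uncertainty was estimated or how candidates were chosen. As a minimal sketch of one common active-learning pattern, the example below ranks candidate sequences by the disagreement (standard deviation) among an ensemble of predictors and returns the most uncertain ones for the next round of assays. The function `select_most_uncertain`, the two toy predictors, and the example sequences are all hypothetical illustrations and are not taken from ADhunter.

```python
import statistics
from typing import Callable, List, Sequence


def select_most_uncertain(
    candidates: Sequence[str],
    ensemble: Sequence[Callable[[str], float]],
    batch_size: int,
) -> List[str]:
    """Return the candidates on which the ensemble disagrees the most.

    Uncertainty is approximated here by the population standard deviation
    of the ensemble's predictions (an assumption of this sketch).
    """
    scored = []
    for seq in candidates:
        preds = [model(seq) for model in ensemble]
        scored.append((statistics.pstdev(preds), seq))
    # Highest uncertainty first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [seq for _, seq in scored[:batch_size]]


# Toy stand-ins for trained models (hypothetical, for illustration only).
def acidic_fraction(seq: str) -> float:
    return sum(seq.count(aa) for aa in "DE") / len(seq)


def aromatic_fraction(seq: str) -> float:
    return sum(seq.count(aa) for aa in "FWY") / len(seq)


ensemble = [acidic_fraction, aromatic_fraction]
candidates = ["DDEE", "DFEW", "AAAA", "FWYF"]
picked = select_most_uncertain(candidates, ensemble, batch_size=2)
print(picked)
```

In a full loop, the selected sequences would be assayed, the measurements added to the training set, and the models retrained before the next selection round. Other uncertainty estimates could be substituted for the ensemble standard deviation without changing the selection logic.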
