Carmna: classification and regression models for nitrogenase activity based on a pretrained large protein language model

Abstract

Nitrogen-fixing microorganisms play a critical role in the global nitrogen cycle by converting atmospheric nitrogen into ammonia through the action of nitrogenase (EC 1.18.6.1). In this study, we used six machine learning algorithms to build classification and regression models of nitrogenase activity (Carmna). Carmna used the pretrained large protein language model ProtT5 to extract features from nitrogenase sequences and incorporated additional features, such as gene expression and codon preference, for model training. The optimal classification model, based on XGBoost, achieved an average area under the receiver operating characteristic curve (AUC) of 0.9365 and an F1 score of 0.85 in five-fold cross-validation. For regression, the best-performing model was a stacking approach based on support vector regression, with an average R² of 0.5572 and a mean absolute error (MAE) of 0.3351. Interpretability analysis of the optimal regression model showed that nitrogenase activity was associated not only with the proportions and codon preferences of the standard amino acids, but also with the expression levels and spatial distance of nitrogenase genes. We also identified the minimal nitrogen-fixing nif cluster. This study deepens our understanding of the complex mechanisms regulating nitrogenase activity and contributes to the development of efficient bio-fertilizers.
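The four metrics reported above (AUC and F1 for classification, R² and MAE for regression, each averaged over five folds) can be illustrated with a minimal, dependency-free sketch. This is not the authors' code: the function names, the contiguous fold split, and the toy inputs are assumptions made for illustration, and the abstract does not state how the folds were actually constructed.

```python
from statistics import mean


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def k_fold_indices(n, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds.

    Assumption: no shuffling or stratification, purely for illustration.
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        yield train, test
        start = stop
```

In a five-fold evaluation, a model would be fitted on each `train` index set, scored on the matching `test` set, and the per-fold metric values averaged with `mean`, which is what an "average AUC" or "average R²" refers to.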
