Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities(1,2). Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database(3). Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.
Unraveling the functional dark matter through global metagenomics.
Authors: Pavlopoulos Georgios A, Baltoumas Fotis A, Liu Sirui, Selvitopi Oguz, Camargo Antonio Pedro, Nayfach Stephen, Azad Ariful, Roux Simon, Call Lee, Ivanova Natalia N, Chen I Min, Paez-Espino David, Karatzas Evangelos, Iliopoulos Ioannis, Konstantinidis Konstantinos, Tiedje James M, Pett-Ridge Jennifer, Baker David, Visel Axel, Ouzounis Christos A, Ovchinnikov Sergey, Buluç Aydin, Kyrpides Nikos C
| Journal: | Nature | Impact factor: | 48.500 |
| Year: | 2023 | Citation: | 2023 Oct;622(7983):594-602 |
| DOI: | 10.1038/s41586-023-06583-7 | | |
