Mash-based analyses of Escherichia coli genomes reveal 14 distinct phylogroups

基于 MASH 的对大肠杆菌基因组的分析揭示了 14 个不同的系统发育群

阅读:1

Abstract

In this study, more than one hundred thousand Escherichia coli and Shigella genomes were examined and classified. This is, to our knowledge, the largest E. coli genome dataset analyzed to date. A Mash-based analysis of a cleaned set of 10,667 E. coli genomes from GenBank revealed 14 distinct phylogroups. A representative genome or medoid identified for each phylogroup was used as a proxy to classify 95,525 unassembled genomes from the Sequence Read Archive (SRA). We find that most of the sequenced E. coli genomes belong to four phylogroups (A, C, B1 and E2(O157)). Authenticity of the 14 phylogroups is supported by several different lines of evidence: phylogroup-specific core genes, a phylogenetic tree constructed with 2613 single copy core genes, and differences in the rates of gene gain/loss/duplication. The methodology used in this work is able to reproduce known phylogroups, as well as to identify previously uncharacterized phylogroups in E. coli species.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。