A k-mer-based maximum likelihood method for estimating distances of reads to genomes enables genome-wide phylogenetic placement



Abstract

Comparing each sequencing read in a sample to large databases of known genomes has become a fundamental tool with wide-ranging applications, including metagenomics. These comparisons can be based on read-to-genome alignment, which is relatively slow, especially when run with the high sensitivity needed to characterize queries that lack a close representative in the reference dataset. A more scalable alternative is to assign taxonomic labels to reads using signatures such as k-mer presence/absence. A third approach is to place reads on a reference phylogeny, which can provide a far more detailed view of the read than a single label. However, phylogenetic placement is currently possible at scale only for marker genes, which constitute a small fraction of the genome. No current method can place reads originating from anywhere in the genome on an ultra-large reference phylogeny. In this paper, we introduce krepp, an alignment-free k-mer-based method that places reads from anywhere in the genome on an ultra-large reference phylogeny by first computing a distance from each read to every reference genome. To compute these distances and placements, krepp uses a host of algorithmic techniques, including locality-sensitive hashing to allow inexact k-mer matches, k-mer coloring graphs to map k-mers to reference genomes, maximum likelihood distance estimation, and a likelihood ratio test for placement. Our experiments show that krepp is extremely scalable, improving on alignment by up to roughly 10×, computes very accurate distances that approximate those obtained from alignments, and produces highly accurate placements. When used in the metagenomics context, the precise phylogenetic identifications provided by krepp improve our ability to compare and differentiate samples from different environments.
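To make the idea of a maximum likelihood read-to-genome distance from inexact k-mer matches concrete, here is a minimal Python sketch. It is not krepp's implementation, and the abstract does not specify krepp's likelihood model. The sketch assumes a toy model in which every position of a k-mer mismatches independently with probability d, treats overlapping k-mers as independent, and counts a read k-mer with no reference k-mer within `max_hd` mismatches as a censored observation. It also finds inexact matches by brute force rather than by locality-sensitive hashing. All function names and parameters (`kmer_mismatches`, `ml_distance`, `max_hd`, `grid_size`) are hypothetical.

```python
from math import comb, log


def kmers(seq, k):
    """Return all overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def kmer_mismatches(read, genome, k, max_hd):
    """For each read k-mer, the smallest Hamming distance to any genome k-mer,
    or None when no genome k-mer lies within max_hd mismatches.
    Brute force stands in for the LSH lookup (an assumption of this sketch)."""
    ref = set(kmers(genome, k))
    result = []
    for q in kmers(read, k):
        best = min(hamming(q, r) for r in ref)
        result.append(best if best <= max_hd else None)
    return result


def log_likelihood(d, mismatches, k, max_hd):
    """Log-likelihood of distance d under the toy binomial model:
    a matched k-mer with h mismatches has probability C(k,h) d^h (1-d)^(k-h);
    an unmatched k-mer has the probability of more than max_hd mismatches."""
    p_within = [comb(k, h) * d**h * (1 - d)**(k - h) for h in range(max_hd + 1)]
    p_miss = max(1.0 - sum(p_within), 1e-300)  # guard against log(0)
    total = 0.0
    for h in mismatches:
        total += log(p_miss) if h is None else log(p_within[h])
    return total


def ml_distance(mismatches, k, max_hd, grid_size=500):
    """Grid search for the d in (0, 0.5) that maximizes the log-likelihood."""
    grid = [i / (2 * grid_size) for i in range(1, grid_size)]
    return max(grid, key=lambda d: log_likelihood(d, mismatches, k, max_hd))


if __name__ == "__main__":
    genome = "ACGTTGCAAGCTTAGGCTAACCGTATCGGATCCTAGGTACCATGCAAGTCGATTGACCTA"
    read = genome[10:40]
    k, max_hd = 8, 2
    print(ml_distance(kmer_mismatches(read, genome, k, max_hd), k, max_hd))
```

Under this toy model, a read that matches the genome exactly yields the smallest distance on the grid, and the estimate grows as substitutions are introduced. A real tool would need an index that scales to many genomes and a model that accounts for dependence between overlapping k-mers.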
