An alignment- and reference-free strategy using k-mer present pattern for population genomic analyses



Abstract

Pangenomes are replacing single reference genomes as a way to capture all variants within a species or clade, but their analysis relies predominantly on graph-based methods that require multiple high-quality genomes and computationally intensive multiple-genome alignments. K-mer decomposition is an alternative to graph-based pangenomes. However, how to use k-mers directly for population genetic analyses remains unknown. Here, we developed a novel strategy that uses variation in k-mer counts among genomes for population analyses. To test the effectiveness of this method, we compared it directly with the SNP-based method in analyses of the population structure and genetic diversity of 267 Saccharomyces cerevisiae strains across two simulated datasets and a real sequence dataset. The population structure identified with k-mers recapitulates that obtained using SNPs, indicating the effectiveness of the k-mer-based approach, and the higher genetic diversity observed in the real dataset supports the view that k-mers capture more genetic variants. Based on k-mer frequency, we found not only SNPs but also some insertions/deletions and horizontal gene transfer (HGT) fragments related to the adaptive evolution of S. cerevisiae. Our study creates a framework for the alignment- and reference-free (ARF) method in population genetic analyses, whose advantages will be more pronounced in species with no complete genome or in highly diverged species.
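To make the idea of comparing strains by k-mers rather than by aligned SNPs more concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a simple presence/absence encoding of k-mers (suggested by the title), ignores reverse complements and count thresholds, and uses the fraction of differing variable k-mers as a stand-in distance. The function names `kmer_set`, `presence_matrix` and `pattern_distance`, the toy sequences, and the choice of k = 3 are all hypothetical.

```python
def kmer_set(seq, k):
    """Return the set of distinct k-mers in one sequence (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def presence_matrix(genomes, k):
    """Build a strain-by-k-mer 0/1 table, keeping only variable k-mers.

    A k-mer is 'variable' if it is present in at least one strain but not
    in all of them; k-mers shared by every strain carry no population signal.
    """
    sets = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    names = sorted(sets)
    universe = sorted(set().union(*sets.values()))
    variable = [
        km for km in universe
        if 0 < sum(km in sets[n] for n in names) < len(names)
    ]
    matrix = {n: [1 if km in sets[n] else 0 for km in variable] for n in names}
    return variable, matrix


def pattern_distance(a, b):
    """Fraction of variable k-mers whose presence differs between two strains."""
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Toy example: three hypothetical "strains" differing by single substitutions.
genomes = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACGTTCGT"}
variable, matrix = presence_matrix(genomes, k=3)
d12 = pattern_distance(matrix["s1"], matrix["s2"])
d13 = pattern_distance(matrix["s1"], matrix["s3"])
```

In this toy case a single substitution near the end of a sequence changes one k-mer (s1 vs. s2), whereas a substitution in the middle changes several overlapping k-mers (s1 vs. s3), which illustrates why k-mer patterns can carry more variant signal than a one-to-one SNP count. A pairwise distance table of this kind could then feed standard clustering or ordination steps, under the same assumptions.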
