KmerKeys: a web resource for searching indexed genome assemblies and variants



Abstract

K-mers are short DNA sequences that are used for genome sequence analysis. Applications that use k-mers include genome assembly and alignment. However, the wider bioinformatic use of these short sequences faces challenges related to the massive scale of genomic sequence data. A single human genome assembly has billions of k-mers. As a result, the computational requirements for analyzing k-mer information are enormous, particularly when complete genome assemblies are involved. To address these issues, we developed a new indexing data structure based on a hash table tuned for the lookup of short sequence keys. The resulting web application, referred to as KmerKeys, provides rapid queries for cloud computation on genome assemblies. It supports fuzzy as well as exact sequence searches of assemblies. To achieve robust and fast performance, the website implements cache-friendly hash tables, memory mapping and massively parallel processing. Our method employs a scalable and efficient data structure that can be used to jointly index and search a large collection of human genome assembly information. The index can also include variant databases and their associated metadata, such as the gnomAD population variant catalogue. This feature enables the incorporation of future genomic information into sequencing analysis. KmerKeys is freely accessible at https://kmerkeys.dgi-stanford.org.
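The idea of a hash-table k-mer index with exact and fuzzy lookup can be illustrated with a minimal Python sketch. This is not the KmerKeys implementation: the abstract describes cache-friendly hash tables with memory mapping and parallel processing, whereas this toy uses an in-memory Python dict. The function names (`build_index`, `exact_search`, `fuzzy_search`, `neighbors`) are hypothetical, and "fuzzy" is assumed here to mean a bounded number of base substitutions, which the abstract does not specify.

```python
from collections import defaultdict
from itertools import combinations, product


def build_index(sequence, k):
    """Map each k-mer in `sequence` to the 0-based positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return dict(index)


def neighbors(kmer, max_mismatches):
    """Yield every DNA string within `max_mismatches` substitutions of `kmer`."""
    seen = set()
    for d in range(max_mismatches + 1):
        for positions in combinations(range(len(kmer)), d):
            for bases in product("ACGT", repeat=d):
                candidate = list(kmer)
                for pos, base in zip(positions, bases):
                    candidate[pos] = base
                text = "".join(candidate)
                if text not in seen:
                    seen.add(text)
                    yield text


def exact_search(index, kmer):
    """Return the positions of `kmer`, or an empty list if it is absent."""
    return index.get(kmer, [])


def fuzzy_search(index, kmer, max_mismatches=1):
    """Return {matching k-mer: positions} for all indexed k-mers that differ
    from `kmer` by at most `max_mismatches` substitutions (assumed definition)."""
    hits = {}
    for candidate in neighbors(kmer, max_mismatches):
        if candidate in index:
            hits[candidate] = index[candidate]
    return hits


# Toy sequence standing in for a genome assembly (illustrative only).
index = build_index("ACGTACGTTACG", k=4)
print(exact_search(index, "ACGT"))
print(fuzzy_search(index, "ACGA", max_mismatches=1))
```

Enumerating neighbors turns each fuzzy query into a set of exact hash lookups, which is workable for short keys and small mismatch budgets but grows quickly as either increases. A genome-scale index would also need a compact key encoding and on-disk storage rather than Python strings in a dict.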
