Bitpacking techniques for indexing genomes: II. Enhanced suffix arrays

用于基因组索引的位打包技术:II. 增强型后缀数组

阅读:1

Abstract

BACKGROUND: Suffix arrays and their variants are used widely for representing genomes in search applications. Enhanced suffix arrays (ESAs) provide fast search speed, but require large auxiliary data structures for storing longest common prefix and child interval information. We explore techniques for compressing ESAs to accelerate genomic search and reduce memory requirements. RESULTS: We evaluate various bitpacking techniques that store integers in fewer than 32 bits each, as well as bytecoding methods that reserve a single byte per integer whenever possible. Our results on the fly, chicken, and human genomes show that bytecoding with an exception guide array is the fastest method for retrieving auxiliary information. Genomic searching can be further accelerated using a data structure called a discriminating character array, which reduces memory accesses to the suffix array and the genome string. Finally, integrating storage of the auxiliary and discriminating character arrays further speeds up genomic search. CONCLUSIONS: The combination of exception guide arrays, a discriminating character array, and integrated data storage provide a 2- to 3-fold increase in speed for genomic searching compared with using bytecoding alone, and is 20 % faster and 40 % more space-efficient than an uncompressed ESA.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。