JARVIS3: an efficient encoder for genomic data

Abstract

MOTIVATION: Large-scale genomic projects face the challenge of reducing medium- and long-term storage space, together with the associated energy consumption, monetary costs, and environmental footprint.

RESULTS: We present JARVIS3, a tool for the efficient reference-free compression of genomic sequences. JARVIS3 introduces enhanced table memory models and probabilistic lookup-tables applied in repeat models; these optimizations substantially improve computational efficiency. JARVIS3 offers three distinct profiles: (i) rapid computation with moderate compression, (ii) a balanced trade-off between time and compression, and (iii) slower computation with significantly higher compression ratios. JARVIS3 is implemented in the C programming language and builds upon its predecessor, JARVIS2. It shows substantial speed improvements relative to JARVIS2 while providing slightly better compression. Furthermore, we provide a versatile C/Bash implementation that facilitates its application to FASTA and FASTQ data, including the capability for parallel computation. In addition, JARVIS3 includes a mode for outputting bit information and reports the Normalized Compression and bit rates, facilitating compression-based analysis. This establishes JARVIS3 as an open-source solution for genomic data compression and analysis.

AVAILABILITY AND IMPLEMENTATION: JARVIS3 is freely available at https://github.com/cobilab/jarvis3.
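The Normalized Compression (NC) and bit rate mentioned in the abstract can be illustrated with a small worked sketch. It assumes the commonly used definitions, NC = C(x) / (|x| log2 |A|) and bit rate = C(x) / |x|, where C(x) is the compressed size in bits, |x| the sequence length, and |A| the alphabet size (4 for DNA). The abstract does not spell these formulas out, so treat them as an assumption. The function names are hypothetical, and Python's general-purpose `lzma` module is used purely as a stand-in compressor; it is not JARVIS3 and will not reproduce its numbers.

```python
import lzma
import math


def bit_rate(sequence: str, compressed_bits: int) -> float:
    """Average number of bits spent per symbol (bits per base for DNA)."""
    return compressed_bits / len(sequence)


def normalized_compression(sequence: str, compressed_bits: int,
                           alphabet_size: int = 4) -> float:
    """Assumed definition: NC = C(x) / (|x| * log2(|A|)).

    Values near 0 suggest a highly redundant sequence; values near 1
    suggest the compressor found little structure to exploit.
    """
    return compressed_bits / (len(sequence) * math.log2(alphabet_size))


def compressed_size_bits(sequence: str) -> int:
    """Compressed size in bits using LZMA as a stand-in (NOT JARVIS3)."""
    return 8 * len(lzma.compress(sequence.encode("ascii")))


if __name__ == "__main__":
    # A highly repetitive toy sequence, chosen only for illustration.
    repetitive = "ACGT" * 5000
    bits = compressed_size_bits(repetitive)
    print(f"bit rate: {bit_rate(repetitive, bits):.4f} bits per base")
    print(f"NC:       {normalized_compression(repetitive, bits):.4f}")
```

Under these definitions, for a four-symbol alphabet the NC is simply the bit rate divided by 2, since log2(4) = 2 bits is the cost of storing each base without any modelling.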
