Enhancing Data Compression: Recent Innovations in LZ77 Algorithms.

阅读:4
作者:Hong Aaron, Boucher Christina
The growing volume of genomic data, driven by advances in sequencing technologies, demands efficient data compression solutions. Traditional algorithms, such as Lempel-Ziv77 (LZ77), have been fundamental in offering lossless compression, yet they often fall short when applied to the highly repetitive structures typical of genomic sequences. This review explores the evolution of LZ77 and its adaptations for genomic data compression, highlighting specialized algorithms designed to handle redundancy in large-scale sequencing datasets efficiently. Innovations in this field have enhanced compression ratios and processing efficiencies leveraging intrinsic redundancy within genomic datasets. We critically examine a spectrum of LZ77-based algorithms, including newer adaptations for external and semi-external memory settings, and contrast their efficacy in managing large-scale genomic data. We conducted experiments to evaluate the performance of several algorithms, including KKP2, RLE-LZ, SE-KKP, BGone, and PFP-LZ77, on both real-world datasets from the Pizza&Chili repetitive corpus, Salmonella genomes, and human chromosome 19 genomes. These results underscore the trade-offs between time and memory consumption between algorithms. This article aims to provide a comprehensive guide on the current landscape and future directions of data compression technologies, equipping bioinformaticians and other practitioners with insight to tackle the escalating data challenges in genomics and beyond.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。