Abstract
BACKGROUND: High-throughput sequencing technologies generate massive amounts of FASTQ data comprising nucleotide sequences, quality scores, and read identifiers, so efficient compression is needed to reduce storage and transmission costs. Specialized FASTQ compressors achieve better compression than general-purpose compressors by exploiting the inherent redundancy in FASTQ files. However, existing FASTQ-specialized compressors are often applicable to only a limited range of data types, and they tend to optimize either compression ratio or compression speed at the expense of the other.

RESULTS: We present zDUR, a reference-free FASTQ compressor designed for efficient and scalable handling of next-generation sequencing data across diverse platforms and sequencing data types. Benchmarking against six reference-free compressors on 15 representative datasets spanning four sequencing data types shows that zDUR achieves a favorable overall balance between compression ratio and speed, with broad applicability across data types. In particular, on single-cell RNA-seq and spatial transcriptomics datasets, zDUR runs more than ten times faster than SPRING, a state-of-the-art reference-free FASTQ compressor, while also achieving higher compression ratios.

CONCLUSIONS: zDUR offers a scalable and efficient solution for reference-free FASTQ compression, balancing compression performance, speed, and usability across diverse datasets.