Abstract
SUMMARY: FASTA is a widely used text-based format for storing nucleotide and protein sequences. Existing FASTA compressors usually focus on (slightly) improving the compression ratio rather than on practical performance. We present FFC, a scalable FASTA compressor that, on a benchmark set of seven single genomes, achieves average compression speeds 4.7× and 11.4× higher than those of two high-performance compressors, zstd and NAF, respectively. Its average decompression speeds are 3.5× and 2.7× higher than those of zstd and NAF, respectively. Although pzstd, a chunk-based zstd variant with parallel decompression, nearly matches FFC's speed, its compression ratio is on average 23% worse than FFC's. All experiments were run on a 14-core workstation, with a RAM disk used to reduce the impact of I/O.
AVAILABILITY AND IMPLEMENTATION: FFC is freely available at github.com/kowallus/ffc and as a Zenodo repository at 10.5281/zenodo.18892353; the datasets used are available at 10.5281/zenodo.18873744.