Abstract
BACKGROUND: Metagenomics combined with High-throughput Chromosome Conformation Capture (Hi-C) provides a powerful approach to study microbial communities by linking genomic content with spatial interactions. Hi-C complements shotgun sequencing by revealing taxonomic composition, functional interactions, and genomic organization within a single sample. However, aligning Hi-C reads to metagenomic contigs is challenging because of the variable insert sizes of Hi-C paired-end reads, multi-species complexity, and gaps in assemblies. Although several benchmark studies have evaluated general-purpose alignment tools and Hi-C read alignment, none has focused specifically on metagenomic Hi-C data.

RESULTS: We evaluated seven alignment strategies commonly used in Hi-C analyses: BWA MEM -5SP, BWA MEM default, BWA aln default, Bowtie2 default, Bowtie2 --very-sensitive-local, Minimap2 default, and Chromap Hi-C default. We benchmarked these strategies on one synthetic dataset and on data from seven real-world environments. Performance was assessed by the number of inter-contig Hi-C read pairs each strategy recovered and by the impact of those alignments on downstream tasks, such as binning quality.

CONCLUSIONS: BWA MEM -5SP outperformed the other strategies in most environments in terms of both inter-contig read pairs and binning quality, followed by BWA MEM default. Chromap and Minimap2, although less effective on these metrics, were the most computationally efficient.
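
To make the inter-contig read pair metric concrete, the following is a minimal sketch of how such pairs could be counted from an aligner's SAM output. It is an illustration under stated assumptions, not the counting procedure used in this benchmark: the function name `count_inter_contig_pairs` and the `min_mapq` parameter are hypothetical, and the sketch assumes that a pair counts as inter-contig when the primary alignments of its two mates map to different contigs.

```python
from collections import defaultdict

# Standard SAM FLAG bits (defined in the SAM specification).
UNMAPPED = 0x4
FIRST_IN_PAIR = 0x40
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_inter_contig_pairs(sam_lines, min_mapq=0):
    """Count read pairs whose two primary alignments hit different contigs.

    `min_mapq` is a hypothetical filter added for illustration; the source
    does not state which mapping-quality threshold, if any, was applied.
    """
    mates = defaultdict(dict)  # read name -> {1: contig, 2: contig}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        contig, mapq = fields[2], int(fields[4])
        # Keep primary, mapped alignments only.
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        if mapq < min_mapq:
            continue
        mate = 1 if flag & FIRST_IN_PAIR else 2
        mates[name][mate] = contig

    # A pair is inter-contig when both mates are present and the contigs differ.
    return sum(
        1 for hits in mates.values()
        if len(hits) == 2 and hits[1] != hits[2]
    )


if __name__ == "__main__":
    import sys

    # Usage (assumed): some_aligner ... | python count_pairs.py
    print(count_inter_contig_pairs(sys.stdin))
```

Grouping records by read name, rather than relying on the mate-reference field of each record, keeps the count independent of how a given aligner fills in mate information, which matters when strategies with different pairing behavior are compared.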