Abstract
For gene expression analysis in complex microbiomes, using both metagenomic and metatranscriptomic reads from the same sample enables more detailed functional analysis. Because these communities are highly diverse, metagenomic contigs are often used as reference sequences instead of complete genomes. However, few studies have optimized mapping strategies for both read types. In addition, although transcripts per million (TPM) is commonly used for normalization, few studies have evaluated the influence of ribosomal RNA (rRNA) in metatranscriptomic reads. This study compared Burrows-Wheeler Aligner-Maximal Exact Match (BWA-MEM) and Bowtie2 as tools for mapping reads to metagenomic contigs. Even after Bowtie2 parameters were optimized, BWA-MEM mapped both metagenomic and metatranscriptomic reads more efficiently. Further analysis revealed that rRNA sequences contaminate predicted protein-coding regions in metagenomic contigs. When TPM values were compared across samples, this rRNA contamination led to an overestimation of changes in TPM, and the effect became more pronounced as the difference in rRNA content between samples increased. These findings suggest that metatranscriptomic reads mapped to rRNA should be excluded before TPM is calculated. This study highlights key factors that influence read mapping and quantification in gene expression analysis of complex microbiomes, and its findings provide insights for improving analytical accuracy and advancing functional studies that combine metagenomic and metatranscriptomic data.
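The mechanism behind the TPM distortion can be illustrated with a toy calculation. The sketch below assumes the standard TPM definition (reads per base, rescaled so that all features sum to one million). The gene names, lengths, and read counts are hypothetical and are not data from this study. `orf_rrna` stands for a predicted protein-coding region contaminated by rRNA, and `fold_change` is an illustrative helper, not the authors' pipeline.

```python
# Toy illustration with HYPOTHETICAL numbers (not data from this study):
# how reads captured by an rRNA-contaminated ORF distort TPM comparisons.

def tpm(counts, lengths):
    # Reads per base for each feature, then rescale so the values sum to 1e6.
    rates = {g: counts[g] / lengths[g] for g in counts}
    total = sum(rates.values())
    return {g: r / total * 1e6 for g, r in rates.items()}

# Hypothetical feature lengths (bp); "orf_rrna" is a predicted
# protein-coding region that overlaps an rRNA sequence.
lengths = {"geneA": 1000, "geneB": 1500, "geneC": 2000, "orf_rrna": 1500}

# geneA-C receive identical read counts in both samples (no true change);
# only the number of rRNA reads absorbed by orf_rrna differs.
sample1 = {"geneA": 100, "geneB": 150, "geneC": 200, "orf_rrna": 3000}
sample2 = {"geneA": 100, "geneB": 150, "geneC": 200, "orf_rrna": 300}

def fold_change(gene, exclude=()):
    # Drop excluded features (e.g. rRNA-mapped ORFs) before computing TPM,
    # then return the sample2 / sample1 TPM ratio for one gene.
    keep = [g for g in lengths if g not in exclude]
    t1 = tpm({g: sample1[g] for g in keep}, {g: lengths[g] for g in keep})
    t2 = tpm({g: sample2[g] for g in keep}, {g: lengths[g] for g in keep})
    return t2[gene] / t1[gene]

print(round(fold_change("geneA"), 2))                          # 4.6
print(round(fold_change("geneA", exclude=("orf_rrna",)), 2))   # 1.0
```

With the contaminated ORF included, geneA appears to change 4.6-fold even though its read count is identical in both samples; excluding the rRNA-mapped feature restores a ratio of 1.0. Widening the gap between the two `orf_rrna` counts enlarges the spurious fold change, mirroring the dependence on rRNA content described above.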