Use of the CHM13-T2T genome improves metagenomic analysis by minimizing host DNA contamination

Abstract

Human-associated metagenomic data often contain human nucleic acid information, which can affect the accuracy of microbial classification or raise ethical concerns. These reads are typically removed through alignment to the human genome using various metagenomic mapping tools or human reference genomes, followed by filtration before metagenomic analysis. In this study, we conducted a comprehensive analysis to identify the optimal combination of alignment software and human reference genomes using benchmarking data. Our findings show that the combination of bwa-mem and the telomere-to-telomere human genome (CHM13-T2T) is the most effective in removing human reads in simulated data. We also analyzed CHM13-T2T-derived sequences in RefSeq to understand how CHM13-T2T reduces false positive results. Finally, we assessed clinical samples and found that CHM13-T2T effectively reduces host-derived contamination, particularly in low microbial biomass samples. This study provides a thorough overview of the application of CHM13-T2T in metagenomic analysis and highlights its significance in improving microbial classification accuracy.

Importance

Human gene sequences account for a large proportion of metagenomic sequences. To gain accurate and precise microbiome information, effective host-derived contamination removal methods are required. Both the alignment algorithm and the reference genome could influence the effectiveness of this process. The telomere-to-telomere human genome (CHM13-T2T) is a state-of-the-art human genome with 216 Mbp of additional new sequences compared with the commonly used GRCh38.p14. Our findings show the optimal dehosting effect of CHM13-T2T combined with the bwa-mem software in metagenomic analysis. We also investigate the reasons for the superiority of CHM13-T2T. Our study provides insights into optimal strategies for host sequence removal from metagenomic data. A standard reference is proposed for future metagenomic analysis, which can improve the accuracy of microbial identification.
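
The alignment-then-filtration step described in the abstract can be illustrated with a minimal sketch. It assumes the reads have already been aligned to a human reference (for example CHM13-T2T with bwa-mem) and that the alignments are available as SAM text. The keep rule used here (a read is non-host only if it is unmapped and, for paired reads, its mate is unmapped too) is an assumption chosen for illustration, not necessarily the criterion used in the study. The function names `is_non_host` and `non_host_read_names` are hypothetical.

```python
from typing import Iterable, Set

# Flag bits defined by the SAM format specification.
PAIRED = 0x1           # read is part of a pair
UNMAPPED = 0x4         # this read is unmapped
MATE_UNMAPPED = 0x8    # the mate of this read is unmapped
SECONDARY = 0x100      # secondary alignment
SUPPLEMENTARY = 0x800  # supplementary alignment


def is_non_host(flag: int) -> bool:
    """Return True if a SAM record should be kept as non-host.

    Assumed rule (illustrative only): keep a read when it did not align
    to the human reference and, if paired, its mate did not align either.
    """
    if flag & (SECONDARY | SUPPLEMENTARY):
        return False  # only alignments carry these bits
    if not (flag & UNMAPPED):
        return False  # the read aligned to the host genome
    if (flag & PAIRED) and not (flag & MATE_UNMAPPED):
        return False  # the mate aligned, so drop the whole pair
    return True


def non_host_read_names(sam_lines: Iterable[str]) -> Set[str]:
    """Collect the names of reads that pass the non-host rule."""
    keep: Set[str] = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        if is_non_host(int(fields[1])):
            keep.add(fields[0])
    return keep
```

The returned read names could then be used to subset the original FASTQ files before taxonomic classification. Dropping a pair when only one mate aligns is the stricter choice: it removes more host signal at the cost of discarding some microbial reads.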
