Are we throwing away good data? Evaluation of chimera detection algorithms on long-read amplicons reveals high false-positive rates across algorithms


Abstract

Long-read amplicon sequencing has enabled us to return to full-length DNA barcodes, which offer higher taxonomic resolution in metabarcoding-based biodiversity studies. However, chimeric sequences, artificial constructs formed when incomplete amplicons fuse during polymerase chain reaction (PCR), remain a challenge and can skew diversity estimates and ecological inferences. Here, we benchmark three de novo chimera detection algorithms, uchime_denovo, removeBimeraDenovo, and chimeras_denovo, on simulated and empirical eukaryotic full-ITS (rRNA ITS1-5.8S-ITS2) datasets to evaluate their precision, sensitivity, and effects on final OTU composition and community structure. On simulated data, uchime_denovo achieved the highest precision even with default settings, whereas the other algorithms displayed high false-positive rates unless their settings were adjusted. Similarly, tests on empirical data showed that uchime_denovo had a lower false-positive rate, whereas about half of the sequences flagged as putative chimeras by chimeras_denovo and removeBimeraDenovo were false positives. We found that most of the false-negative chimeras contained multiple 5.8S regions, indicating PacBio library preparation artifacts rather than PCR artifacts. However, OTU-level comparisons indicated that overall richness and community-ordination patterns remained largely consistent across chimera-filtering approaches, whether or not false positives and false negatives were accounted for.
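
To make the two benchmark metrics concrete, here is a minimal sketch of how precision and sensitivity could be computed for one detector on simulated data, where the true chimeras are known. The function name `score_chimera_calls`, the sequence IDs, and the toy counts are all hypothetical and are not taken from the study; this is not the authors' evaluation code.

```python
def score_chimera_calls(flagged, true_chimeras):
    """Score one detector's chimera calls against a known truth set.

    flagged        -- IDs the detector reported as chimeric
    true_chimeras  -- IDs that really are chimeric (known for simulated data)
    """
    flagged = set(flagged)
    true_chimeras = set(true_chimeras)

    tp = len(flagged & true_chimeras)   # chimeras correctly flagged
    fp = len(flagged - true_chimeras)   # good sequences wrongly flagged
    fn = len(true_chimeras - flagged)   # chimeras the detector missed

    # Precision: share of flagged sequences that are real chimeras.
    precision = tp / (tp + fp) if flagged else float("nan")
    # Sensitivity (recall): share of real chimeras that were flagged.
    sensitivity = tp / (tp + fn) if true_chimeras else float("nan")

    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "sensitivity": sensitivity}


# Hypothetical toy example: 3 true chimeras, 4 sequences flagged.
truth = {"seq3", "seq7", "seq9"}
calls = {"seq3", "seq7", "seq1", "seq2"}
result = score_chimera_calls(calls, truth)
print(result)
```

In this toy case half of the flagged sequences are false positives, so discarding every flagged read would throw away two non-chimeric sequences while still missing one real chimera.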
