UMGAP: the Unipept MetaGenomics Analysis Pipeline

UMGAP:Unipept 宏基因组学分析流程

阅读:1

Abstract

BACKGROUND: Shotgun metagenomics yields ever richer and larger data volumes on the complex communities living in diverse environments. Extracting deep insights from the raw reads heavily depends on the availability of fast, accurate and user-friendly biodiversity analysis tools. RESULTS: Because environmental samples may contain strains and species that are not covered in reference databases and because protein sequences are more conserved than the genes encoding them, we explore the alternative route of taxonomic profiling based on protein coding regions translated from the shotgun metagenomics reads, instead of directly processing the DNA reads. We therefore developed the Unipept MetaGenomics Analysis Pipeline (UMGAP), a highly versatile suite of open source tools that are implemented in Rust and support parallelization to achieve optimal performance. Six preconfigured pipelines with different performance trade-offs were carefully selected, and benchmarked against a selection of state-of-the-art shotgun metagenomics taxonomic profiling tools. CONCLUSIONS: UMGAP's protein space detour for taxonomic profiling makes it competitive with state-of-the-art shotgun metagenomics tools. Despite our design choices of an extra protein translation step, a broad spectrum index that can identify both archaea, bacteria, eukaryotes and viruses, and a highly configurable non-monolithic design, UMGAP achieves low runtime, manageable memory footprint and high accuracy. Its interactive visualizations allow for easy exploration and comparison of complex communities.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。