Abstract
Microbiomes play crucial roles in diverse ecosystems, spanning environmental, agricultural, and human health domains. However, in-depth metagenomic data analysis presents significant technical and resource challenges, particularly at scale. Existing computational pipelines are typically limited to either reference-based or reference-free approaches and are inefficient when processing large datasets. Here, we introduce MetaflowX (https://github.com/01life/MetaflowX), an open-source workflow that integrates both analytical paradigms for enhanced metagenomic investigations. This modular framework encompasses short-read quality control, rapid microbial profiling, hybrid contig assembly and binning, identification of high-quality metagenome-assembled genomes (MAGs), and bin refinement and reassembly. In benchmarking tests, MetaflowX completed full metagenomic analyses up to 14-fold faster and with 38% less disk usage than existing workflows. It also recovered the highest number of high-quality, taxonomically diverse MAGs among the workflows compared. A dedicated reassembly module further improved MAG quality, increasing completeness by 5.6% and reducing contamination by 53% on average. Functional annotation modules enable detection of key features, including virulence and antibiotic resistance genes. Designed for extensibility, MetaflowX offers an efficient solution to current and emerging demands in large-scale metagenomic research.