Abstract
Amplicon sequencing enables taxonomic profiling of microbial communities but offers limited insight into their functional potential. Existing tools such as Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt2) infer functions through phylogenetic placement and ancestral state reconstruction; however, these methods are computationally intensive and scale poorly to large data sets. To address these challenges, we introduce microbiome graphics processing unit (GPU)-based function inference (MGFunc), an ultra-high-throughput framework for microbiome functional inference that leverages multi-GPU acceleration. MGFunc transforms functional prediction for amplicons into standardized matrix multiplication using a pre-constructed genomic content network. It further integrates split data loading, matrix partitioning, and dynamic scheduling across multiple GPUs, enabling scalable, batch-wise analysis of millions of samples under limited GPU memory and system random access memory (RAM). Compared to PICRUSt2, MGFunc achieves speedups of up to several hundred thousand times, completing the functional interpretation of one million samples within one minute using four GPUs on a single server. This work provides a highly efficient, low-latency solution for functional inference on ultra-large microbiome data sets, paving the way for global-scale microbiome studies. The MGFunc software is freely accessible at https://github.com/qdu-bioinfo/MGFunc.
IMPORTANCE
Understanding what microbes do, that is, their functions, is essential for studying health, disease, agriculture, and the environment. While cost-effective sequencing methods like 16S rRNA gene analysis are widely used, they do not directly reveal microbial functions. Existing tools that predict these functions from 16S data are often too slow for today's large studies involving hundreds of thousands of samples.
In this work, we developed microbiome graphics processing unit (GPU)-based function inference (MGFunc), a new method that predicts microbial functions quickly and accurately by using GPUs and a streamlined mathematical approach. MGFunc can analyze over one million samples in under a minute, making it one of the fastest tools available. This enables researchers to study the functional potential of microbial communities on a truly global and population scale.
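As a rough illustration of the matrix-multiplication formulation described in the abstract, the following minimal sketch multiplies a samples-by-taxa abundance matrix by a taxa-by-functions gene content matrix, one batch of samples at a time. It is not the MGFunc implementation: the function names, the toy data, and the plain-Python product are all assumptions made for illustration, and a real pipeline would perform the product on GPUs and may apply normalization steps not shown here.

```python
from typing import Iterator, List

Matrix = List[List[float]]


def batches(rows: Matrix, batch_size: int) -> Iterator[Matrix]:
    """Yield consecutive row blocks so only one block is processed at a time."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense product of a (m x k) and b (k x n), written out in plain Python."""
    inner = len(b)
    n_cols = len(b[0])
    return [
        [sum(a_row[k] * b[k][j] for k in range(inner)) for j in range(n_cols)]
        for a_row in a
    ]


def infer_functions(abundance: Matrix, gene_content: Matrix,
                    batch_size: int = 2) -> Matrix:
    """Compute sample-by-function profiles batch by batch.

    abundance:    samples x taxa (hypothetical input layout)
    gene_content: taxa x functions (hypothetical stand-in for the
                  pre-constructed genomic content network)
    """
    profiles: Matrix = []
    for block in batches(abundance, batch_size):
        # Each batch is independent, so batches could be sent to different devices.
        profiles.extend(matmul(block, gene_content))
    return profiles


# Hypothetical toy data: 3 samples x 2 taxa, and 2 taxa x 3 functions.
abundance = [[10.0, 0.0], [5.0, 5.0], [0.0, 2.0]]
gene_content = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]]
profiles = infer_functions(abundance, gene_content, batch_size=2)
```

Because each row of the result depends only on the matching row of the abundance matrix, the batch size changes memory use but not the output, which is what makes batch-wise processing across several devices possible in principle.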