Abstract
MOTIVATION: Recent research has revealed strong correlations between the human microbiome and various diseases. However, statistical analysis of microbiome data remains challenging because the data are sparse and high-dimensional. PERMANOVA (Permutational multivariate analysis of variance using distance matrices) has been widely used to test the association between microbiome data and biological features. Its non-parametric nature makes it appealing, as it imposes no restrictions on data dimension or distribution. Despite these merits, several limitations have restricted its further application.
RESULTS: This paper introduces E-MANOVA (Ensemble multivariate analysis of variance using distance matrices), a method designed to address these limitations. Traditional PERMANOVA is not consistently robust across distance metrics and association signals, which can reduce power in specific scenarios. Borrowing the idea of ensemble learning, we construct base tests by raising the similarity matrix to the rth power and then combine these tests into a final ensemble test. The resulting test statistic exhibits higher power and greater robustness than existing methods. Furthermore, we use direct moment approximation and the Pearson type III distribution to approximate the permutation null distribution, completely avoiding the computationally intensive permutation procedure. Finally, we use the Cauchy combination method to aggregate p-values from multiple distances, eliminating the need to pre-specify a single distance measure before analysis.
CONCLUSIONS: Our extensive simulations demonstrate that the proposed method outperforms existing methods across various situations. Further analysis of real data from cigarette smokers and of curated microbiome data shows that the proposed method identifies more significant associations than any competing method.
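The Cauchy combination step mentioned in the results can be illustrated with a minimal sketch. It assumes equal weights across distances and uses the standard form of the rule, in which each p-value is mapped to a Cauchy variate, the variates are averaged, and the average is mapped back to a p-value. The function name `cauchy_combination`, the clipping constant, and the example p-values are assumptions made for illustration and are not taken from the paper.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the Cauchy combination rule.

    Hypothetical helper for illustration; equal weights are assumed
    unless non-negative weights summing to 1 are supplied.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one p-value is required")
    if weights is None:
        weights = [1.0 / k] * k
    if len(weights) != k:
        raise ValueError("weights and p_values must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")

    # Keep p-values strictly inside (0, 1) so tan() stays finite.
    eps = 1e-15
    clipped = [min(max(p, eps), 1.0 - eps) for p in p_values]

    # Map each p-value to a standard Cauchy variate and take the weighted sum.
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, clipped))

    # Upper tail of the standard Cauchy distribution evaluated at t.
    return 0.5 - math.atan(t) / math.pi


# Made-up p-values standing in for tests run under three different distances.
example_p_values = [0.04, 0.20, 0.011]
combined = cauchy_combination(example_p_values)
print(f"combined p-value: {combined:.4f}")
```

Because the combined statistic is dominated by the smallest p-values, a strong signal under one distance is largely retained even when the other distances show nothing, which is consistent with the stated goal of not having to choose a single distance in advance.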