Abstract
Advances in omics technologies provide unprecedented opportunities for systems biology, yet integrating multi-omics data remains challenging due to its complexity, heterogeneity, and the sparsity of prior knowledge networks. Here, we introduce a multi-omics data integration analysis (MODA) framework that fully incorporates prior knowledge to identify hub molecules and pathways and to elucidate biological mechanisms. By leveraging multiple machine learning approaches, MODA transforms raw omics data into a feature importance matrix that is mapped onto a biological knowledge graph to mitigate omics data noise. It then uses graph convolutional networks with attention mechanisms to capture intricate molecular relationships and rank molecules via a feature-selective layer. Finally, MODA transcends the limitations of predefined pathway annotations by employing an overlapping community detection algorithm to extract core functional modules that are involved in multiple pivotal disease pathways. Systematic evaluations show that MODA outperforms seven existing multi-omics integration methods in classification performance while maintaining biological interpretability. Moreover, MODA achieves superior stability in pan-cancer datasets. Application to prostate cancer multi-omics datasets reveals a key role for carnitine and palmitoylcarnitine, regulated by BBOX1, in disease progression. Population samples and in vitro experiments further validate these findings. With high data utilization efficiency and low computational cost, MODA serves as a robust tool for uncovering novel disease mechanisms and advancing precision medicine.
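To make the first stage of the pipeline concrete, the following toy sketch builds a small feature importance matrix by scoring each molecule with more than one method, then averages each molecule's scores with those of its neighbors in a knowledge graph. It is an illustration under stated assumptions, not the MODA implementation: the two scoring functions, the neighbor-averaging rule, the molecule names, the values, and the single edge are all hypothetical, and the attention-based graph convolutional network and the community detection step are not reproduced here.

```python
from statistics import mean, pstdev

def class_gap(values, labels):
    """Hypothetical scorer 1: |difference of class means| / overall std."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = pstdev(values) or 1.0  # avoid division by zero for constant features
    return abs(mean(pos) - mean(neg)) / spread

def abs_correlation(values, labels):
    """Hypothetical scorer 2: absolute Pearson correlation with the label."""
    mv, ml = mean(values), mean(labels)
    cov = sum((v - mv) * (y - ml) for v, y in zip(values, labels))
    sv = sum((v - mv) ** 2 for v in values) ** 0.5
    sl = sum((y - ml) ** 2 for y in labels) ** 0.5
    return abs(cov) / (sv * sl) if sv and sl else 0.0

def importance_matrix(data, labels, scorers):
    """One row per molecule, one column per scoring method."""
    return {name: [score(vals, labels) for score in scorers]
            for name, vals in data.items()}

def smooth_on_graph(matrix, edges):
    """Average each row with the rows of its graph neighbors (self included)."""
    neighbors = {name: {name} for name in matrix}
    for a, b in edges:
        if a in matrix and b in matrix:
            neighbors[a].add(b)
            neighbors[b].add(a)
    n_cols = len(next(iter(matrix.values())))
    return {name: [mean(matrix[m][j] for m in nbrs) for j in range(n_cols)]
            for name, nbrs in neighbors.items()}

# Toy inputs (invented for illustration only).
labels = [0, 0, 0, 1, 1, 1]
data = {
    "gene_A":       [1.0, 1.2, 0.9, 3.1, 2.9, 3.0],
    "metabolite_B": [0.5, 0.4, 0.6, 1.5, 1.6, 1.4],
    "gene_C":       [2.0, 2.1, 1.9, 2.0, 2.1, 1.9],
}
edges = [("gene_A", "metabolite_B")]  # hypothetical prior-knowledge edge

raw = importance_matrix(data, labels, [class_gap, abs_correlation])
smoothed = smooth_on_graph(raw, edges)

for name in data:
    print(name, [round(x, 3) for x in smoothed[name]])
```

In this sketch, connected molecules share evidence through the graph, while an isolated molecule such as `gene_C` keeps its own scores. A learned model would replace the fixed averaging with trainable, attention-weighted aggregation.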