MASE-GC: a multi-omics autoencoder and stacking ensemble framework for gastric cancer classification

Abstract

BACKGROUND: Gastric cancer (GC) is one of the most common malignant tumors and remains a leading cause of cancer-related mortality worldwide. Accurate classification of GC is critical for improving diagnosis, prognosis, and personalized treatment. Recent advances in high-throughput sequencing have enabled the generation of large-scale multi-omics data, offering new opportunities for precise disease stratification. However, existing studies often rely on a single omics layer or a single model, which fails to capture the full complexity of tumor biology and limits sensitivity, specificity, and generalizability.

METHODS: We propose MASE-GC (Multi-Omics Autoencoder and Stacking Ensemble for Gastric Cancer), a novel computational framework that integrates exon expression, mRNA expression, miRNA expression, and DNA methylation profiles. MASE-GC employs modality-specific autoencoders to extract compact latent features from each omics layer and combines them through weighted fusion. The integrated features are then classified by a stacking ensemble of five base learners (Support Vector Machine, Random Forest, Decision Tree, AdaBoost, and Convolutional Neural Network) followed by an XGBoost meta-classifier. A preprocessing pipeline comprising feature filtering, normalization, and SMOTE-Tomek balancing addresses noise, high dimensionality, and class imbalance.

RESULTS: On the TCGA-STAD cohort, MASE-GC outperformed single-omics and baseline methods, reaching an accuracy of 0.981, precision of 0.9845, recall of 0.992, F1-score of 0.9883, and specificity of 0.824. Ablation studies confirmed the complementary contributions of the autoencoders and the ensemble components, with the CNN and Random Forest providing the largest performance gains. Independent validation on external cohorts (GSE62254, GSE15459, GSE84437, and ICGC) confirmed the robustness and generalizability of MASE-GC, with accuracy consistently above 0.958 and F1-scores exceeding 0.969.

CONCLUSION: MASE-GC advances computational oncology by offering an effective and generalizable framework for GC classification. By integrating multi-omics fusion, ensemble learning, and robust preprocessing, the proposed model improves both sensitivity and specificity, reduces false positives, and demonstrates strong potential for clinical translation in precision diagnostics and treatment planning.
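
The five metrics reported in RESULTS all follow from the four cells of a binary confusion matrix. The sketch below is not code from the paper: it uses the standard textbook definitions, and the names `ConfusionCounts`, `classification_metrics`, and `f1_score`, as well as the example counts, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # positives predicted as positive
    fp: int  # negatives predicted as positive
    tn: int  # negatives predicted as negative
    fn: int  # positives predicted as negative


def safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return safe_div(2 * precision * recall, precision + recall)


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute accuracy, precision, recall, specificity, and F1."""
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)  # also called sensitivity
    return {
        "accuracy": safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(c.tn, c.tn + c.fp),
        "f1": f1_score(precision, recall),
    }


# Made-up counts for illustration only; these are NOT data from the study.
counts = ConfusionCounts(tp=90, fp=5, tn=20, fn=10)
metrics = classification_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")

# Sanity check using the precision and recall quoted in the abstract.
print(f"F1 from reported precision/recall: {f1_score(0.9845, 0.992):.4f}")
```

The last line recomputes F1 from the quoted precision and recall; the result agrees with the reported 0.9883 to within rounding. Specificity depends only on the negative class, so when negatives are few, a small number of false positives can pull it well below accuracy, which is one reason to report it separately.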
