Abstract
Effective cancer subtype classification from multi-omics data remains challenging due to incomplete omics data and limited sample sizes. While graph convolutional networks (GCNs) have been used to incorporate inter-sample relationships and improve small-sample classification, their performance deteriorates when an omics modality is entirely missing. Here, we propose MOGEDN, a novel framework for cancer subtype classification that uses multi-omics encoder-decoder networks to reconstruct the latent features of missing omics data. The reconstructed features are integrated with the available omics features to enable robust prediction under small-sample and missing-omics settings. We develop a step-wise algorithm that first pretrains the model on diverse cancer types and then finetunes it for a specific cancer type while incorporating inter-sample and cross-omics dependencies. Evaluated on TCGA cancer datasets, including subtypes with fewer than 50 samples, MOGEDN consistently outperforms state-of-the-art baselines in accuracy and F1 score. Moreover, MOGEDN's feature analysis provides two complementary biomarker sets: biomarkers shared across diverse cancer types from the pretraining phase, and biomarkers for a specific cancer type from the finetuning phase, facilitating model interpretability and biological discovery. These results highlight decoder-based imputation as a powerful approach to enhance multi-omics learning, delivering accurate classification, robust few-shot performance, and multi-scale biomarker discovery in incomplete multi-omics cohorts.