Abstract
MOTIVATION: Integrating tumour imaging data with molecular sequencing information can advance our understanding of cancer biology by combining complementary views of tumour phenotype and genotype. However, integrating multi-modal data across heterogeneous, high-dimensional data domains remains a significant computational challenge.

RESULTS: Here, we introduce an unsupervised manifold alignment approach for real-world data integration based on Joint Multidimensional Scaling (Joint MDS) and extend it to a three-modality framework (Joint MDS3). We apply this method to integrate radiomic features from magnetic resonance imaging (MRI) with transcriptomic, epigenomic, and copy number variation (CNV) data from patients with glioblastoma multiforme (GBM) and lower-grade gliomas (LGG). Joint MDS consistently outperforms the baseline Pamona and achieves competitive performance relative to the baseline single-cell optimal transport method (SCOTv2), attaining a lower fraction of samples closer than the true match (FOSCTTM) in four of six cases. Joint MDS reaches an average label transfer accuracy of 74.8%, approximately 4% higher than that of Pamona and SCOTv2, and keeps FOSCTTM at 51% or below across real-world datasets. We further demonstrate our extension Joint MDS3 on both synthetic and real-world examples. Our results highlight the potential of Joint MDS to integrate diverse data types into a unified representation, ultimately advancing computational approaches for complex diseases.

AVAILABILITY AND IMPLEMENTATION: The implementation of our work is available at gitlab.ethz.ch/BMDSlab/publications/oncology/joint-representation-learning-for-oncology-applications and archived at doi.org/10.5281/zenodo.17219404.