Abstract
Contemporary glioma diagnosis integrates molecular features (e.g., IDH, 1p/19q) with histopathology to guide clinical decision-making. However, divergent imaging protocols and variable molecular testing standards across institutions cause pervasive data heterogeneity in multi-center studies. This heterogeneity appears as incomplete imaging sequences and missing annotations, which hinder the development of robust AI-driven diagnostic frameworks. To address this, we propose SSL-MISS-Net (Self-Supervised Learning with MIssing-label encoding and Semantic Synthesis), a unified framework that handles input-side modality incompleteness through cross-modal self-supervised learning and output-side annotation deficiencies through a missing-label synergistic strategy, thereby reducing reliance on complete data. To our knowledge, this is the first study to address both challenges jointly, making imperfect clinical data usable for diagnosis. We evaluated SSL-MISS-Net with five-fold cross-validation and two independent test sets on multi-center cohorts (six in-house datasets and three public repositories; N = 2238). Compared with the second-best method, AHI, SSL-MISS-Net improved integrated glioma diagnosis accuracy by 4% in validation and 10% on the test sets. Moreover, the framework increased the amount of clinically usable data by 256% and consistently outperformed state-of-the-art methods trained on complete data. These results demonstrate the clinical translatability of SSL-MISS-Net and its robustness to data imperfections in neuro-oncology AI diagnostics.