Abstract
INTRODUCTION: Early diagnosis of endometrial cancer is critical for improving survival. However, most AI-based diagnostic methods are developed on single-modality data, lack interpretability, and offer little or no privacy protection. We propose a multimodal AI framework that combines histopathology whole-slide images (WSIs) with clinical data and incorporates both explainability and privacy-aware learning.

METHODS: Data from 529 patients (354 early-stage, 175 advanced-stage), comprising 794 WSIs and 208,000 image patches, were collected for the study. A convolutional neural network (CNN) extracted morphological features from the WSIs, and a multilayer perceptron (MLP) encoded the clinical variables. The learned representations from the two modalities were fused for the final stage prediction. Interpretability was enabled through Grad-CAM and clinical feature attribution, and privacy-aware training was supported by secure parameter aggregation.

RESULTS: The multimodal model achieved an accuracy of 0.91 and an AUC of 0.95, outperforming both the clinical-only (accuracy = 0.78, AUC = 0.81) and histopathology-only (accuracy = 0.85, AUC = 0.89) models, with marked gains in sensitivity (0.89) and specificity (0.93). Privacy-aware learning preserved this performance. Decision curve analysis showed the highest net clinical benefit for the multimodal model.

CONCLUSION: This framework offers an accurate, interpretable, and privacy-preserving diagnostic aid for early endometrial cancer. However, because the model was developed and evaluated solely on the TCGA-UCEC cohort, external multi-center validation is required to confirm generalizability across diverse clinical populations, imaging protocols, and laboratory conditions.
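The fusion step described in METHODS (CNN-derived WSI features concatenated with an MLP embedding of clinical variables, followed by a classifier) can be sketched as a late-fusion forward pass. All dimensions, weights, and inputs below are illustrative placeholders, not the authors' actual architecture or data:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_encode(clinical, W1, b1, W2, b2):
    """Encode clinical variables with a one-hidden-layer MLP (ReLU activation)."""
    h = np.maximum(0.0, clinical @ W1 + b1)
    return h @ W2 + b2

def fuse_and_classify(wsi_feat, clin_emb, Wf, bf):
    """Concatenate the two modality embeddings and apply a linear
    classifier with a sigmoid, giving P(advanced stage)."""
    z = np.concatenate([wsi_feat, clin_emb]) @ Wf + bf
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical dimensions: a 128-d CNN slide feature and 10 clinical
# variables mapped to a 16-d embedding (128 + 16 = 144 fused dims).
wsi_feat = rng.standard_normal(128)   # stands in for the CNN output on one WSI
clinical = rng.standard_normal(10)    # stands in for normalized clinical variables
W1, b1 = rng.standard_normal((10, 32)) * 0.1, np.zeros(32)
W2, b2 = rng.standard_normal((32, 16)) * 0.1, np.zeros(16)
Wf, bf = rng.standard_normal(144) * 0.1, 0.0

clin_emb = mlp_encode(clinical, W1, b1, W2, b2)
p_advanced = fuse_and_classify(wsi_feat, clin_emb, Wf, bf)
print(f"P(advanced stage) = {float(p_advanced):.3f}")
```

In practice the two encoders and the fusion head would be trained jointly, with the WSI branch aggregating patch-level CNN features into a slide-level representation before fusion.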