Abstract
OBJECTIVE: Accurate detection of PIK3CA mutations is essential for guiding PI3K-targeted therapies in breast cancer, yet sequencing is not universally accessible, and single-modality prediction models have limited performance. This study developed a multimodal deep learning framework integrating whole-slide imaging (WSI) and structured clinical data to improve mutation prediction.

METHODS: A total of 1,047 patients from TCGA and 166 patients from 3 external centers were included. The histopathology model used a transformer-based pretrained encoder (H-optimus-0) and a clustering-constrained attention multiple instance learning (CLAM-SB MIL) classifier to generate WSI-level representations. The clinical model incorporated engineered clinical variables and an extreme gradient boosting (XGBoost) classifier. A decision-level late fusion strategy (Multimodal PIK3CA Model, MPM) combined the probabilistic outputs of both branches. Performance was evaluated with the area under the curve (AUC) and secondary metrics. Interpretability was assessed via attention heatmaps and Shapley additive explanations (SHAP) analysis.

RESULTS: MPM outperformed both single-modality models. It achieved an AUC of 0.745 on TCGA and maintained stable performance across the three external cohorts (AUCs of 0.695, 0.690, and 0.680). SHAP analysis identified molecular subtype as the most influential clinical feature, whereas attention maps highlighted mutation-associated morphological regions.

CONCLUSIONS: The developed multimodal framework effectively integrates complementary morphological and clinical information and provides a robust, generalizable method for predicting PIK3CA mutation status. Its strong multicenter adaptability and biological interpretability support its potential use as a clinical decision-support tool and an accessible alternative to molecular testing.
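The decision-level late fusion described in METHODS can be illustrated with a minimal sketch. The abstract does not state the exact fusion rule, so a convex combination of the two branch probabilities is assumed here; the function name `late_fusion` and the weight `w` are hypothetical, not part of the authors' method.

```python
import numpy as np

def late_fusion(p_wsi, p_clin, w=0.5):
    """Decision-level late fusion of per-patient mutation probabilities.

    p_wsi  : probabilities from the WSI (histopathology) branch
    p_clin : probabilities from the clinical (XGBoost) branch
    w      : hypothetical mixing weight; a weighted average is assumed,
             since the abstract does not specify the fusion rule.
    """
    p_wsi = np.asarray(p_wsi, dtype=float)
    p_clin = np.asarray(p_clin, dtype=float)
    return w * p_wsi + (1.0 - w) * p_clin

# Example: fuse illustrative per-patient PIK3CA probabilities from both branches
fused = late_fusion([0.8, 0.3], [0.6, 0.4], w=0.5)
```

In practice, `w` would be tuned on a validation split, and the fused probability thresholded to yield a mutation call.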