Abstract
Epithelial ovarian cancer (EOC) exhibits significant heterogeneity in clinical outcomes, influenced by histology, age, stage, and molecular characteristics. This study aimed to develop and validate a comprehensive model integrating demographic, clinical, and molecular data from The Cancer Genome Atlas (TCGA) to predict two-year survival outcomes in EOC. The cohort included 2,427 patients with Endometrioid Adenocarcinoma (EA) or Serous Cystadenocarcinoma (SC), of whom 1,011 had gene data. Machine learning models, including Logistic Regression, Gradient Boosting Classifier (GBC), Support Vector Machines (SVM), and Random Forest, were trained and evaluated for predictive performance. SVM provided the best balance of mortality-class detection and overall performance: although GBC achieved the highest ROC-AUC (0.81), SVM demonstrated superior recall for mortality cases (0.70 vs. 0.61), which was prioritized given our clinical objective. Shapley Additive Explanations (SHAP) analyses revealed that WT1, HOXA11, TPM4, TMPRSS2, MUC16, SDHD, and MYC were the most influential predictors of mortality, along with age at diagnosis. Differential gene expression and enrichment analyses identified distinct age- and stage-associated molecular profiles, with genes involved in cell cycle regulation, tumor microenvironment, and growth factor signaling showing significant upregulation. Mutational analyses revealed histology-specific patterns, with TP53, PIK3CA, and ZFHX3 highly mutated in SC, while PTEN and ARID1A mutations were more prevalent in EA. Mutations in several genes, including TP53, FAT3, and FAT4 in EA and CSMD3 in SC, were associated with poorer survival. Integrating multivariate predictive modeling with biological interpretation provides a comprehensive framework for personalized risk stratification and treatment decision-making in EOC. The identified prognostic biomarkers, such as TPM4, SDHD, MUC16, and BCL6, represent potential targets for future studies and therapeutic interventions.
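The trade-off between ranking performance (ROC-AUC) and mortality-class recall can be illustrated with a minimal sketch. It is not the study's pipeline: the labels and scores below are synthetic toy values, the names `model_a` and `model_b` are placeholders, and both the 0.5 decision threshold and the `min_auc` floor used in `select_model` are assumptions introduced only for illustration.

```python
from typing import Dict, Sequence, Tuple


def recall_positive(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Recall for the positive (here: mortality) class, TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a positive outranks a negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def select_model(results: Dict[str, Tuple[float, float]], min_auc: float) -> str:
    """Pick the model with the highest recall among those meeting an AUC floor.

    `results` maps a model name to (recall, auc). The floor is a hypothetical
    selection rule, not one stated in the text.
    """
    eligible = {name: m for name, m in results.items() if m[1] >= min_auc}
    if not eligible:
        raise ValueError("no model meets the AUC floor")
    return max(eligible, key=lambda name: eligible[name][0])


# Synthetic toy data: 1 = died within two years, 0 = survived.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
# model_a ranks patients perfectly but is conservative at the 0.5 threshold.
scores_a = [0.9, 0.45, 0.4, 0.35, 0.3, 0.2, 0.1, 0.05]
# model_b ranks slightly worse but flags more of the true mortality cases.
scores_b = [0.8, 0.7, 0.3, 0.6, 0.2, 0.2, 0.1, 0.1]

THRESHOLD = 0.5  # assumed decision threshold
results = {}
for name, scores in (("model_a", scores_a), ("model_b", scores_b)):
    preds = [1 if s >= THRESHOLD else 0 for s in scores]
    results[name] = (recall_positive(y_true, preds), roc_auc(y_true, scores))

chosen = select_model(results, min_auc=0.75)
for name, (rec, auc) in results.items():
    print(f"{name}: recall={rec:.2f} auc={auc:.2f}")
print("selected:", chosen)
```

In this toy setting the model with the higher ROC-AUC is not the one selected, because the rule favors recall for the positive class once a minimum ranking quality is met. This mirrors the kind of reasoning under which a lower-AUC classifier can be preferred when missed mortality cases are the more costly error.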