Abstract
OBJECTIVE: To refine prognostic stratification for intermediate-risk acute myeloid leukemia (IR-AML) by leveraging machine learning to integrate clinical and genomic features and relate them to survival outcomes. METHODS: We conducted a two-cohort study comprising a single-center development cohort from Beijing Tsinghua Changgung Hospital (n = 56) and an independent external cohort from The Cancer Genome Atlas (TCGA; n = 79). Demographics and mutational profiles were analyzed alongside survival outcomes. We developed three tree-based models (random forests, gradient-boosted decision trees [GBDT], and XGBoost) on the Tsinghua Changgung cohort, using stratified five-fold cross-validation for internal validation. RESULTS: In internal cross-validation, tree-based learners showed strong discrimination (best GBDT AUROC 0.98, 95% confidence interval [CI] 0.91-1.00). On the external TCGA cohort, GBDT achieved AUROC 0.73 (95% CI 0.62-0.83). Model-agnostic explanations (Shapley additive explanations) consistently highlighted white blood cell count, age, transplantation, and TET2 among top contributors. CONCLUSION: An interpretable machine learning framework built from accessible clinical and genomic variables provided quantitative risk discrimination for IR-AML across development and external test cohorts, supporting individualized risk assessment and informing refinement of prognostic stratification.
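The discrimination results above are reported as an AUROC with a 95% CI. The abstract does not state how the intervals were computed, so the following is only an illustrative sketch that assumes a percentile bootstrap. The function names (auroc, bootstrap_ci), the parameters, and the toy labels and scores are hypothetical and are not study data.

```python
import random


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUROC (an assumed method, not
    necessarily the one used in the study)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains one class only; AUROC undefined
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = int(n_boot * alpha / 2)
    hi_idx = n_boot - 1 - lo_idx
    return stats[lo_idx], stats[hi_idx]


# Hypothetical toy data: 1 = event, 0 = no event; scores = predicted risk.
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = [0.10, 0.20, 0.30, 0.40, 0.70, 0.35, 0.60, 0.80, 0.90, 0.95]

auc = auroc(labels, scores)  # 22 of 25 pairs ranked correctly
ci_low, ci_high = bootstrap_ci(labels, scores)
print(f"AUROC {auc:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

With cohorts as small as those described here, bootstrap intervals tend to be wide and can reach the upper bound of 1.00, which is one reason to compare internal cross-validation estimates against the external-cohort result.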