Abstract
BACKGROUND: Pediatric acute myeloid leukemia (pAML) is a rapidly progressive myeloid malignancy characterized by the clonal expansion of malignant hematopoietic stem and progenitor cells. Predicting complete remission (CR) and the first adverse event (AE) is critical for personalizing pAML treatment; however, interpretable machine learning (ML) models that rely solely on routine clinical features for this purpose are lacking.
METHODS: A total of 206 patients with de novo pAML (excluding acute promyelocytic leukemia) were randomly split into training (80%) and test (20%) sets. Models based on seven supervised ML algorithms were built to predict CR and AE, and their performance was evaluated by accuracy, specificity, F1-score, and the area under the receiver operating characteristic curve (AUC). Model interpretability was assessed using feature importance, accumulated local effects (ALE) plots, and SHAP values.
RESULTS: To identify optimal predictors of CR, three feature selection methods (random forest, stepwise regression, and joint mutual information maximization [JMIM]) were applied. The intersection of their selections yielded seven key features: age, bone marrow blasts, peripheral blood blasts, platelet count (PLT), t(8;21), TP53, and del7/del7q. The random forest model performed best, with a training AUC of 0.90 (95% CI: 0.86-0.97) and a test AUC of 0.79 (95% CI: 0.72-0.86). A similar ML pipeline was then applied to predict AE. Nine features were selected as optimal predictors by intersecting the outputs of the same three methods: white blood cell count (WBC), peripheral blood blasts, PLT, bone marrow blasts, hemoglobin, age, t(8;21), NPM1, and KIT. For AE prediction, the random forest algorithm again performed best, with a training AUC of 0.92 (95% CI: 0.85-0.97) and a test AUC of 0.78 (95% CI: 0.66-0.84).
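The evaluation workflow summarized in METHODS and RESULTS can be illustrated with a minimal, dependency-free Python sketch. It covers a random 80/20 split, keeping only the features chosen by every selection method, and computing accuracy, specificity, F1-score, and AUC on held-out predictions. This is not the study's code: all function names are hypothetical, and the percentile bootstrap used for the AUC 95% CI is an assumption, since the CI method is not specified here.

```python
import random

# Hypothetical helpers illustrating the evaluation workflow; not the study's code.

def intersect_features(*selections):
    """Keep only the features chosen by every selection method."""
    common = set(selections[0])
    for chosen in selections[1:]:
        common &= set(chosen)
    return sorted(common)

def train_test_split(rows, test_frac=0.2, seed=0):
    """Randomly shuffle rows and split them into (train, test), 80/20 by default."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]

def binary_metrics(y_true, y_pred):
    """Accuracy, specificity and F1-score for 0/1 labels (1 = event, e.g. CR)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "f1": f1,
    }

def auc(y_true, scores):
    """ROC AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the AUC (an assumed CI method)."""
    if len(set(y_true)) < 2:
        raise ValueError("AUC needs both classes")
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled = [y_true[i] for i in idx]
        if len(set(resampled)) == 2:  # skip resamples containing a single class
            stats.append(auc(resampled, [scores[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]
```

For example, `intersect_features(["a", "b", "c"], ["b", "c", "d"], ["c", "b", "e"])` returns `["b", "c"]`. With a random forest, the `scores` passed to `auc` would be the predicted class probabilities on the test set, while `binary_metrics` takes thresholded 0/1 predictions.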
Interpretability analysis of the random forest models showed that a higher platelet count at diagnosis was predictive of a higher probability of CR and a lower risk of AE, whereas elevated WBC and a higher peripheral blood blast percentage were associated with a higher incidence of AE.
CONCLUSION: Our random forest models, built on routine clinical features, demonstrated strong potential for predicting CR and AE in pAML, thereby facilitating early risk stratification and guiding personalized treatment strategies.
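The ALE analysis behind findings such as the platelet-count effect can be illustrated with a minimal first-order ALE sketch in plain Python. It assumes the fitted model is wrapped as a hypothetical `predict(row)` function that returns, for instance, the predicted probability of CR for one patient's feature list. The quantile-based intervals and the centering on the count-weighted average of interval midpoints are one common convention and not necessarily the settings used in the study.

```python
from bisect import bisect_left

def ale_1d(predict, X, j, n_bins=5):
    """First-order ALE of feature j for a model `predict(row) -> float`.

    Returns the quantile-based interval edges and the centered ALE value at each edge.
    """
    values = sorted(row[j] for row in X)
    n = len(values)
    # Interval edges at empirical quantiles of feature j, with duplicates removed.
    edges = sorted({values[round(q * (n - 1) / n_bins)] for q in range(n_bins + 1)})
    if len(edges) < 2:
        raise ValueError("feature j needs at least two distinct values")
    k_max = len(edges) - 1
    sums, counts = [0.0] * k_max, [0] * k_max
    for row in X:
        # Interval k covers (edges[k], edges[k+1]]; the first also includes edges[0].
        k = min(max(bisect_left(edges, row[j]) - 1, 0), k_max - 1)
        lower, upper = list(row), list(row)
        lower[j], upper[j] = edges[k], edges[k + 1]
        # Local effect: change in prediction across the interval, other features fixed.
        sums[k] += predict(upper) - predict(lower)
        counts[k] += 1
    # Accumulate the mean local effects from the lowest edge upward.
    ale = [0.0]
    for k in range(k_max):
        ale.append(ale[-1] + (sums[k] / counts[k] if counts[k] else 0.0))
    # Center so the count-weighted average over interval midpoints is zero.
    center = sum(c * (ale[k] + ale[k + 1]) / 2 for k, c in enumerate(counts)) / sum(counts)
    return edges, [a - center for a in ale]
```

For a toy model whose output rises by 2 per unit of feature 0, the returned curve rises by 4 across each interval of width 2. Because ALE averages prediction differences within narrow intervals rather than over the whole dataset, it is less distorted by correlated inputs (for example, WBC and peripheral blood blasts) than a partial dependence plot. An increasing curve for a platelet-count feature would correspond to the higher predicted CR probability described above.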