Development of an interpretable machine learning model to predict complete remission and first adverse event in pediatric acute myeloid leukemia using routine clinical data

Abstract

BACKGROUND: Pediatric acute myeloid leukemia (pAML) is a rapidly progressive myeloid malignancy characterized by malignant clonal expansion of hematopoietic stem and progenitor cells. Predicting complete remission (CR) and the first adverse event (AE) is critical for personalizing pAML treatment; however, interpretable machine learning (ML) models that use only routine clinical features for this purpose are lacking.

METHODS: A total of 206 patients with de novo pAML (excluding acute promyelocytic leukemia) were randomly split into training (80%) and test (20%) sets. Seven supervised ML algorithms were trained to predict CR and AE, and their performance was evaluated by accuracy, specificity, F1-score, and area under the receiver operating characteristic curve (AUC). Model interpretability was assessed using feature importance, accumulated local effect (ALE) plots, and SHapley Additive exPlanations (SHAP) values.

RESULTS: To identify optimal predictors of CR, three feature selection methods (random forest, stepwise regression, and joint mutual information maximization [JMIM]) were employed. Their intersection yielded seven key features: age, bone marrow blasts, peripheral blood blasts, platelet count (PLT), t(8;21), TP53, and del7/del7q. The random forest model performed best, with a training AUC of 0.90 (95% CI: 0.86-0.97) and a test AUC of 0.79 (95% CI: 0.72-0.86). A similar ML pipeline was applied to predict the first AE. Nine features were selected as optimal predictors through the intersection of the same three methods: white blood cell count (WBC), peripheral blood blasts, PLT, bone marrow blasts, hemoglobin, age, t(8;21), NPM1, and KIT. For AE prediction, the random forest model again performed best, with a training AUC of 0.92 (95% CI: 0.85-0.97) and a test AUC of 0.78 (95% CI: 0.66-0.84). Interpretability analysis of the random forest models showed that a higher platelet count at diagnosis predicted a higher probability of CR and a lower risk of AE. In contrast, elevated WBC and a higher peripheral blood blast percentage were associated with a higher incidence of AE.

CONCLUSION: Our random forest models, built on routine hematological parameters, demonstrated strong potential for predicting CR and AE in pAML, and may thereby facilitate early risk stratification and guide personalized treatment strategies.
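Two steps in the abstract lend themselves to a short illustration: keeping only the features chosen by all three selection methods, and reporting an AUC with a 95% confidence interval. The Python sketch below is a minimal, standard-library-only illustration and not the authors' code. The per-method feature lists and the labels and scores are invented toy values, and the percentile bootstrap is an assumption, since the abstract does not state how its confidence intervals were computed.

```python
import random

def intersect_selected(*selections):
    """Return the features chosen by every selection method, sorted for stable output."""
    common = set(selections[0])
    for sel in selections[1:]:
        common &= set(sel)
    return sorted(common)

def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed method, not taken from the source)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample with one class has no defined AUC; draw again
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper

# Hypothetical per-method selections (illustrative only, not the study's lists).
rf_pick = ["age", "plt", "tp53", "wbc"]
stepwise_pick = ["age", "plt", "tp53", "hemoglobin"]
jmim_pick = ["plt", "age", "tp53"]
shared = intersect_selected(rf_pick, stepwise_pick, jmim_pick)

# Toy outcome labels (1 = event) and predicted probabilities (invented).
labels = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.65, 0.6, 0.4, 0.35, 0.3, 0.2, 0.1]
point = auc(labels, scores)
lower, upper = bootstrap_auc_ci(labels, scores)
print(shared, round(point, 2), (round(lower, 2), round(upper, 2)))
```

With only ten toy samples the bootstrap interval is very wide. In practice the same functions would be applied to a model's predicted probabilities on the held-out test set.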
