Evaluation of the dataset quality in gamma passing rate predictions using machine learning methods

Abstract

OBJECTIVE: Gamma passing rate (GPR) predictions using machine learning methods have been explored for treatment verification of radiotherapy plans. However, these methods relied on datasets with unbalanced numbers of plans across different treatment conditions (heterogeneous datasets), such as anatomical site or dose per fraction, leading to lower model interpretability and prediction performance. METHODS: We investigated the impact of dataset composition on GPR binary classification (pass/fail) using random forest (RF), XG-Boost, and neural network (NN) models. A total of 945 plans were used to create one reference dataset (randomly assembled) and 24 customized datasets that considered four heterogeneity factors independently (anatomical region, number of arcs, dose per fraction, and treatment unit). A total of 309 predictor features were extracted and calculated from plan parameters, modulation complexity metrics, and radiomic analysis (leaf-trajectory maps, 3D dose distributions, and portal dosimetry images). Model performance was measured using the area under the receiver operating characteristic curve (ROC-AUC). RESULTS: Radiomics features increased the ROC-AUC values of the reference models by up to 13%, 15%, and 5% for RF, XG-Boost, and NN, respectively. The datasets with more heterogeneous conditions yielded lower ROC-AUC values (RF: 0.72 ± 0.11, XG-Boost: 0.67 ± 0.1, NN: 0.89 ± 0.05) than the datasets with less heterogeneous treatment conditions (RF: 0.88 ± 0.06, XG-Boost: 0.89 ± 0.07, NN: 0.98 ± 0.01). The ten most important features for each heterogeneity dataset group demonstrated their correlation with the treatments' physical aspects and GPR prediction. CONCLUSION: Improvements in data generalization and model performance can be associated with datasets having similar treatment conditions. This analysis might be implemented to evaluate the dataset quality and model consistency of further ML applications in radiotherapy.
ADVANCES IN KNOWLEDGE: Dataset heterogeneities decrease ML model performance and reliability.
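
The ROC-AUC metric used to compare the models can be illustrated with a minimal sketch. It is not the authors' code: the function name `roc_auc`, the labels, and the scores are hypothetical, and the abstract does not state which class (pass or fail) was treated as positive, so treating "fail" as the positive class here is an assumption. The sketch uses the rank-based definition of ROC-AUC: the probability that a randomly chosen positive plan receives a higher score than a randomly chosen negative plan, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """ROC-AUC via pairwise comparison (Mann-Whitney U statistic).

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of classifier scores, higher = more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical example (assumption: 1 = plan fails the GPR criterion).
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
auc = roc_auc(labels, scores)
print(f"ROC-AUC = {auc:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of passing and failing plans, which is the scale on which the reported values (for example 0.72 versus 0.88 for RF) should be read.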
