Identification of cachexia in lung cancer patients with an ensemble learning approach

Abstract

OBJECTIVE: Nutritional intervention before the onset of cachexia can significantly improve the survival of lung cancer patients. This study aimed to establish an ensemble learning model, based on anthropometric and blood indicators and without information on body weight loss, to identify risk factors for cachexia, enabling early nutritional support and prevention of cachexia in lung cancer patients.

METHODS: This multicenter study included 4,712 lung cancer patients. The least absolute shrinkage and selection operator (LASSO) method was used to select the key variables; the candidate features excluded weight loss information. The study data were randomly divided into a training set (70%) and a test set (30%). A total of 18 machine learning models were evaluated for predicting the occurrence of cachexia, and the training set was used to select the optimal model among them and to verify its performance. Performance was assessed using the area under the curve (AUC), accuracy, precision, recall, F1 score, and Matthews correlation coefficient (MCC).

RESULTS: Among the 4,712 patients, 1,392 (29.5%) were diagnosed with cachexia according to the framework of Fearon et al. A 17-variable gradient boosting classifier (GBC) model, which included body mass index (BMI), feeding situation, tumor stage, neutrophil-to-lymphocyte ratio (NLR), and some gastrointestinal symptoms, was selected from the 18 machine learning models. The GBC model showed good performance in predicting cachexia in the training set (AUC = 0.854, accuracy = 0.819, precision = 0.771, recall = 0.574, F1 score = 0.658, MCC = 0.549, and kappa = 0.538). These values were confirmed in the test set (AUC = 0.859, accuracy = 0.818, precision = 0.801, recall = 0.550, F1 score = 0.652, MCC = 0.552, and kappa = 0.535). The learning curve, decision boundary, precision-recall (PR) curve, receiver operating characteristic (ROC) curve, classification report, and confusion matrix for the test set also demonstrated good performance. The feature importance diagram showed the contribution of each feature to the model.

CONCLUSIONS: The GBC model established in this study could facilitate the identification of cancer cachexia in lung cancer patients without weight loss information, which would guide early implementation of nutritional interventions to decrease the occurrence of cachexia and improve overall survival (OS).
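
The threshold-based metrics reported above (accuracy, precision, recall, F1 score, MCC, and kappa) can all be derived from a binary confusion matrix. The following is a minimal sketch of those standard formulas, not the authors' code. The function names `binary_metrics` and `f1_from` are hypothetical, the confusion-matrix counts are invented for illustration and are not taken from the study, and kappa is assumed here to mean Cohen's kappa. AUC is omitted because it requires predicted probabilities rather than counts.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = f1_from(precision, recall)

    # Matthews correlation coefficient (0.0 when the denominator vanishes).
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (p_observed - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }


def f1_from(precision, recall):
    """F1 score: the harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical counts (NOT from the study): 100 cachexia cases, 200 non-cases.
metrics = binary_metrics(tp=55, fp=15, fn=45, tn=185)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Consistency check using the precision and recall reported in the abstract.
print(round(f1_from(0.771, 0.574), 3))  # training set -> 0.658
print(round(f1_from(0.801, 0.550), 3))  # test set -> 0.652
```

The last two lines show that the reported F1 scores follow from the reported precision and recall. A recall near 0.55 alongside a precision near 0.80 means that, at the operating threshold, the model misses a substantial share of cachexia cases while most of its positive predictions are correct.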
