Machine learning prediction of thyroid cancer recurrence for early screening and clinical decision pathways: a retrospective cohort study

Abstract

Recurrence prediction in differentiated thyroid carcinoma (DTC) remains clinically challenging despite generally favorable outcomes and well-established treatment strategies. Improving early identification of patients at elevated recurrence risk may enhance individualized surveillance and therapeutic decision-making. This study evaluated the performance and clinical utility of machine-learning models for recurrence prediction using routinely collected clinicopathologic features from a publicly available cohort of 383 patients with long-term follow-up. Sixteen variables were initially analyzed following international guideline definitions. Random Forest, XGBoost, and LightGBM models were developed using stratified training-test splits, SMOTE for class-imbalance correction, fivefold cross-validation, probability calibration, and decision-curve analysis. Shapley Additive Explanations (SHAP) were applied to quantify global and local feature contributions and to derive simplified feature subsets. Models trained with 4, 6, 8, and full feature sets were compared to assess the impact of dimensionality reduction on discrimination and interpretability. Full-feature models achieved strong performance, with Random Forest obtaining the highest AUC (0.931). Notably, a compact 4-feature Random Forest model (Risk, N stage, T stage, and Age) maintained high discriminatory ability (AUC 0.913; accuracy 0.862; recall 0.750), demonstrating that substantial simplification preserved predictive value. Performance improvements plateaued beyond 6 to 8 features, indicating limited incremental benefit from larger feature sets. SHAP analysis consistently identified Risk, N stage, T stage, and Age as the dominant predictors. These findings highlight that streamlined, interpretable ML models using a small number of clinically accessible features can provide accurate and explainable recurrence prediction in DTC.
Such models offer advantages in computational efficiency, transparency, and real-world deployability, supporting their potential integration into electronic health record systems or point-of-care decision tools. Future work should prioritize multicenter external validation and incorporation of additional pathological or molecular markers to enhance generalizability and clinical applicability.
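
The abstract lists decision-curve analysis among its evaluation steps but does not give the computation. As an illustration only, the sketch below uses the commonly cited net-benefit definition, NB = TP/n - (FP/n) * pt / (1 - pt), where pt is the threshold probability at which a clinician would act. The function names and the toy labels and probabilities are hypothetical and are not taken from the study.

```python
# Illustrative sketch of net benefit for decision-curve analysis.
# Assumption: the standard net-benefit formula; the study's own
# implementation is not described in the abstract.

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # True positives: flagged patients who did recur.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    # False positives: flagged patients who did not recur.
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: act on every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = recurrence, 0 = no recurrence.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.2, 0.6, 0.3, 0.1, 0.1, 0.05, 0.2, 0.4]

for pt in (0.1, 0.3, 0.5):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    print(f"threshold={pt:.1f}  model={model_nb:.3f}  treat-all={all_nb:.3f}")
```

A model is usually considered useful at a given threshold when its net benefit exceeds both the treat-all reference and zero, which is the net benefit of acting on no one.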
