Machine learning models for the prediction of uterine fibroids



Abstract

In this cross-sectional study, we developed and validated a predictive model for uterine fibroid risk using routine physical examination indicators and 5 machine learning algorithms: logistic regression, random forest, k-nearest neighbors, categorical boosting (CatBoost), and light gradient boosting machine (LightGBM). The primary dataset consisted of health examination records from the MJ Health Screening Center in Beijing, China (2013-2023), while an independent external validation dataset (2024) was used to assess generalizability. LASSO regression identified 13 significant predictors, including age, body mass index, total cholesterol, diastolic blood pressure, and marital status. Among the models, CatBoost performed best, achieving an area under the curve (AUC) of 0.808 in the internal validation dataset and 0.821 in the external validation dataset, indicating strong predictive capability and robustness. SHapley Additive exPlanations (SHAP) analysis revealed that age and body mass index were the most critical predictors and that total cholesterol was also a key predictive feature; its implications for lipid metabolism are discussed in the main text. Despite its strengths in AUC, specificity, and sensitivity, the model had low precision (0.475) and only moderate accuracy (0.742), indicating difficulty in controlling false positives. The results indicate that the model is a potentially effective screening tool for identifying high-risk individuals who may benefit from further diagnostic evaluation. While this study validates the feasibility of combining routine health examination data with the CatBoost algorithm for early risk assessment of uterine fibroids, it also highlights the need for cautious interpretation of the model's predictions in clinical practice. Future research should focus on multicenter, large-scale studies to enhance the model's generalizability and incorporate additional predictive factors to optimize performance.
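
The evaluation metrics named in the abstract (area under the curve, sensitivity, specificity, precision, accuracy) all follow from a model's risk scores and the resulting confusion matrix. The following is a minimal sketch in plain Python of how they relate, and of why precision can stay below 0.5 even when sensitivity and specificity look reasonable. The counts, labels, and scores are hypothetical illustrations, not data from the study, and the names `ConfusionCounts`, `screening_metrics`, and `roc_auc` are introduced here only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # fibroid cases flagged as high risk
    fp: int  # non-cases flagged as high risk (false positives)
    tn: int  # non-cases correctly left unflagged
    fn: int  # fibroid cases the model missed


def screening_metrics(c: ConfusionCounts) -> dict:
    """Derive the usual screening metrics from confusion-matrix counts."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "precision": c.tp / (c.tp + c.fp),
        "accuracy": (c.tp + c.tn) / total,
        "prevalence": (c.tp + c.fn) / total,
    }


def roc_auc(labels, scores) -> float:
    """Rank-based AUC: the probability that a randomly chosen case
    receives a higher score than a randomly chosen non-case,
    with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical cohort of 1000 women with 20% prevalence.
# These counts are assumptions for illustration, NOT the study's data.
counts = ConfusionCounts(tp=160, fp=180, tn=620, fn=40)
metrics = screening_metrics(counts)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Hypothetical labels and risk scores for the AUC helper.
print("auc:", roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

In this hypothetical cohort, sensitivity is 0.800 and specificity is 0.775, yet precision is only about 0.471. Because non-cases outnumber cases four to one, even a modest false-positive rate produces more false alarms (180) than true detections (160). This is the general mechanism behind low precision in screening settings, and it is consistent with reading the model's output as a prompt for further diagnostic evaluation rather than as a diagnosis.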
