Abstract
OBJECTIVE: This study aimed to identify risk factors for the occurrence and growth of thyroid nodules (TNs) in euthyroid individuals, and to develop a machine learning-assisted predictive model, based on least absolute shrinkage and selection operator (LASSO) Cox regression, for preliminary individualized risk stratification. METHODS: A nine-year retrospective cohort of 1140 participants with normal thyroid function (6444 data points) was analyzed. First, LASSO regression, a classic regularization algorithm in machine learning, was applied to screen key predictors from 17 candidate variables. The selected variables were then entered into multivariable Cox proportional hazards models to construct a LASSO-Cox predictive model. A nomogram was developed, and model performance was assessed using the area under the curve (AUC), the bootstrap-corrected concordance index (C-index), calibration plots, and decision curve analysis (DCA). RESULTS: LASSO regression selected 10 variables for TN occurrence (age, sex, systolic blood pressure (SBP), diastolic blood pressure (DBP), waist circumference (WC), body mass index (BMI), fasting plasma glucose (FPG), total cholesterol (TC), high-density lipoprotein cholesterol (HDL-C), and metabolic syndrome (MetS)) and 10 variables for TN enlargement (age, sex, SBP, BMI, hemoglobin A1c (HbA1c), TC, triglycerides, HDL-C, uric acid, and MetS). In the multivariable Cox analysis, sex, SBP, DBP, WC, BMI, FPG, HDL-C, and MetS remained predictors of TN occurrence, while sex, HbA1c, HDL-C, and MetS were associated with TN enlargement. The LASSO-Cox nomogram achieved AUCs of 0.65 for occurrence and 0.64 for enlargement, with bootstrap-corrected C-indices of 0.62 and 0.64, respectively. Calibration plots showed good agreement between predicted and observed risks, and DCA demonstrated consistent net benefit across threshold probabilities of 0.05 to 0.6. CONCLUSION: Sex, HDL-C, and MetS are key predictors of both TN occurrence and TN enlargement. The machine learning-assisted LASSO-Cox nomogram, based on routine clinical indicators, shows moderate discrimination and good calibration. This preliminary tool may assist in population-level risk stratification; however, its modest discrimination limits its immediate clinical implementation.
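For readers unfamiliar with the C-index reported above, the following is a minimal sketch of how an apparent (uncorrected) Harrell's concordance index can be computed for right-censored follow-up data. It is not the study's code: the function name `concordance_index`, the simplified handling of tied follow-up times (tied pairs are skipped), and the four-participant example data are all illustrative assumptions, and the bootstrap optimism correction used in the study is not shown.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Apparent Harrell's C: share of comparable pairs ranked correctly by risk.

    times       -- follow-up time for each participant
    events      -- 1 if the outcome (e.g. TN occurrence) was observed, 0 if censored
    risk_scores -- model-predicted risk; higher should mean earlier event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the participant with shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification (assumption): skip tied times; a pair is comparable
        # only if the shorter follow-up ended in an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier event had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data, for illustration only (not from the cohort).
times = [2, 4, 6, 8]            # years of follow-up
events = [1, 1, 0, 1]           # third participant is censored
risks = [0.9, 0.4, 0.6, 0.1]    # predicted risk scores
c_index = concordance_index(times, events, risks)
```

In this toy example four of the five comparable pairs are ranked correctly, so the sketch yields a C-index of 0.8. A value of 0.5 corresponds to chance-level ranking, which is why the reported values of 0.62 and 0.64 are described as modest discrimination.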