Abstract
BACKGROUND: Stroke is a major global health burden. Composite indices have shown promise in cardiovascular risk assessment, but their value in stroke risk assessment requires further investigation. This study aimed to develop a novel Triglyceride-Glucose-Body Shape Index (TBSI) by integrating the triglyceride-glucose index with a body shape index, and to evaluate the associations of TBSI and the atherogenic index of plasma (AIP) with stroke risk, using data from the National Health and Nutrition Examination Survey (NHANES).

METHODS: A total of 43,448 participants from NHANES (2003-2018) were included. The independent associations of AIP and TBSI with stroke were assessed using weighted multivariable logistic regression, and non-linear relationships were examined with restricted cubic splines (RCS). Discriminative ability was assessed through receiver operating characteristic (ROC) analysis. Subgroup analyses explored effect modification, and sensitivity analyses tested model robustness. Mediation analysis examined whether the associations of AIP and TBSI with stroke were mediated by metabolic diseases (hypertension and diabetes) and by oxidative stress biomarkers (gamma-glutamyl transferase [GGT] and albumin). The Boruta algorithm was applied for feature selection, and multiple machine learning models were compared for identifying stroke risk. The best-performing model was further interpreted using Shapley Additive Explanations (SHAP) values.

RESULTS: Both indices were significantly associated with increased stroke risk. TBSI exhibited a J-shaped association, with a threshold at 7.176. Sex and race modified the associations. In terms of discriminative performance, TBSI (AUC = 0.779) outperformed AIP (AUC = 0.769). Mediation analysis revealed statistically significant indirect effects: the association of TBSI with stroke was mediated through hypertension and diabetes, while that of AIP was mediated through hypertension. Mediation through the oxidative stress biomarkers did not reach statistical significance. Feature importance analysis identified TBSI as the top feature, and the random forest (RF) model demonstrated the highest accuracy in identifying stroke risk. SHAP analysis of the RF model also showed that TBSI, AIP, and lipid-related features were key factors influencing stroke occurrence, with significant positive impacts.

CONCLUSION: The novel TBSI and the AIP are strongly associated with stroke risk and offer complementary clinical value. Their integration into clinical screening and machine learning models may support early identification and targeted prevention, especially in resource-limited settings.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12872-026-05668-1.
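
For orientation, the component indices named above can be sketched in a few lines of Python. This is a minimal illustration that assumes the commonly published definitions: AIP as log10 of the triglyceride to HDL cholesterol ratio in mmol/L, the triglyceride-glucose (TyG) index as ln(triglycerides × fasting glucose / 2) in mg/dL, and the body shape index as waist circumference divided by BMI^(2/3) × height^(1/2) in metric units. The abstract does not state how these components are combined into TBSI, so no TBSI formula is given here. The function names, the unit conversion constants, and the choice of mg/dL inputs are assumptions made for this sketch, not the authors' code.

```python
import math

# Assumed conversion factors from mg/dL to mmol/L (not taken from the abstract).
TG_MGDL_PER_MMOLL = 88.57
HDL_MGDL_PER_MMOLL = 38.67


def _require_positive(**values):
    """Raise ValueError if any named input is not strictly positive."""
    for name, value in values.items():
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")


def aip(tg_mgdl, hdl_mgdl):
    """Atherogenic index of plasma: log10(TG / HDL-C), both in mmol/L."""
    _require_positive(tg_mgdl=tg_mgdl, hdl_mgdl=hdl_mgdl)
    tg_mmol = tg_mgdl / TG_MGDL_PER_MMOLL
    hdl_mmol = hdl_mgdl / HDL_MGDL_PER_MMOLL
    return math.log10(tg_mmol / hdl_mmol)


def tyg(tg_mgdl, glucose_mgdl):
    """Triglyceride-glucose index: ln(TG * fasting glucose / 2), both in mg/dL."""
    _require_positive(tg_mgdl=tg_mgdl, glucose_mgdl=glucose_mgdl)
    return math.log(tg_mgdl * glucose_mgdl / 2.0)


def absi(waist_m, weight_kg, height_m):
    """Body shape index: waist / (BMI^(2/3) * height^(1/2)), lengths in metres."""
    _require_positive(waist_m=waist_m, weight_kg=weight_kg, height_m=height_m)
    bmi = weight_kg / height_m ** 2
    return waist_m / (bmi ** (2.0 / 3.0) * math.sqrt(height_m))


if __name__ == "__main__":
    # Hypothetical participant values, for illustration only.
    print("AIP :", aip(tg_mgdl=150.0, hdl_mgdl=50.0))
    print("TyG :", tyg(tg_mgdl=150.0, glucose_mgdl=100.0))
    print("ABSI:", absi(waist_m=0.90, weight_kg=70.0, height_m=1.75))
```

Keeping the three components as separate functions leaves the combination step open, which matches what the abstract actually specifies.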