Abstract
Background: Insulin resistance is defined as reduced tissue responsiveness to insulin's actions on glucose metabolism. Gold-standard methods such as the hyperinsulinemic-euglycemic and hyperglycemic clamps are costly and rarely used in large epidemiological studies. We aimed to identify the best-performing machine learning (ML) algorithm for predicting insulin resistance in Brazilian adolescents. Methods: We used data from 37,454 Brazilian adolescents aged 12 to 17 years, sampled from the Study of Cardiovascular Risk Factors in Adolescents (2013-2014). Covariates included other cardiovascular risk factors. We evaluated seven ML models, stratifying the sample by sex. Model performance was assessed by area under the curve (AUC), calibration curves and decision curve analysis (DCA). Finally, we used the SHAP (SHapley Additive exPlanations) approach to assess the importance of each variable in the best-performing ML model. Results: The Logistic Regression model achieved the highest AUC (AUC = 0.8 for both boys and girls). The best-performing ML models were better calibrated in girls than in boys. The DCA curves showed almost equal values for girls and boys. The most important clinical predictors for both sexes were waist circumference, triglycerides and age. Conclusions: Logistic Regression proved to be the best clinical prediction model, performing comparably to more complex models. Further studies are needed in more diverse populations.
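As a rough illustration of two of the evaluation metrics named in the Methods, the Python sketch below computes the AUC and the net benefit that underlies decision curve analysis. It is not the study's code: the function names `auc` and `net_benefit` and the toy labels and scores are assumptions made for illustration only. Net benefit follows the usual definition, TP/n - (FP/n) * pt / (1 - pt), where pt is the threshold probability.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_score: Sequence[float],
                threshold: float) -> float:
    """Net benefit of classifying as positive everyone whose predicted
    risk is at or above the threshold probability."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Toy data (hypothetical): 1 = insulin resistant, 0 = not.
    labels = [0, 0, 1, 1, 0, 1]
    scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
    print("AUC:", auc(labels, scores))
    # A decision curve plots net benefit across a range of thresholds.
    for pt in (0.1, 0.3, 0.5):
        print("threshold", pt, "net benefit", net_benefit(labels, scores, pt))
```

In a decision curve, the model's net benefit at each threshold is compared with the "treat all" and "treat none" strategies. Here, "treat all" corresponds to giving every adolescent a score of 1, and "treat none" has a net benefit of zero.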