Abstract
BACKGROUND: Self-rated health is a strong predictor of mortality and morbidity. Machine learning techniques may provide insights into which of the multifaceted contributors to self-rated health are key drivers in diverse groups. OBJECTIVE: We used machine learning algorithms to predict self-rated health in diverse groups in the Behavioral Risk Factor Surveillance System (BRFSS), to understand how these algorithms might be used explicitly to examine drivers of self-rated health in diverse populations. DESIGN: We applied three common machine learning algorithms to predict self-rated health in the 2017 BRFSS survey, stratified by age, race/ethnicity, and sex. We replicated our process in the 2016 BRFSS survey. PARTICIPANTS: We analyzed data from 449,492 adult participants of the 2017 BRFSS survey. MAIN MEASURES: We used area under the curve (AUC) statistics to assess model fit within each group. We then used traditional logistic regression to estimate the association between self-rated health and the features identified by the machine learning models. KEY RESULTS: All three algorithms, regularized logistic regression (AUC: 0.81), random forest (AUC: 0.80), and support vector machine (AUC: 0.81), provided good model fit in the BRFSS. Predictors of self-rated health were similar by sex and race/ethnicity but differed by age. Socioeconomic features were prominent predictors of self-rated health in mid-life age groups. Income [OR: 1.70 (95% CI: 1.62-1.80)], education [OR: 2.02 (95% CI: 1.89-2.16)], physical activity [OR: 1.52 (95% CI: 1.46-1.58)], depression [OR: 0.66 (95% CI: 0.63-0.68)], difficulty concentrating [OR: 0.62 (95% CI: 0.58-0.66)], and hypertension [OR: 0.59 (95% CI: 0.57-0.61)] all predicted the odds of excellent or very good self-rated health. CONCLUSIONS: Our analysis of BRFSS data shows that social determinants of health are prominent predictors of self-rated health in mid-life. Our work may demonstrate promising practices for using machine learning to advance health equity.
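For readers less familiar with the two statistics reported above, the following is a minimal sketch of how an AUC and an odds ratio with a 95% confidence interval can be computed. It is an illustration only, not the study's code: the function names `auc` and `odds_ratio_ci` are hypothetical, the toy numbers are invented and are not BRFSS values, and the odds ratio here is an unadjusted estimate from a 2x2 table with a Wald interval, whereas the abstract's odds ratios come from logistic regression models.

```python
import math


def auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed with outcome,   b: exposed without outcome
    c: unexposed with outcome, d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical toy data (assumption: 1 = excellent/very good self-rated health).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = auc(labels, scores)

# Hypothetical 2x2 counts, chosen only to exercise the function.
toy_or, toy_lower, toy_upper = odds_ratio_ci(60, 40, 45, 55)
print(f"AUC = {toy_auc:.2f}; OR = {toy_or:.2f} (95% CI: {toy_lower:.2f}-{toy_upper:.2f})")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination. An odds ratio above 1 indicates higher odds of the outcome in the exposed group, and one below 1 indicates lower odds.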