Abstract
BACKGROUND: Efficient community-based screening for individuals at high risk of mortality is a major public health challenge. Although many predictors have been proposed, there is limited consensus on which are both robust and practical for population screening. This study applied interpretable machine learning to identify efficient predictors of all-cause and cardiovascular mortality in a nationally representative cohort. METHODS: We analyzed 9957 adults aged ≥40 years from the National Health and Nutrition Examination Survey (NHANES) 1999-2004 with linked mortality follow-up. A total of 134 demographic, lifestyle, and biomarker variables were evaluated across multiple algorithms. Model interpretability was assessed with Shapley Additive Explanations (SHAP), and the prognostic implications of the leading predictors were examined with Kaplan-Meier analyses. RESULTS: Over 5 years of follow-up, 1293 participants (13.0%) died. Across analytic approaches, age, troponin T (TNT), and N-terminal pro-B-type natriuretic peptide (NT-proBNP) consistently emerged as the most influential predictors. Survival analyses showed significantly poorer outcomes among individuals with elevated TNT and NT-proBNP. A parsimonious five-variable model (age, gender, physical activity, TNT, and NT-proBNP) retained good discrimination (AUC = 0.841) and calibration. CONCLUSIONS: A parsimonious set of five predictors (age, gender, physical activity, TNT, and NT-proBNP) enabled efficient mortality risk stratification in NHANES, supporting their potential role in practical community screening.
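
As an aid to interpreting the discrimination figure reported above, the following minimal sketch shows one way an AUC can be computed, assuming the reported AUC is the area under the receiver operating characteristic curve. In that case it equals the probability that a randomly chosen participant who died was assigned a higher predicted risk than a randomly chosen survivor. The function name `auc` and the toy scores and labels are hypothetical illustrations and are not taken from the study data or its analysis code.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    Returns the fraction of (event, non-event) pairs in which the event
    has the higher score; tied pairs count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]  # deaths
    neg = [s for s, y in zip(scores, labels) if y == 0]  # survivors
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks and outcomes (1 = died, 0 = survived);
# illustrative values only, not NHANES data.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_auc = auc(toy_scores, toy_labels)
```

On this reading, an AUC of 0.841 would mean that in roughly 84% of such pairs the model ranks the participant who died above the survivor. The pairwise loop is quadratic in sample size and is meant for illustration; a rank-based implementation would be preferable for a cohort of several thousand participants.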