Inferring high-fat dietary patterns from electronic health record data using machine learning

利用机器学习从电子健康记录数据中推断高脂肪饮食模式

阅读:2

Abstract

OBJECTIVES: Electronic health records (EHRs) rarely capture dietary detail, limiting diet-disease research. We aimed to develop machine learning (ML) computable phenotypes to identify high-fat diet (HFD) using variables typically available in EHRs. MATERIALS AND METHODS: We used National Health and Nutrition Examination Survey (NHANES) 1999-2020 data, where 24-h dietary recall served as ground truth. Dietary fat intake was summarized into a score (0-30) based on percent energy from fat, carbohydrate, and protein; lower scores indicated HFD. We defined HFD at cutoffs of 10, 15, and 20, and trained ML models (Extreme Gradient Boosting, logistic regression, random forest) using EHR-compatible variables (demographics, comorbidities, labs, anthropometrics). Model interpretability was assessed using Shapley Additive Explanations. To evaluate clinical relevance, we compared cancer associations using ML-predicted vs true diet labels. RESULTS: Machine learning models classified HFD with good performance, strongest at broader definitions. Random forest achieved an F1-score of 0.79 (recall 0.74, precision 0.84) at cutoff 20. Key predictors included race/ethnicity, triglycerides, obesity metrics (body mass index and derived indices), and metabolic panel results. DISCUSSION: These findings indicate that dietary patterns, though seldom recorded in EHRs, can be inferred from routinely available variables. The ability of ML-derived phenotypes to reproduce known diet-disease relationships underscore their epidemiologic validity. Top predictors also align with established biological pathways linking obesity, lipid metabolism, and cancer risk, supporting plausibility. CONCLUSION: A high-fat dietary pattern can be inferred from EHR-compatible variables using ML-based phenotyping. This approach offers a scalable tool to integrate diet into EHR-based research and precision medicine.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。