Inferring high-fat dietary patterns from electronic health record data using machine learning

Authors: Yeh Ya-Yun, Lin Hsin-Yueh, Guo Jingchuan, Sun Ramon C, Jiang Sizun, Bian Jiang, Dai Hao

Journal: JAMIA Open (impact factor: 3.400)
Year: 2026
Citation: 2026 Feb;9(1):ooaf181
DOI: 10.1093/jamiaopen/ooaf181

Abstract

OBJECTIVES: Electronic health records (EHRs) rarely capture dietary detail, which limits diet-disease research. We aimed to develop machine learning (ML) computable phenotypes that identify a high-fat diet (HFD) using variables typically available in EHRs.

MATERIALS AND METHODS: We used National Health and Nutrition Examination Survey (NHANES) 1999-2020 data, in which 24-h dietary recall served as ground truth. Dietary fat intake was summarized into a score (0-30) based on percent energy from fat, carbohydrate, and protein; lower scores indicated HFD. We defined HFD at cutoffs of 10, 15, and 20, and trained ML models (Extreme Gradient Boosting, logistic regression, random forest) using EHR-compatible variables (demographics, comorbidities, labs, anthropometrics). Model interpretability was assessed using Shapley Additive Explanations (SHAP). To evaluate clinical relevance, we compared cancer associations using ML-predicted vs true diet labels.

RESULTS: Machine learning models classified HFD with good performance, strongest at broader definitions. Random forest achieved an F1-score of 0.79 (recall 0.74, precision 0.84) at cutoff 20. Key predictors included race/ethnicity, triglycerides, obesity metrics (body mass index and derived indices), and metabolic panel results.

DISCUSSION: These findings indicate that dietary patterns, though seldom recorded in EHRs, can be inferred from routinely available variables. The ability of ML-derived phenotypes to reproduce known diet-disease relationships underscores their epidemiologic validity. Top predictors also align with established biological pathways linking obesity, lipid metabolism, and cancer risk, supporting plausibility.

CONCLUSION: A high-fat dietary pattern can be inferred from EHR-compatible variables using ML-based phenotyping. This approach offers a scalable tool to integrate diet into EHR-based research and precision medicine.
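The labeling and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names (`label_hfd`, `f1_from`, `precision_recall_f1`) and the example scores are hypothetical, and treating the cutoff as inclusive (score at or below the cutoff counts as HFD) is an assumption, since the abstract states only that lower scores indicate HFD. The last lines check that the reported recall and precision are consistent with the reported F1-score.

```python
from typing import Sequence, Tuple

CUTOFFS = (10, 15, 20)  # HFD cutoffs named in the abstract


def label_hfd(score: float, cutoff: int) -> int:
    """Return 1 (HFD) when the 0-30 diet score is at or below the cutoff.

    Assumption: the boundary is inclusive; the abstract does not say.
    """
    if not 0 <= score <= 30:
        raise ValueError("score must lie in [0, 30]")
    return int(score <= cutoff)


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for the positive (HFD) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


# Hypothetical scores, for illustration only (not NHANES data).
scores = [4, 12, 18, 25]
for cutoff in CUTOFFS:
    print(cutoff, [label_hfd(s, cutoff) for s in scores])

# Consistency check on the reported random forest metrics at cutoff 20.
print(round(f1_from(precision=0.84, recall=0.74), 2))  # 0.79
```

A higher cutoff labels more participants as HFD, which is one way to read "broader definitions" in the results.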
