Abstract
Objective
To examine how artificial intelligence and machine learning methods are used to develop chronic disease case definitions using primary care electronic medical records and to evaluate their methodological transparency and clinical applicability.

Methods
A scoping review was conducted according to the Arksey and O'Malley framework and Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) guidelines across Medical Literature Analysis and Retrieval System Online (MEDLINE), Excerpta Medica Database (EMBASE), Cumulative Index to Nursing and Allied Health Literature (CINAHL), Scopus, and Web of Science (2000-2024). Eligible studies applied machine learning to define or validate chronic disease cases using primary care electronic medical record data. Twenty-three studies were analyzed for data sources, machine learning approaches, validation strategies, and knowledge translation activities.

Results
Most studies were from Canada or the United Kingdom and used supervised machine learning, primarily random forests and neural networks. Validation metrics commonly included sensitivity and specificity, although external validation was rare. Interpretable, rules-based definitions could be derived from approximately 50% of the studies, while none described formal knowledge translation.

Conclusions
Machine learning methods can improve electronic medical record-based disease identification and surveillance. However, greater transparency, validation, and clinician-machine learning collaboration are essential to translate these approaches into trustworthy decision-support tools for primary care.

INPLASY registration number: 2025120002.