Abstract
BACKGROUND: Health administrative data are widely used in health services research, but diagnosis codes may be subject to misclassification. This scoping review examined the validity and reporting practices of operational definitions using the Korean National Health Insurance Claim Database (KHICD), focusing on compliance with the REporting of studies Conducted using Observational Routinely collected health Data (RECORD) statement.
METHODS: PubMed was searched for studies published between 2020 and 2024 that either validated operational definitions or used the KHICD for population identification. After screening 29 validation studies and 239 KHICD-based studies, 12 validation studies and 157 KHICD-based studies were included. Data on operational definitions, validation methods, and adherence to RECORD guidelines were extracted.
RESULTS: Most of the 12 validation studies focused on cancer. Algorithms combining diagnosis codes with rare intractable disease program codes, prescription codes, or procedure codes demonstrated higher positive predictive values than diagnosis codes alone. Among the 157 KHICD-based studies, 131 (83.4%) used diagnosis codes to identify patients: 71 (45.2%) combined them with supplementary codes, whereas 60 (38.2%) relied solely on diagnosis codes, often without specifying diagnosis scope or acknowledging misclassification risks. Only three studies conducted validation, and overall compliance with the RECORD statement was limited.
CONCLUSION: Although operational definitions using combined codes improve patient identification in the KHICD, methodological rigor and reporting transparency remain suboptimal. Systematic validation studies and strict adherence to reporting guidelines are needed to enhance the reproducibility and comparability of KHICD-based research.