Abstract
Healthcare-integrated biobanking (HIB) describes the collection of surplus samples from clinical routine and requires tailored algorithms to identify suitable samples. However, identifying patients with specific conditions such as chronic kidney disease (CKD) from heterogeneous real-world data remains challenging. This study develops and validates two HIB-specific algorithms for automated CKD identification based on electronic health records (EHR), enabling targeted sample collection and retrospective cohort assembly. Two logistic regression-based CKD algorithms were developed in an existing training cohort (n = 785) with a high prevalence of CKD (48 %): the admissionHIB algorithm (based on laboratory values at admission) and the historyHIB algorithm (additionally including data from previous hospital stays). Validation was carried out in patients of Jena University Hospital who had given informed consent under the Broad Consent of the Medical Informatics Initiative (MII) and were admitted between 01/2018 and 04/2020. The validation cohort was divided into a gold-standard cohort (n = 162) defined by manual chart review and a larger silver-standard cohort (n = 1075) generated using a validated algorithm from prior studies. The admissionHIB and historyHIB algorithms achieved F1-scores of 86 % and 91 %, respectively, in the training cohort. The validation cohort had a lower prevalence of CKD (approximately 12 %). Several automated review algorithms were evaluated in the gold-standard cohort, and the best-performing model (93 % recall, precision, and F1-score; 97 % accuracy) was selected to generate the silver-standard cohort. The HIB algorithms yielded F1-scores of 80 % (admissionHIB) and 78 % (historyHIB) in the gold-standard cohort, and 83 % and 80 %, respectively, in the silver-standard cohort.
These findings demonstrate good performance of HIB-specific CKD algorithms across heterogeneous patient populations, with validation F1-scores of 78 % to 83 % despite a much lower CKD prevalence than in the training cohort, and establish a reproducible framework combining real-world EHR data, patient consent infrastructure, and silver-standard validation.
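
For readers less familiar with the reported metrics, the following minimal sketch illustrates how recall, precision, F1-score, and accuracy can be computed when an algorithm's binary CKD calls are compared with a reference standard (gold or silver). The function name `binary_metrics` and the toy labels are hypothetical and serve only as illustration; they are not the study's implementation or data.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Recall, precision, F1 and accuracy for binary labels (1 = CKD, 0 = no CKD)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical toy labels (NOT study data): reference standard vs. algorithm output.
reference = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
metrics = binary_metrics(reference, predicted)
```

Because the F1-score is the harmonic mean of precision and recall, it equals both whenever the two coincide, which is consistent with the best-performing review algorithm being reported at 93 % for all three. Accuracy additionally counts true negatives and can therefore be higher than F1 in a low-prevalence cohort.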