Abstract
Administrative datasets are important for cirrhosis research but are limited by suboptimal cirrhosis identification. We developed and validated algorithms to accurately identify cirrhosis and its complications in a real-world, statewide dataset. Using 2017 to 2020 Indiana Patient Care Network data, we grouped 15,636 records by combinations of code and lab criteria (group A: cirrhosis codes; B: FIB-4/APRI criteria; C: cirrhosis complication codes; D: code/lab for liver disease). Diagnoses were confirmed by chart review in 4.5% of the 15,636 records. Positive predictive values (PPVs) were calculated for various algorithms, which were externally validated in hepatology clinic (n = 1,039) and emergency department (ED)-based cirrhosis cohorts (n = 2,124). Charts meeting criteria for group A and at least one other group (“AX”, e.g., ABC) had an overall PPV of 86%. The highest PPVs were seen in ACD and ABCD and were confirmed during external validation: 88% and 97% (hepatology cohort), 79% and 93% (ED cohort). Without complication codes, ABD showed strong PPVs: 86% (internal), 92% (hepatology), and 72% (ED). ICD-10-based definitions alone were suboptimal for complications: ascites (57%) and hepatic encephalopathy (HE; 55%). The PPV for HE improved with the addition of medications but remained < 80%. In summary, we provide algorithms to identify both compensated and decompensated cirrhosis in real-world data. Using the “AX” algorithm, we created the statewide Indiana Digital Cirrhosis Cohort to support future research across cirrhosis stages. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1038/s41598-026-39585-2.
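The grouping and PPV logic described in the abstract can be illustrated with a minimal Python sketch. It is not the study's implementation: the record layout, the function names (`group_label`, `is_ax`, `ppv`), and the toy data are all hypothetical, and the sketch assumes each record has already been flagged for groups A to D and adjudicated by chart review. It shows only how a combination label such as ABD is formed, how the “AX” rule (group A plus at least one other group) could be applied, and how PPV is computed as confirmed cases divided by flagged records.

```python
GROUPS = "ABCD"  # A: cirrhosis codes, B: FIB-4/APRI, C: complication codes, D: liver disease code/lab


def group_label(flags):
    """Build a combination label such as 'ABD' from a dict of group -> bool."""
    return "".join(g for g in GROUPS if flags.get(g, False))


def is_ax(label):
    """'AX' rule: the record meets group A and at least one other group."""
    return label.startswith("A") and len(label) >= 2


def ppv(records):
    """PPV = chart-confirmed records / all records selected by the algorithm."""
    if not records:
        return None  # undefined when the algorithm selects nothing
    confirmed = sum(1 for r in records if r["confirmed"])
    return confirmed / len(records)


# Hypothetical toy records (not study data); 'confirmed' stands in for the
# chart-review result.
records = [
    {"flags": {"A": True, "B": True, "C": False, "D": True}, "confirmed": True},
    {"flags": {"A": True, "B": False, "C": False, "D": False}, "confirmed": False},
    {"flags": {"A": True, "B": True, "C": True, "D": True}, "confirmed": True},
    {"flags": {"A": False, "B": True, "C": False, "D": True}, "confirmed": False},
    {"flags": {"A": True, "B": False, "C": True, "D": True}, "confirmed": False},
]

# Keep only records that satisfy the AX rule, then compute their PPV.
ax_records = [r for r in records if is_ax(group_label(r["flags"]))]
ax_ppv = ppv(ax_records)
print(f"AX records: {len(ax_records)}, PPV: {ax_ppv:.2f}")
```

In this toy example, the record flagged for group A alone and the record without group A are excluded by the AX rule, so the PPV is computed over the remaining three records.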