Abstract
INTRODUCTION: Large-scale cohort studies exploring the etiology of obstructive jaundice (OJ) are scarce, and current serum-based diagnostic markers offer suboptimal performance. This study leverages the largest retrospective cohort of patients with OJ to date to investigate its disease spectrum and to develop a novel diagnostic system. METHODS: This study involved 2 retrospective observational cohorts. The biliary surgery cohort (BS cohort, n = 349) was used for initial data exploration and external validation of machine learning (ML) models. The large general cohort (LG cohort, n = 5,726) enabled an in-depth analysis of etiologies and the determination of relevant diagnostic indicators, in addition to supporting ML model development. Interpretable ML techniques were used to derive insights from the models. RESULTS: The LG cohort revealed a diverse disease spectrum of OJ, with cholangiocarcinoma (10.39% distal, 10.01% perihilar, and 5.59% intrahepatic), pancreatic adenocarcinoma (19.11%), and common bile duct stones (18.27%) as the leading causes. Traditional serum markers such as carbohydrate antigen 19-9 and carcinoembryonic antigen lacked stand-alone diagnostic accuracy. Two ML-based models (collectively termed the ML of OJ based on common laboratory tests model) were developed: a binary classifier to differentiate benign from malignant causes (area under the receiver operating characteristic curve [AUROC] = 0.862) and a multiclass model to further stratify malignant and benign diseases (accuracy = 0.777). Interpretable ML tools provided clarity on critical features, offering actionable insights and enhancing transparency in the decision-making process. DISCUSSION: This study elucidates the etiological spectrum of OJ and provides a practical and interpretable ML-based diagnostic tool. By leveraging large-scale clinical data, our model provides a rapid and reliable primary assessment for patients with OJ, enabling clinicians to identify potential etiologies and guide further diagnostic workup.