Abstract
BACKGROUND: Interstitial fibrosis (IF) is the strongest predictor of chronic kidney disease progression. Visual estimation of IF from trichrome (TRI)-stained slides has high interobserver variability and limited reproducibility. METHODS: We developed and validated TRI_IF, a deep learning algorithm that mimics a nephropathologist's workflow to quantify IF from TRI-stained whole-slide images (WSIs) without requiring manual annotations. TRI_IF is a hybrid model that combines deep learning and morphometric analysis. Using deep learning, TRI_IF predicts an image-specific blue-to-red pixel intensity threshold to quantify fibrosis. The model was trained on WSIs from 315 patients and corresponding clinical data from centers participating in the NEPTUNE digital pathology repository, spanning 14 years (2009-2023). The mean (±standard deviation) age at enrollment was 32 (±21) years, and 44% of participants were female. Ground-truth IF estimates were derived from consensus scores provided by expert nephropathologists. The model used the Xception architecture for feature extraction and an XGBoost regressor to predict image-specific thresholds for fibrosis quantification. Model performance was evaluated against the pathologist-derived consensus IF score using agreement metrics and by its ability to predict the clinical outcomes of end-stage kidney disease (ESKD) or a ≥40% decline in estimated glomerular filtration rate (eGFR). RESULTS: TRI_IF demonstrated strong agreement with pathologist-derived IF estimates at both the biopsy core and patient levels. The R² value for the validation set was 0.86 and increased to 0.93 in the subset of validation cores with good image quality; the mean difference (predicted minus ground-truth IF estimate) was always <3%, although statistically significant. Pitman's test p values were >0.05 across all subsets.
Similarly, the weighted Cohen's kappa between Banff categories of the TRI_IF-predicted and ground-truth estimates was 0.86 in the full validation set and 0.91 in the validation subset with acceptable slide quality. The model accurately categorized patients into Banff IF classes and predicted the adverse clinical outcomes of time to ESKD or a ≥40% decline in eGFR. These associations remained robust in sensitivity analyses restricted to validation subsets and high-quality slides. CONCLUSION: TRI_IF provides an accurate, reproducible, and clinically meaningful method for quantifying IF from TRI-stained WSIs. By eliminating the need for manual annotations and reducing interobserver variability, this approach offers a scalable solution for both clinical and research applications in nephropathology.
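
As an illustration only, the threshold-based quantification and the Banff-category agreement described above might be sketched as follows. This is not the TRI_IF implementation. The function names, the per-pixel blue-to-red ratio rule, and the choice of quadratic kappa weights are all assumptions made for the example. The Banff ci cut points used (≤5%, 6-25%, 26-50%, >50%) are the commonly cited ones and are not stated in the abstract. In TRI_IF the threshold is predicted per image by the Xception and XGBoost stages, whereas here it is simply passed in by hand.

```python
import numpy as np


def fibrosis_fraction(rgb, threshold, tissue_mask=None):
    """Fraction of tissue pixels whose blue-to-red ratio exceeds `threshold`.

    Hypothetical rule: the abstract says only that an image-specific
    blue-to-red threshold is used, not how it is applied per pixel.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    red, blue = rgb[..., 0], rgb[..., 2]
    ratio = blue / (red + 1.0)  # +1 avoids division by zero on dark pixels
    if tissue_mask is None:
        tissue_mask = np.ones(ratio.shape, dtype=bool)
    n_tissue = int(tissue_mask.sum())
    if n_tissue == 0:
        return 0.0
    return float((ratio[tissue_mask] > threshold).sum()) / n_tissue


def banff_category(if_percent):
    """Map an IF percentage to a Banff ci class (0-3); cut points assumed."""
    # right=True gives the bins: <=5 -> 0, (5, 25] -> 1, (25, 50] -> 2, >50 -> 3
    return int(np.digitize(if_percent, [5.0, 25.0, 50.0], right=True))


def weighted_kappa(a, b, n_classes=4, weights="quadratic"):
    """Weighted Cohen's kappa between two lists of integer class labels."""
    observed = np.zeros((n_classes, n_classes))
    for i, j in zip(a, b):
        observed[i, j] += 1
    observed /= observed.sum()
    # Agreement expected by chance, from the two marginal distributions.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    idx = np.arange(n_classes)
    distance = np.abs(idx[:, None] - idx[None, :])
    w = distance ** 2 if weights == "quadratic" else distance
    denom = (w * expected).sum()
    if denom == 0:  # both raters used one and the same class throughout
        return float("nan")
    return 1.0 - (w * observed).sum() / denom


if __name__ == "__main__":
    # Toy 2x2 "image": two blue-dominant and two red-dominant pixels.
    toy = np.array([[[50, 60, 200], [200, 60, 50]],
                    [[60, 70, 180], [180, 50, 60]]], dtype=np.uint8)
    if_pct = 100.0 * fibrosis_fraction(toy, threshold=1.0)
    print(if_pct, banff_category(if_pct))
```

In a pipeline of this shape, per-core percentages from `fibrosis_fraction` would be binned with `banff_category` and compared with the pathologist-derived classes using `weighted_kappa`.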