Abstract
Quantifying the molecular and structural integrity of tissue is essential for biobank development. However, current assessment methods either involve destructive testing that depletes valuable biospecimens or rely on manual evaluations, which do not scale and introduce inter-observer variation. To overcome these challenges, we present PathQC, a deep-learning framework that directly predicts the tissue RNA Integrity Number (RIN) and the extent of autolysis from hematoxylin and eosin (H&E)-stained whole-slide images of normal tissue biopsies. Whereas prior QC methods focus on image quality control, PathQC provides sample quality control by directly quantifying molecular integrity (RIN) and structural degradation (autolysis). PathQC first extracts morphological features from the slide using a recently developed digital pathology foundation model (UNI); a supervised model then learns to predict RIN and autolysis scores from these features. PathQC is trained on and applied to the Genotype-Tissue Expression (GTEx) cohort, which comprises 25,306 non-diseased post-mortem samples across 29 tissues from 970 donors for which paired ground-truth RIN and autolysis scores were available. In this cohort, PathQC predicted RIN and autolysis scores with average Pearson correlations of 0.47 and 0.45, respectively, with notably high performance in adrenal gland tissue for RIN (R = 0.82) and in colon tissue for autolysis (R = 0.83). We provide a pan-tissue model for predicting RIN and autolysis scores for new slides from any tissue type (GitHub). Overall, PathQC enables scalable assessment of tissue molecular and structural integrity from routine H&E images, enhancing biobank quality control and retrospective analyses across 29 tissues and multiple collection sites.
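The two-stage design described above (features from a foundation model, then a supervised predictor evaluated by Pearson correlation) can be illustrated with a minimal sketch. This is not the PathQC implementation: the abstract does not specify the supervised model or how tile features are aggregated, so mean pooling and ridge regression are assumed here purely for illustration, random vectors stand in for UNI embeddings, and all function names (`aggregate_tiles`, `fit_ridge`, `pearson_r`, `demo`) are hypothetical.

```python
import numpy as np


def aggregate_tiles(tile_embeddings):
    """Mean-pool tile embeddings of shape (n_tiles, dim) into one slide vector.

    Assumption: mean pooling; the source does not state the aggregation used.
    """
    return np.asarray(tile_embeddings).mean(axis=0)


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    Assumption: ridge stands in for the unspecified supervised model.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def pearson_r(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


def demo(seed=0, n_slides=300, n_tiles=20, dim=32, n_train=200):
    """Train on synthetic 'slides' and report held-out Pearson correlation."""
    rng = np.random.default_rng(seed)
    # Synthetic stand-ins for per-tile foundation-model embeddings.
    tiles = rng.normal(size=(n_slides, n_tiles, dim))
    X = np.stack([aggregate_tiles(t) for t in tiles])
    # Synthetic quality score that depends linearly on the slide features.
    y = X @ rng.normal(size=dim) + rng.normal(scale=0.1, size=n_slides)
    w, b = fit_ridge(X[:n_train], y[:n_train])
    predictions = X[n_train:] @ w + b
    return pearson_r(predictions, y[n_train:])


if __name__ == "__main__":
    print(f"held-out Pearson r on synthetic data: {demo():.2f}")
```

In practice, the synthetic arrays would be replaced by real slide-level features and measured RIN or autolysis scores, and the correlation would be computed per tissue on held-out samples to obtain figures comparable to those reported above.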