Abstract
Cancer research emphasises early detection, yet quantitative methods for analysing normal tissue remain limited. Digitised haematoxylin and eosin (H&E)-stained slides enable computational histopathology, but artificial intelligence (AI)-based analysis of normal breast tissue (NBT) in whole slide images (WSIs) remains scarce. We curated 70 WSIs of NBTs from multiple sources and cohorts, with pathologist-guided manual annotations of epithelium, stroma, and adipocytes (https://github.com/cancerbioinformatics/OASIS). We developed robust convolutional neural network (CNN)-based, patch-level classification models, named NBT-Classifiers, to tessellate and classify NBTs at different scales. Across three external cohorts, NBT-Classifiers trained on 128 × 128 µm and 256 × 256 µm patches achieved AUCs of 0.98-1.00. The model learned independent features of normal tissue, distinct from those of precancerous and cancerous epithelium, which were further visualised using two explainable AI techniques. When integrated into an end-to-end preprocessing pipeline, NBT-Classifiers facilitate efficient downstream analysis within peri-lobular regions. NBT-Classifiers provide robust compartment-specific analytical tools and enhance our understanding of NBT appearances, which serve as valuable reference points for identifying premalignant changes and guiding early breast cancer prevention strategies.