Abstract
Gastric adenocarcinoma is a leading cause of cancer-related mortality worldwide, and histopathologic examination of endoscopic biopsy samples remains essential for its diagnosis and grading. In this study, we propose a novel AI-based caption generation model, termed MIAC (Multi-instance Attention Captioning), designed to produce descriptive diagnostic reports from digital pathology images. The model uses a multi-instance learning framework with permutation-invariant self-attention to aggregate features from multiple histopathology image patches into a unified representation that captures whole-slide characteristics. Using the publicly available PatchGastricADC22 dataset for training and validation, and an external test dataset from Gil Hospital of Gachon University for clinical testing, the model demonstrated strong performance across standard natural language generation metrics (BLEU@4, ROUGE-L, METEOR, CIDEr). Notably, MIAC maintained high captioning accuracy when evaluated on previously unseen data, particularly after color normalization using the Macenko method. These results underscore the model’s robustness, generalizability, and potential for integration into routine digital pathology workflows to assist pathologists in generating structured diagnostic reports.
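To make the idea of permutation-invariant aggregation concrete, the following is a minimal sketch of attention-based pooling over patch features. It is not the MIAC architecture: it swaps in a simpler single-query attention pooling in place of the self-attention described above, and the names `attention_pool`, `w`, and `v`, the dimensions, and the random parameters are all assumptions chosen for illustration. The point it shows is that a softmax-weighted sum over patches gives the same slide-level vector regardless of patch order.

```python
import math
import random


def attention_pool(patch_features, w, v):
    """Aggregate N patch vectors (length D) into one slide-level vector.

    w (H x D) and v (length H) are hypothetical parameters; in a real
    model they would be learned rather than drawn at random.
    """
    # Score each patch independently: s_i = v . tanh(W x_i)
    scores = []
    for x in patch_features:
        hidden = [math.tanh(sum(wj * xj for wj, xj in zip(row, x))) for row in w]
        scores.append(sum(vh * hh for vh, hh in zip(v, hidden)))

    # Softmax over patches (subtract the max for numerical stability)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of patch features gives the pooled representation
    dim = len(patch_features[0])
    pooled = [
        sum(a * x[d] for a, x in zip(weights, patch_features))
        for d in range(dim)
    ]
    return pooled, weights


# Illustrative sizes only: 5 patches, 8-dim features, 4 hidden units
rng = random.Random(0)
D, H, N = 8, 4, 5
w = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(H)]
v = [rng.uniform(-1, 1) for _ in range(H)]
patches = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(N)]

pooled, weights = attention_pool(patches, w, v)

# Shuffling the patches should leave the pooled vector unchanged
shuffled = patches[:]
rng.shuffle(shuffled)
pooled_shuffled, _ = attention_pool(shuffled, w, v)
```

Because each patch is scored on its own and the weighted sum is commutative, `pooled` and `pooled_shuffled` agree up to floating-point rounding, which is the property that lets an unordered bag of patches stand in for a whole slide.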