Abstract
This systematic literature review investigates the integration of deep learning (DL), vision-language models (VLMs), and multiagent systems in pathology image analysis and automated report generation. The rapid advancement of whole-slide imaging (WSI) technologies has posed new challenges in pathology, chiefly because of the scale and complexity of the data. DL techniques, particularly convolutional neural networks and transformers, have substantially improved image analysis tasks including segmentation, classification, and detection. However, these models often lack the generalizability needed to generate coherent, clinically relevant text, which motivates the integration of VLMs and large language models (LLMs). This review examines the effectiveness of VLMs and LLMs in bridging the gap between visual data and clinical text, focusing on their potential to automate pathology report generation. It also explores multiagent systems, in which specialized artificial intelligence (AI) agents collaborate on diagnostic tasks, for their contributions to diagnostic accuracy and scalability. By synthesizing recent studies, this review highlights the successes, challenges, and future directions of these AI technologies in pathology diagnostics, offering a comprehensive foundation for the development of integrated, AI-driven diagnostic workflows.