Abstract
OBJECTIVE: Quantitative analysis of tumor-infiltrating lymphocytes (TILs) is crucial in computational pathology studies of lung adenocarcinoma. However, acquiring large-scale, fully annotated datasets remains a major obstacle for the supervised learning approaches that currently dominate high-precision modeling. To address this data bottleneck, we developed a fully automated pipeline for the precise annotation of tissue contours, tumor parenchyma, and lymphocytes in whole-slide images (WSIs). METHODS: This study used WSI data from The Cancer Genome Atlas (TCGA) cohort. Two pathologists performed comprehensive manual annotations in QuPath, and a third, senior pathologist subsequently reviewed all annotations. The resulting training dataset comprised over 20,000 annotated units. These annotated data were used to train three core modules: an OpenCV-based image processing pipeline for tissue contour detection, a lightweight U²-NetP model for tumor parenchyma segmentation, and a YOLOv7 object detection framework for TIL identification within stromal regions. The pipeline was validated on both an independent internal cohort and an external hospital cohort, and its outputs were benchmarked against semi-quantitative assessments from expert pathologists. RESULTS: The pipeline demonstrated robust and generalizable performance. For tissue contour detection, the OpenCV-based pipeline achieved a Dice coefficient of 90.90% on the test set. For the core learning-based tasks, the tumor parenchyma segmentation model achieved a Dice coefficient of 87.17% on the internal test set and maintained consistent accuracy on the external cohort, with Dice coefficients ranging from 85.09% to 91.78%. In the particularly challenging task of lymphocyte detection, the YOLOv7-based model attained an F1-score of 78.84% and mAP@0.5 of 81.16% on the test set, with performance sustained on external data.
Critically, the automated TIL quantifications showed excellent agreement with independent pathologist assessments (intraclass correlation coefficient, ICC > 0.96). Because it relies on optimized lightweight architectures, the pipeline can serve as an accessible solution for large-scale WSI analysis in computational pathology. CONCLUSION: We developed a fully automated annotation pipeline for lung adenocarcinoma WSIs. By generating high-quality annotations of stromal TILs, this pipeline establishes a reliable data foundation for subsequent computational pathology research and facilitates the advancement of artificial intelligence applications in pathology.
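
For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how a Dice coefficient and an F1-score are conventionally computed. It is not the authors' evaluation code: the function names, the nested-list mask representation, and the handling of empty inputs are assumptions made for illustration only.

```python
def dice_coefficient(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks (nested lists of 0/1)."""
    pred_flat = [p for row in pred for p in row]
    truth_flat = [t for row in truth for t in row]
    if len(pred_flat) != len(truth_flat):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked positive in both masks.
    intersection = sum(1 for p, t in zip(pred_flat, truth_flat) if p and t)
    # Sum of positive pixels in each mask.
    total = sum(1 for p in pred_flat if p) + sum(1 for t in truth_flat if t)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN) from detection counts."""
    denom = 2 * tp + fp + fn
    # Assumed convention: no detections and no ground truth yields 0.0.
    return 0.0 if denom == 0 else 2.0 * tp / denom


# Hypothetical 2x2 masks: one overlapping pixel, three positive pixels in total.
example_pred = [[1, 1], [0, 0]]
example_truth = [[1, 0], [0, 0]]
example_dice = dice_coefficient(example_pred, example_truth)

# Hypothetical detection counts: 8 true positives, 2 false positives, 2 misses.
example_f1 = f1_score(tp=8, fp=2, fn=2)
```

Multiplying either value by 100 gives the percentage form used in the abstract. For object detection, how a predicted box is matched to a ground-truth box (for example, the IoU threshold of 0.5 implied by mAP@0.5) determines the true-positive count and is outside the scope of this sketch.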