Abstract
BACKGROUND: The presence of a micropapillary component is an independent adverse prognostic factor in patients with lung adenocarcinoma. This study aimed to develop a pathomics model based on weakly supervised deep learning that predicts postoperative survival risk from whole slide images in patients with lung adenocarcinoma containing micropapillary components.
METHODS: Hematoxylin and eosin (H&E)-stained whole slide images were retrospectively collected from 202 patients with lung adenocarcinoma containing micropapillary components who underwent surgical resection. A weakly supervised learning framework was adopted in which ResNet18 and Vision Transformer (ViT) architectures extracted morphological features at the tile level; these features were then aggregated to the patient level via multiple-instance learning. Least absolute shrinkage and selection operator (LASSO) regression was used to select key pathomic features and construct a pathomics prediction model, which was compared with traditional models based on clinicopathological factors. Finally, the optimal pathomic features and independent clinical risk factors were integrated into a combined predictive nomogram.
RESULTS: Among the deep learning models, ResNet18 demonstrated superior feature extraction capability, with area under the curve (AUC) values of 0.934 and 0.653 in the training and test sets, respectively, outperforming the ViT model. The ResNet18 model was therefore selected, yielding 206 extracted features, of which four key pathomic features were retained after LASSO regression. A logistic regression (LR) model built on these four features achieved an AUC of 0.892 in the training set. Univariate and multivariate analyses of clinical variables identified eight significant features for modeling, including the micropapillary and solid components. The clinical feature model constructed using LR achieved AUCs of 0.872 and 0.698 in the training and test sets, respectively. The final combined clinicopathological nomogram significantly improved predictive performance in the training set (AUC = 0.945), surpassing both the standalone pathomics and clinical models, and maintained reasonable discriminative ability in the test set (AUC = 0.687). Visual analysis using gradient-weighted class activation mapping (Grad-CAM) indicated that the model's decisions focused on regions rich in micropapillary structures, which supports interpretability.
CONCLUSIONS: Weakly supervised deep learning methods can effectively identify digital pathological features associated with prognosis. A combined model integrating pathomic features with clinical information demonstrated stronger predictive power in the training set than models based solely on pathology or clinical data, offering a potential auxiliary tool for individualized postoperative risk stratification in patients with lung adenocarcinoma containing micropapillary components.
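
The tile-to-patient aggregation and AUC evaluation described above can be illustrated with a minimal sketch. It is not the study's implementation: the function names (`aggregate_tiles`, `roc_auc`), the summary statistics chosen as patient-level features, and the toy tile probabilities and outcome labels are all assumptions made for illustration. The AUC is computed directly from its rank interpretation, as the probability that a randomly chosen high-risk patient receives a higher score than a randomly chosen low-risk patient.

```python
from statistics import mean


def aggregate_tiles(tile_probs):
    """Collapse tile-level predicted probabilities into per-patient features.

    The three summary statistics are illustrative assumptions, not the
    features used in the study.
    """
    features = {}
    for patient_id, probs in tile_probs.items():
        features[patient_id] = {
            "mean_prob": mean(probs),
            "max_prob": max(probs),
            # Fraction of tiles predicted as high risk at a 0.5 cutoff.
            "frac_high": sum(p >= 0.5 for p in probs) / len(probs),
        }
    return features


def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical tile-level outputs for four patients (toy data).
tile_probs = {
    "P1": [0.9, 0.8, 0.7],
    "P2": [0.2, 0.1, 0.3],
    "P3": [0.6, 0.4, 0.5],
    "P4": [0.3, 0.7, 0.2],
}
# Hypothetical outcome labels: 1 = high risk, 0 = low risk.
outcome = {"P1": 1, "P2": 0, "P3": 1, "P4": 0}

patient_features = aggregate_tiles(tile_probs)
ids = sorted(patient_features)
labels = [outcome[i] for i in ids]
scores = [patient_features[i]["mean_prob"] for i in ids]
auc = roc_auc(labels, scores)
print(f"Patient-level AUC using mean tile probability: {auc:.3f}")
```

In a full pipeline, patient-level features of this kind would be passed to a feature selection step such as LASSO and then to a classifier such as logistic regression, with the AUC computed separately on the training and test sets.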