Abstract
Radiation pneumonitis (RP) is one of the most common and severe complications of thoracic radiotherapy in patients with locally advanced non-small cell lung cancer (LA-NSCLC). Its prediction remains challenging because of complex clinical risk factors, incomplete multimodal data, and the absence of slice-level annotations in pre-radiotherapy CT images. To address these challenges, we propose a multimodal fusion framework based on Dual-Layer Attention-Based Adaptive Bag Embedding Multiple-Instance Learning (DAAE-MIL) for accurate RP prediction. We retrospectively collected data from 995 LA-NSCLC patients who received thoracic radiotherapy between November 2018 and April 2025. After screening, 670 patients were included, of whom 535 were allocated to the training set and the remaining 135 were reserved as an independent test set. The proposed framework first extracts pre-radiotherapy CT image features with a fine-tuned C3D network; the DAAE-MIL module then screens critical instances and aggregates them into bag-level representations, improving the quality of the extracted deep features. Finally, clinical data, radiomics features, and CT-derived deep features are integrated to construct a multimodal prediction model. The model demonstrates strong RP prediction performance across multiple evaluation metrics, outperforming both state-of-the-art and unimodal RP prediction approaches, and achieves an accuracy (ACC) of 0.93 and an area under the curve (AUC) of 0.97 on the independent test set. These results indicate that the proposed method mitigates the limitations of single-modality prediction and the lack of annotated key regions in pre-radiotherapy CT images, providing clinically meaningful support for RP risk assessment.
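For readers unfamiliar with the bag-embedding step described above, the sketch below illustrates the generic attention-based multiple-instance pooling idea (in the style of Ilse et al., 2018) on which modules like DAAE-MIL build: per-slice CT deep features are scored by a gated attention network and combined into a single patient-level bag embedding. This is a minimal illustration, not the authors' DAAE-MIL implementation; all class names, dimensions, and the single pooling layer shown are assumptions.

```python
# Minimal sketch of attention-based MIL bag pooling (Ilse et al., 2018 style).
# Illustrative only: the paper's actual dual-layer DAAE-MIL design is not
# reproduced here, and all names and dimensions are assumptions.
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    """Scores each instance embedding and forms a weighted bag embedding."""
    def __init__(self, feat_dim: int = 512, hidden_dim: int = 128):
        super().__init__()
        # Gated attention: tanh and sigmoid branches combined by a linear scorer
        self.attn_v = nn.Linear(feat_dim, hidden_dim)
        self.attn_u = nn.Linear(feat_dim, hidden_dim)
        self.attn_w = nn.Linear(hidden_dim, 1)

    def forward(self, instances: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # instances: (num_instances, feat_dim), e.g. per-slice CT deep features
        scores = self.attn_w(torch.tanh(self.attn_v(instances)) *
                             torch.sigmoid(self.attn_u(instances)))  # (N, 1)
        weights = torch.softmax(scores, dim=0)                       # (N, 1)
        bag = (weights * instances).sum(dim=0)                       # (feat_dim,)
        return bag, weights.squeeze(-1)

# Usage: pool per-slice deep features into one bag-level vector per patient;
# in a multimodal model this vector would be concatenated with clinical and
# radiomics features before the final classifier.
pool = AttentionMILPooling()
slice_feats = torch.randn(64, 512)        # hypothetical 64 CT-slice embeddings
bag_embedding, attn = pool(slice_feats)   # bag: (512,), attn weights: (64,)
```

The attention weights also make the pooling step inspectable: slices receiving high weights are the "critical instances" the abstract refers to, which is one reason attention-based MIL is a natural fit when slice-level annotations are unavailable.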