Abstract
BACKGROUND: Accurate preoperative T and TNM staging of clear cell renal cell carcinoma (ccRCC) is crucial for diagnosis and treatment, but these assessments often depend on subjective radiologist judgment, which leads to interobserver variability. This study aimed to design and validate two CT-based deep learning models and to evaluate their clinical utility for the preoperative T and TNM staging of ccRCC.
METHODS: Data from 1,148 ccRCC patients across five medical centers were retrospectively collected. Data from two centers were merged and randomly divided into a training set (80%) and a testing set (20%). Data from two additional centers comprised external validation set 1, and data from the remaining independent center comprised external validation set 2. Two 3D deep learning models based on a Transformer-ResNet (TR-Net) architecture were developed to predict T staging (T1, T2, T3 + T4) and TNM staging (I, II, III, IV) from corticomedullary phase CT images. Gradient-weighted Class Activation Mapping (Grad-CAM) was used to generate heatmaps for improved model interpretability, and a human-machine collaboration experiment was conducted to evaluate clinical utility. Model performance was evaluated using micro-average AUC (micro-AUC), macro-average AUC (macro-AUC), and accuracy (ACC).
RESULTS: On external validation sets 1 and 2, respectively, the T staging model achieved micro-AUCs of 0.939 and 0.954, macro-AUCs of 0.857 and 0.894, and ACCs of 0.843 and 0.869; the TNM staging model achieved micro-AUCs of 0.935 and 0.924, macro-AUCs of 0.817 and 0.888, and ACCs of 0.856 and 0.807. Although the models demonstrated acceptable overall performance in preoperative ccRCC staging, performance was moderate for advanced subclasses (T3 + T4 AUC: 0.769 and 0.795; TNM III AUC: 0.669 and 0.801). Grad-CAM heatmaps highlighted key tumor regions, improving interpretability. In the human-machine collaboration experiment, diagnostic accuracy improved with model assistance.
CONCLUSION: The CT-based 3D TR-Net models showed acceptable overall performance in preoperative ccRCC staging, although performance was moderate for advanced subclasses. With interpretable outputs and benefits in human-machine collaboration, they are potentially useful decision-support tools.
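
The three evaluation metrics named in the METHODS can be illustrated with a minimal pure-Python sketch. The abstract does not state how the averages were computed, so this assumes the common one-vs-rest convention: macro-AUC is the unweighted mean of the per-class AUCs, and micro-AUC pools every (sample, class) pair into a single binary problem. All function names and the toy data are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_hot(y_true: Sequence[int], n_classes: int) -> List[List[int]]:
    """Turn integer class labels into one-vs-rest indicator rows."""
    return [[1 if y == k else 0 for k in range(n_classes)] for y in y_true]


def macro_auc(y_true: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """Unweighted mean of per-class one-vs-rest AUCs (assumed convention)."""
    n_classes = len(probs[0])
    indicators = one_hot(y_true, n_classes)
    per_class = [
        binary_auc([row[k] for row in indicators], [p[k] for p in probs])
        for k in range(n_classes)
    ]
    return sum(per_class) / n_classes


def micro_auc(y_true: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """AUC over all (sample, class) pairs pooled together (assumed convention)."""
    n_classes = len(probs[0])
    flat_labels = [v for row in one_hot(y_true, n_classes) for v in row]
    flat_scores = [v for row in probs for v in row]
    return binary_auc(flat_labels, flat_scores)


def accuracy(y_true: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """Fraction of samples whose highest-probability class equals the true class."""
    preds = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return sum(int(t == q) for t, q in zip(y_true, preds)) / len(y_true)


if __name__ == "__main__":
    # Toy, made-up data: classes 0, 1, 2 stand in for T1, T2, T3 + T4.
    y_true = [0, 0, 0, 0, 1, 2]
    probs = [
        [0.8, 0.1, 0.1],
        [0.7, 0.2, 0.1],
        [0.6, 0.3, 0.1],
        [0.5, 0.2, 0.3],
        [0.3, 0.5, 0.2],
        [0.6, 0.1, 0.3],
    ]
    print("micro-AUC:", micro_auc(y_true, probs))
    print("macro-AUC:", macro_auc(y_true, probs))
    print("ACC:", accuracy(y_true, probs))
```

Because micro-averaging pools all samples, it is dominated by the frequent classes, whereas macro-averaging weights each class equally. Under this convention, a class-imbalanced cohort can therefore show a micro-AUC noticeably higher than its macro-AUC when a rare class is hard to separate, which is one possible reading of the gap between the reported micro-AUCs and the lower AUCs for advanced subclasses.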