Abstract
Accurate differentiation of malignant from benign pulmonary nodules remains challenging. This study aimed to develop and validate machine learning models that integrate seven autoantibodies (7-AABs) with routine laboratory parameters for lung cancer risk prediction. We retrospectively enrolled 310 patients with pulmonary nodules (142 early-stage malignant, 168 benign). LASSO regression was used for feature selection, and eleven machine learning algorithms were developed and validated. Model performance was assessed by AUC, calibration, and decision curve analysis, and SHAP was applied for model interpretation. Twelve predictors were selected: the 7-AABs, gender, LYM, RDW, PAR, and fibrinogen (Fg). Of the eleven algorithms, the random forest model performed best. A simplified five-feature model (Fg, GBU4-5, SOX2, p53, MAGE A1) retained 95% of incremental discriminatory performance. SHAP identified Fg as the strongest contributor, and decision curve analysis confirmed clinical net benefit. A web-based calculator was developed to facilitate external validation. This proof-of-concept study presents an interpretable machine learning model integrating 7-AABs and routine laboratory parameters for pulmonary nodule risk stratification. The simplified model maintains robust performance while improving clinical practicality. Its current sensitivity (65.1%) precludes standalone screening use; rather, it serves as an auxiliary tool for identifying low-risk patients who may be candidates for conservative management. External and prospective validation are mandatory before clinical translation.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1038/s41598-026-42111-z.
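For readers unfamiliar with decision curve analysis, the "net benefit" referred to in the abstract is conventionally defined as the true-positive rate minus the false-positive rate weighted by the odds of the chosen risk threshold. The sketch below illustrates that standard definition only; the abstract does not describe the authors' implementation, and the function names and patient counts are hypothetical values chosen for illustration, not study results.

```python
def net_benefit(tp: int, fp: int, n: int, threshold: float) -> float:
    """Net benefit of a model at a given risk threshold (standard DCA definition).

    tp, fp    : true and false positives when patients with predicted risk
                >= threshold are classified as high risk
    n         : total number of patients
    threshold : risk threshold probability, strictly between 0 and 1
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    # A false positive is weighted by the odds of the threshold probability.
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(n_pos: int, n: int, threshold: float) -> float:
    """Net benefit of the reference strategy that treats every patient as high risk."""
    return net_benefit(tp=n_pos, fp=n - n_pos, n=n, threshold=threshold)


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not taken from the study).
    model_nb = net_benefit(tp=60, fp=20, n=200, threshold=0.2)
    all_nb = net_benefit_treat_all(n_pos=80, n=200, threshold=0.2)
    print(f"model: {model_nb:.3f}, treat-all: {all_nb:.3f}")
```

A model shows clinical net benefit at a given threshold when its value exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero by definition.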