Abstract
This study aimed to describe and compare background factors and symptoms at diagnosis among patients with non-advanced lung cancer, patients with advanced lung cancer, and patients without cancer. It also aimed to develop predictive models that identify the key variables contributing to the detection of early- and late-stage lung cancer. Univariate logistic regression and three machine learning algorithms were used. Compared with patients without cancer, patients with non-advanced lung cancer differed in six background factors and two symptoms, whereas patients with advanced lung cancer differed in 11 background factors and 19 symptoms. The machine learning models showed moderate performance in distinguishing patients with lung cancer from those without cancer. Notably, the top predictors extended beyond classic respiratory symptoms. Demographic and lifestyle factors, particularly age, smoking status, and living situation, remained essential alongside symptoms such as pain, appetite loss, weight loss, and respiratory problems. These findings support integrating clinical, demographic, and patient-reported symptom data to improve lung cancer risk models and refine referral decisions in screening pathways.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1038/s41598-026-46710-8.