Abstract
Early diagnosis of lung cancer is crucial for improving patient prognosis. In this study, we developed a diagnostic model for lung cancer from the serum proteomic data of the GSE168198 dataset using four machine learning algorithms (nnet, glmnet, svm, and XGBoost). The model's performance was validated on datasets pairing lung cancer samples with normal controls, with disease controls, and with both control groups, and its diagnostic capability was then confirmed on an independent external dataset. Our analysis identified SLC16A4 as a key protein in the model; it was significantly downregulated in serum samples from lung cancer patients compared with normal controls. SLC16A4 expression was closely associated with clinicopathological features, including gender, tumor stage, lymph node metastasis, and smoking history. Functional assays showed that overexpression of SLC16A4 significantly inhibited lung cancer cell proliferation and induced cellular senescence, suggesting a potential role in lung cancer development. In addition, correlation analyses linked SLC16A4 expression to immune cell infiltration and to the expression of immune checkpoint genes, indicating possible involvement in immune escape mechanisms. Multi-omics data from the TCGA database further suggested that the low expression of SLC16A4 in lung cancer may be regulated by DNA copy number variation and DNA methylation. In conclusion, this study not only established an efficient diagnostic model for lung cancer but also identified SLC16A4 as a promising biomarker with potential applications in early diagnosis and immunotherapy.