Abstract
BACKGROUND: Lung cancer remains the leading cause of cancer-related mortality. Single-cell RNA sequencing (scRNA-seq) enables high-resolution mapping of the tumor microenvironment, but how malignant cell states connect to patient outcomes and therapeutic vulnerabilities is not fully understood.
METHODS: We analyzed scRNA-seq profiles from 13 primary lung tumors and matched normal lung tissues (48,149 cells). Malignant versus non-malignant epithelial cells were defined by inferCNV, co-expression programs were resolved with Hotspot, transcription factor regulons were inferred by pySCENIC, and differentiation potential was estimated by CytoTRACE and diffusion pseudotime. Key malignant programs were cross-checked in an independent LUAD scRNA-seq cohort (GSE131907). Program signatures were projected onto TCGA-LUAD to build an eight-gene LASSO-Cox risk model, which was further validated in two external NSCLC/LUAD cohorts (GSE30219 and GSE31210). Immune contexture was inferred by CIBERSORT and TIDE, drug sensitivities were predicted with pRRophetic, and the function of FAM189A2 was tested by knockdown and overexpression in A549 and NCI-H23 cells.
RESULTS: We delineated 12 major cell populations and six malignant epithelial subclusters, including an early GPRC5A(+) subpopulation with high developmental potential. Malignant programs corresponding to cell-cycle and inflammatory states (Modules 8, 14 and 16) were associated with worse overall survival, whereas Module 15, preferentially active in the GPRC5A(+) state, was linked to better outcomes. An eight-gene signature (LAMA5, ACTB, B4GALT1, KLF5, KRT18, FAM189A2, SLC34A2, S100A11) robustly stratified patients into high- and low-risk groups across TCGA-LUAD and two validation cohorts. High-risk tumors displayed an immune-enriched yet matrix-restrained microenvironment with higher CD8(+) T cells, whereas low-risk tumors showed greater predicted sensitivity to Aurora kinase, IGF-1R, mTOR and TGF-β inhibition.
FAM189A2 was down-regulated in tumors, and higher expression predicted better survival. Bidirectional perturbation in A549 and NCI-H23 cells showed that FAM189A2 knockdown enhanced, while overexpression attenuated, migration and invasion in vitro.
CONCLUSIONS: Integrating single-cell and bulk transcriptomes links malignant epithelial state programs to prognosis, yields a practical eight-gene risk model validated in multiple LUAD cohorts, and nominates FAM189A2 as a putative tumor suppressor and potential biomarker in lung cancer. These findings suggest testable strategies for risk stratification and therapy selection that warrant prospective evaluation.