Abstract
BACKGROUND: Chronic rhinosinusitis with nasal polyps (CRSwNP) is highly heterogeneous. Epithelial-mesenchymal transition (EMT) is implicated in mucosal remodeling and postoperative recurrence, yet robust EMT biomarkers consistently validated across cohorts and by histology are lacking. METHODS: RNA-seq data from a Chongqing (CQ) cohort were integrated with multiple GEO datasets. After batch-effect correction, differential expression analysis and weighted gene co-expression network analysis (WGCNA) were performed. EMT-related candidates were obtained by intersecting these results with the MSigDB EMT gene set. Core genes were identified using multi-algorithm feature selection (LASSO, SVM-RFE, and random forest). A three-gene model was constructed and externally validated. Single-cell transcriptomic data were used to define cellular sources of core genes, and immune infiltration and pathway activity were assessed. Regulatory networks (TF/miRNA) and compound-disease associations were predicted. Finally, expression was validated in dual-center clinical cohorts (CQ and Liaoning [LN]) by qRT-PCR and immunohistochemistry/immunofluorescence, and associations with SNOT-22 and the eosinophilic endotype were evaluated. RESULTS: Twenty-five EMT-related candidate genes were identified. Multi-algorithm intersection highlighted SPP1, PTHLH, and IGFBP3 as EMT core genes, consistently upregulated in the training set, the external validation dataset, and dual-center specimens. The three-gene model achieved AUCs of 0.944-0.991 in the training set and 0.888-0.938 in the external validation dataset. Single-cell mapping indicated that SPP1 was primarily derived from myeloid cells, PTHLH was primarily derived from epithelial cells, and IGFBP3 was enriched in fibroblasts. Higher core-gene expression was associated with increased immune infiltration and activation of TGF-β, hypoxia/glycolysis, and inflammation-related pathways.
Histology supported EMT-associated phenotypic changes in CRSwNP, with stronger signals in the eosinophilic endotype. In the CQ and LN cohorts, core-gene expression correlated with SNOT-22 (Spearman r = 0.402-0.569, P ≤ 0.021). CONCLUSIONS: SPP1, PTHLH, and IGFBP3 are EMT core genes in CRSwNP that were robustly validated across multiple cohorts and by dual-center histology, and they are closely linked to immune microenvironment alterations and mucosal remodeling. These genes represent EMT-associated candidate biomarkers for future stratification efforts and mechanistic investigations.
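
The multi-algorithm intersection step described in METHODS can be illustrated with a minimal sketch. It assumes each algorithm (LASSO, SVM-RFE, random forest) has already returned its own list of selected genes. The function name `intersect_selections` and the per-algorithm lists are hypothetical: entries such as `GENE_A` are placeholders, not selections reported in the study, and only the final three-gene overlap reflects the abstract.

```python
from functools import reduce


def intersect_selections(selections):
    """Return the genes chosen by every algorithm, sorted for stable output."""
    gene_sets = [set(genes) for genes in selections.values()]
    if not gene_sets:
        return []
    return sorted(reduce(set.intersection, gene_sets))


# Placeholder per-algorithm selections (illustrative only, not study output).
selections = {
    "lasso": ["SPP1", "PTHLH", "IGFBP3", "GENE_A"],
    "svm_rfe": ["SPP1", "PTHLH", "IGFBP3", "GENE_B"],
    "random_forest": ["SPP1", "PTHLH", "IGFBP3", "GENE_A", "GENE_C"],
}

core_genes = intersect_selections(selections)
print(core_genes)
```

Requiring agreement among all three selectors is a conservative choice: a gene favored by only one or two algorithms, such as the placeholder `GENE_A` here, is excluded from the core set.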