Abstract
Background: The discovery of reliable biomarkers and therapeutic targets remains a critical challenge in thyroid cancer management. This study integrates traditional omics technologies with artificial intelligence approaches and single-cell validation to identify novel microRNA-based biomarkers and drug targets. We hypothesized that combining meta-analysis of bulk transcriptomics, machine learning-driven feature selection, and single-cell spatial mapping would improve biomarker discovery and validation compared with using any one of these approaches alone.

Methods: We employed a hybrid strategy integrating traditional transcriptomic analysis with AI-driven methods. A meta-analysis of three bulk RNA-seq datasets (GSE65144, GSE33630, GSE50901) was performed using effect size analysis, followed by machine learning-based forward feature selection to identify optimal biomarker combinations. Single-cell RNA-seq data (GSE184362; 196,145 cells from 23 thyroid cancer samples) provided cell-type-specific validation and immune microenvironment profiling. Experimental validation was conducted in TPC-1 and BHT101 cell lines through miR-6756-5p overexpression and CRISPRi-mediated knockdown, and included functional assays and xenograft experiments to assess therapeutic potential.

Results: The AI-enhanced meta-analysis identified a four-gene diagnostic panel (BID, MIR6756, ITM2A, TGM2) that achieved AUC values of 1.0 and 0.99 in the training sets and 0.74 in independent validation. Single-cell analysis of 50,000 cells revealed six major cell types with significant immune infiltration (61.9%), providing cell-type specificity for the identified biomarkers. BID and ITM2A showed predominantly epithelial expression, whereas TGM2 was enriched in the immune and stromal compartments, demonstrating multi-cellular biomarker patterns. Immune microenvironment analysis revealed distinct CD8+/CD4+ T cell ratios between metastatic and non-metastatic samples. The integrated approach identified hsa-miR-6756-5p, which exhibited tumor-specific expression and showed oncogenic properties: it promoted proliferation, colony formation, migration, and invasion in vitro and enhanced tumor growth in vivo, supporting it as a novel therapeutic target.

Discussion: Our study exemplifies the synergistic value of integrating traditional omics approaches with AI-driven analytics for biomarker and drug target discovery. Combining machine learning-based feature selection from bulk transcriptomics with single-cell spatial validation addresses the limitations of each approach when used alone. This integrated framework identified hsa-miR-6756-5p as both a diagnostic biomarker and a therapeutic target, demonstrating how traditional experimental validation coupled with computational prediction enhances translational potential. The multi-scale approach, spanning bulk transcriptomics, AI-driven biomarker selection, single-cell characterization, and functional validation, represents an effective paradigm for developing clinically relevant cancer biomarkers and therapeutic targets.
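The effect size meta-analysis named in the Methods can be illustrated with a minimal sketch. The abstract does not state which effect-size statistic or pooling model was used, so the code below assumes Hedges' g per dataset and inverse-variance fixed-effect pooling. The function names and the expression values are hypothetical and serve only as an illustration.

```python
import math
from statistics import mean, variance


def hedges_g(tumor, normal):
    """Return (g, var_g): the bias-corrected standardized mean difference
    between two groups of expression values, and its sampling variance."""
    n1, n2 = len(tumor), len(normal)
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(
        ((n1 - 1) * variance(tumor) + (n2 - 1) * variance(normal))
        / (n1 + n2 - 2)
    )
    d = (mean(tumor) - mean(normal)) / pooled_sd   # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)                # small-sample correction
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d


def pool_fixed_effect(effects):
    """Combine per-dataset (g, var_g) pairs with inverse-variance weights.
    Returns the pooled effect size and its standard error."""
    weights = [1 / var for _, var in effects]
    pooled = sum(w * g for w, (g, _) in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))


# Hypothetical log-expression values for one gene in three datasets
# (tumor samples, normal samples); these are NOT values from the study.
datasets = [
    ([2.0, 3.0, 4.0], [1.0, 2.0, 3.0]),
    ([5.1, 5.9, 6.4, 6.0], [4.2, 4.8, 5.0, 4.4]),
    ([3.3, 3.9, 4.1], [3.0, 3.2, 3.7]),
]
per_dataset = [hedges_g(t, n) for t, n in datasets]
pooled_g, pooled_se = pool_fixed_effect(per_dataset)
print(f"pooled g = {pooled_g:.3f}, SE = {pooled_se:.3f}")
```

Genes would then be ranked by pooled effect size before feature selection. A random-effects model is a common alternative when the datasets are heterogeneous.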
