Abstract
Lung adenocarcinoma (LUAD) is characterized by substantial genetic heterogeneity, making it challenging to identify reliable biomarkers for diagnosis and treatment. Tumor mutational burden (TMB) is widely recognized as a predictive biomarker due to its association with immune response and treatment efficacy. In this study, we take a different approach by treating TMB as a response variable to uncover its genetic drivers using multiomics data. We evaluated recent feature selection methods through extensive simulations and identified three top-performing approaches: projection correlation screening (PC-Screen), distance correlation sure independence screening (DC-SIS), and Wasserstein distance-based screening (WD-Screen). Unlike traditional approaches that rely on simple statistical tests or dataset splitting for validation, we adopt a method-based validation strategy: we select the top-ranked features from each method and retain only the genes selected by all three. Using The Cancer Genome Atlas (TCGA) dataset, we integrated copy number alteration (CNA), mRNA expression, and DNA methylation data as predictors and applied the three selected methods. In the two-platform analysis (mRNA + CNA), we identified 13 key genes, including both previously reported LUAD-associated genes (CCNG1, CKAP2L, HSD17B4, SHROOM1, TIGD6, and TMEM173) and novel candidates (DTWD2, FLJ33630, NME5, NUDT12, PCBD2, REEP5, and SLC22A5). Expanding to a three-platform analysis (mRNA + CNA + methylation) further refined our findings, with PCBD2 and TMEM173 emerging as the most robust candidates. These results highlight the complexity of multiomics integration and the need for advanced feature selection techniques to uncover biologically meaningful patterns.
Our multiomics strategy and robust selection approach provide insights into the genetic determinants of TMB, offering potential biomarkers for targeted LUAD therapies and demonstrating the power of Wasserstein distance-based feature selection in complex genomic analysis.
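The method-based validation idea described above (rank features with several screening utilities, keep the top-ranked features from each, and retain only those common to all) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study. The function names (`distance_correlation`, `dc_sis_scores`, `consensus_top_k`), the cutoff `k`, and the synthetic data are all hypothetical. Only a distance-correlation utility in the spirit of DC-SIS is implemented, and absolute Pearson correlation serves purely as a stand-in for a second method; PC-Screen and WD-Screen are not reproduced here.

```python
import numpy as np


def _double_centered_distances(v):
    """Pairwise Euclidean distance matrix of v, double-centered."""
    v = np.asarray(v, dtype=float)
    v = v.reshape(v.shape[0], -1)
    d = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) between x and y."""
    A = _double_centered_distances(x)
    B = _double_centered_distances(y)
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    if denom == 0:
        return 0.0  # at least one variable is constant
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def dc_sis_scores(X, y):
    """Marginal utility of each column of X: its distance correlation with y."""
    return np.array([distance_correlation(X[:, j], y)
                     for j in range(X.shape[1])])


def consensus_top_k(scores_by_method, k):
    """Indices of features ranked in the top k by every method."""
    top_sets = []
    for scores in scores_by_method.values():
        order = np.argsort(-np.asarray(scores, dtype=float))[:k]
        top_sets.append({int(j) for j in order})
    return sorted(set.intersection(*top_sets))


if __name__ == "__main__":
    # Synthetic data (an assumption for illustration, not TCGA data).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))
    y = 2.0 * X[:, 0] + X[:, 3] ** 2 + rng.normal(size=100)

    scores = {
        "dc_sis": dc_sis_scores(X, y),
        # Stand-in second method, NOT PC-Screen or WD-Screen.
        "abs_pearson": np.abs([np.corrcoef(X[:, j], y)[0, 1]
                               for j in range(X.shape[1])]),
    }
    print(consensus_top_k(scores, k=5))
```

Intersecting the top-ranked sets trades sensitivity for stability: a feature survives only if every utility ranks it highly, which is why adding a method or a data platform can only shrink the consensus set.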