Abstract
Patients diagnosed with breast cancer exhibit a wide range of prognostic outcomes because the disease is heterogeneous across patient groups. To address this complexity and improve prognostic prediction from gene expression data of breast cancer samples, this study develops an integrated deep learning method that combines Convolutional Neural Networks (CNN) with Bidirectional Long Short-Term Memory (BiLSTM) networks. The automated pipeline uses Pearson correlation analysis to derive a reliable 236-gene set while ensuring no data contamination across patient samples. Furthermore, correlation-based patterns of gene-gene interaction were examined to provide additional evidence for the biological relevance of the selected gene set. The proposed model was trained and validated on data from The Cancer Genome Atlas-Breast Cancer (TCGA-BRCA) and evaluated on the METABRIC dataset to assess its generalization capability. Experimental results indicate that the Full Hybrid (CNN-BiLSTM) model significantly outperforms other machine learning and deep learning approaches. Notably, while the BiLSTM-only model achieved a best Recall of 0.9319, the hybrid model reached a substantially higher Recall of 0.9943, together with an ROC AUC of 0.9955 and an F1 score of 0.9962. The proposed framework was also statistically validated, showing a variance of only 0.000083 under noise perturbation of up to 20%. Hyperparameters were optimized with Optuna's Bayesian optimization on a dual NVIDIA Tesla T4 GPU configuration. Overall, this article presents a universal computational tool for precision medicine in breast cancer, designed to yield consistent results across diverse patient scenarios.
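
As an illustration of the correlation-based gene selection step summarized above, the following is a minimal sketch and not the pipeline used in this study. The abstract does not state the exact selection criterion, so the sketch assumes that genes are ranked by the absolute Pearson correlation between their expression and a binary outcome label, computed on the training split only so that held-out samples cannot influence the selection. The function name `select_genes_by_pearson`, the synthetic data, and the value of `k` are hypothetical.

```python
import numpy as np

def select_genes_by_pearson(X_train, y_train, k):
    """Rank genes by |Pearson r| with the label (assumed criterion).

    X_train: (n_samples, n_genes) expression matrix, training split only.
    y_train: (n_samples,) binary outcome labels.
    Returns the indices of the top-k genes and the full vector of r values.
    """
    X = np.asarray(X_train, dtype=float)
    y = np.asarray(y_train, dtype=float)
    Xc = X - X.mean(axis=0)          # center each gene
    yc = y - y.mean()                # center the label
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        r = (Xc * yc[:, None]).sum(axis=0) / denom
    r = np.nan_to_num(r)             # constant genes get r = 0
    top = np.argsort(-np.abs(r))[:k]
    return top, r

# Synthetic demonstration: genes 3 and 7 are made informative by construction.
rng = np.random.default_rng(0)
n_samples, n_genes = 200, 50
y = rng.integers(0, 2, size=n_samples)
X = rng.normal(size=(n_samples, n_genes))
X[:, 3] += 3.0 * y
X[:, 7] -= 3.0 * y

train_idx = np.arange(150)           # selection sees training samples only
selected, r = select_genes_by_pearson(X[train_idx], y[train_idx], k=2)
```

Restricting the correlation computation to `train_idx` is the design choice that matters here: the selected indices can then be applied unchanged to the validation and external test matrices.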