Abstract
Prostate cancer is the second most common cancer among men in the United States, and prostate adenocarcinoma (PRAD) is its most common subtype. Although remarkable progress has been made in PRAD diagnosis and prognosis, Black American men have disproportionately high PRAD incidence and mortality rates compared with non-Hispanic White men in the United States. While machine learning (ML) methods such as transfer learning (TL) have shown promise in reducing racial disparities in PRAD, their effectiveness is often compromised by the requirement for large-scale training datasets, which are challenging to obtain in clinical settings. In addition, existing ML approaches leverage only single-omics data without integrating multi-omics information. To address these concerns, we propose MOTLPRAD, a Multi-Omics integration Transfer Learning framework for reducing health disparities in PRAD. Specifically, we first investigated two multi-modal ensemble methods, Pearson correlation coefficient (PCC) based patient-pairwise similarity and variational autoencoder (VAE), to integrate different types of omics data. Then, we adopted a transfer learning model based on domain adaptation that is pre-trained on the majority group (e.g., non-Hispanic White Americans) and fine-tuned on the minority group (e.g., Black Americans). To mitigate data imbalance across ethnic groups, we leveraged the Synthetic Minority Oversampling Technique (SMOTE) to augment the sample size of minority groups, which could further improve the reduction of health disparities in PRAD.
Results on multi-omics data (mRNA, miRNA, and methylation) from The Cancer Genome Atlas (TCGA) database suggested that our proposed model significantly outperformed conventional transfer learning models and other computational models in reducing racial disparities when predicting progression-free interval (PFI) prognosis for PRAD patients. In addition, our results demonstrated that integrating multi-omics data could remarkably improve the mitigation of health disparities compared with using single-omics data. We believe our proposed framework could provide a new way to systematically mitigate health disparities affecting underrepresented groups in PRAD diagnosis, prognosis, and treatment.
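
To make the PCC-based patient-pairwise similarity step concrete, the following is a minimal sketch in plain Python, not the implementation used in this work. It assumes each patient is represented by one numeric feature vector per omics type and computes the Pearson correlation between every pair of patients. The function names `pearson` and `patient_similarity`, the toy values, and the handling of constant profiles are illustrative assumptions; how the per-omics similarity matrices are combined is not specified here and is left out.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0.0:
        # Assumption: a constant profile is treated as uncorrelated.
        return 0.0
    return cov / norm


def patient_similarity(profiles: Sequence[Sequence[float]]) -> List[List[float]]:
    """Symmetric patient-by-patient PCC matrix for one omics type.

    `profiles[i]` is the feature vector of patient i (hypothetical layout).
    The diagonal is set to 1.0 by convention.
    """
    n = len(profiles)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = pearson(profiles[i], profiles[j])
            sim[i][j] = r
            sim[j][i] = r
    return sim


# Toy data: three patients, four features (values are made up).
toy_mrna = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
similarity = patient_similarity(toy_mrna)
```

In this toy example, patients 0 and 1 have perfectly correlated profiles and patients 0 and 2 have perfectly anti-correlated ones. Each row of `similarity` could then serve as that patient's feature representation for a downstream model, which is one common way such matrices are used.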