Abstract
MOTIVATION: Discovering the genetic variations that underpin brain disorders is important for understanding their pathogenesis. Indirect associations and spurious causal relationships threaten the reliability of biomarker discovery for brain disorders and can mislead or bias subsequent decision-making. Yet the stringent selection of reliable biomarker candidates for brain disorders remains largely unexplored.

RESULTS: To fill this gap, we propose a novel scheme, the Causality-aware Genotype intermediate Phenotype Correlation Approach (Ca-GPCA). Specifically, we design a bidirectional association learning framework, integrated with a parallel causal variable decorrelation module and a sparse variable regularizer module, to identify trustworthy causal biomarkers. A disease diagnosis module is further incorporated to ensure accurate diagnosis and identification of causal effects for pathogenesis. Because high-dimensional genotype-phenotype covariances incur a large computational burden, we also develop a fast and efficient strategy that reduces runtime and promotes practical applicability. Extensive experiments on four simulated datasets and on real neuroimaging genetic data show that Ca-GPCA outperforms state-of-the-art methods while offering built-in interpretability. These results can provide novel and reliable insights into the pathogenic mechanisms underlying brain disorders.

AVAILABILITY AND IMPLEMENTATION: The software is publicly available at https://github.com/ZJ-Techie/Ca-GPCA.