Abstract
Integrating coding and regulatory variation into unified, interpretable representations remains a challenge in functional genomics. Current approaches either focus on common variants or analyze individual variants in isolation, missing the cumulative, cell-type-specific impact of both coding and noncoding variants on each gene. We present Volaria, a computational framework that integrates coding and regulatory genetic variation into unified, gene-centered representations for disease outcome prediction from whole-genome sequencing. Volaria leverages deep learning models to capture variant effects on cell-type-specific gene expression and integrates them with AI-predicted exonic variant pathogenicity to produce representations that capture the cumulative effect of genome-wide rare and common variation. Applied to whole genomes of individuals with rare glomerular diseases, Volaria predicts individual outcomes directly from germline sequence, demonstrating that structured, cell-type-aware representations capture predictive signals beyond population-based polygenic risk scores and unstructured representations. Importantly, the framework identifies context-specific biological mechanisms, providing interpretability that can be aligned with clinical measurements. By encoding genome-wide variation into compact and biologically grounded representations, Volaria provides a scalable foundation for genome interpretation and individualized outcome modeling from germline sequence, complementing phenotypic and clinical information in future integrative frameworks.