Abstract
Accurate identification of germline variants from whole-exome sequencing (WES) data is foundational to population genetics, disease association studies, and clinical genomics. However, variant calling across cohorts poses challenges in scalability, consistency, and reproducibility. We present GermVarX, a fully automated, modular workflow for joint germline variant discovery and exploration in WES cohort studies. A key feature of GermVarX is joint variant calling, in which multiple samples are genotyped simultaneously to produce a single, high-confidence multi-sample VCF optimized for downstream analyses. Developed with Nextflow DSL2, GermVarX ensures reproducibility, portability, and efficient parallelization across diverse computing environments, including workstations, HPC clusters, and cloud platforms. The workflow integrates two state-of-the-art variant callers, GATK HaplotypeCaller and DeepVariant, with joint genotyping performed via GATK or GLnexus. To increase reliability, GermVarX supports consensus generation between callers, coupled with sample- and cohort-level quality control, functional annotation using the Variant Effect Predictor (VEP), and unified reporting through MultiQC. It also provides PLINK-compatible outputs, which can be passed directly to statistical and association analyses. GermVarX delivers a scalable, reproducible, and comprehensive solution for germline variant analysis in large WES studies, supporting consistent and interpretable results for both research and clinical genomics. The source code and usage instructions are available at https://github.com/thaontp711/GermVarX.