Abstract
MOTIVATION: Verification of sample sex, typically inferred from genomic data, is an essential quality control (QC) step in next-generation sequencing (NGS) studies. Clustering individuals by X chromosome heterozygosity (Xhet) and incorporating relatedness estimates offers a practical first-pass screen for potential sex label errors, sample mix-ups, and pedigree inconsistencies. To better interpret Xhet-based patterns, we further investigated the biological and technical origins of Xhet variation using the 1000 Genomes Project dataset. RESULTS: We developed XhetRel, a user-friendly workflow and notebook application that computes Xhet and performs relatedness estimation directly from VCF files. As a fully genotype-based approach, XhetRel enables both sex-based clustering and relatedness assessment as an initial QC step in NGS. XhetRel serves groups without bioinformatics infrastructure, users requiring a browser-based QC tool, and workflow developers seeking a modular Nextflow component. Our investigation into the sources of Xhet variation highlighted important limitations in sequencing and variant-calling approaches. In particular, specific pseudogenes and gene clusters, such as SLC25A5 and the GAGE cluster, emerged as recurrent contributors to misleading variant allele fractions. AVAILABILITY AND IMPLEMENTATION: The source code and data are available at Figshare (doi: 10.6084/m9.figshare.28280414). XhetRel can be executed locally via Nextflow or accessed directly through the online Colab notebook at https://colab.research.google.com/drive/1ep69JvXLwK5ndHUQ8qIGTWvauzsTW9fi.
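ILLUSTRATION: To make the Xhet quantity concrete, the following is a minimal sketch of one way a per-sample X chromosome heterozygosity could be computed from VCF genotype fields. It is not the XhetRel implementation. The function name `x_heterozygosity`, the `x_names` parameter, and the definition used here (heterozygous calls divided by all non-missing calls on the X chromosome, with haploid calls counted as non-heterozygous and no exclusion of pseudoautosomal regions) are assumptions made for illustration only.

```python
from collections import defaultdict


def x_heterozygosity(vcf_lines, x_names=("X", "chrX")):
    """Return {sample: fraction of heterozygous calls} over X chromosome records.

    Assumed definition (not taken from XhetRel): het calls / non-missing calls.
    Haploid calls count as called but never heterozygous.
    """
    samples = []
    het = defaultdict(int)      # heterozygous calls per sample
    called = defaultdict(int)   # non-missing calls per sample

    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        if fields[0] not in x_names:
            continue  # keep X chromosome records only
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        for sample, entry in zip(samples, fields[9:]):
            gt = entry.split(":")[gt_idx]
            alleles = gt.replace("|", "/").split("/")
            if "." in alleles:
                continue  # missing genotype, excluded from the denominator
            called[sample] += 1
            if len(set(alleles)) > 1:
                het[sample] += 1

    return {s: het[s] / called[s] for s in samples if called[s] > 0}
```

Under this assumed definition, samples carrying a single X chromosome would be expected to show values near zero outside pseudoautosomal regions, so values well above zero in a sample labeled male may warrant a closer look. Any clustering threshold would need to be chosen from the data.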