Abstract
Sex chromosome complement is the largest karyotypic variation observed in humans. The X and Y chromosomes were once a pair of homologous autosomes. Although they have since differentiated from one another, they still share high sequence similarity in some regions, such as the pseudoautosomal regions (PARs) and the X-transposed region (XTR). The sex chromosomes violate some of the assumptions made for autosomes, yet they are not always processed separately in genomic analyses. We undertook a simulation study to assess the effects of standard autosomal versus sex chromosome complement (SCC)-informed alignment, variant calling, and filtering strategies on variants detected on the sex chromosomes. We find that aligning samples to a reference genome informed by the SCC of the sample increases the number of true positives called in the PARs, and, in XX samples only, also in the XTR. In contrast, in XY samples, masking the XTR during alignment results in a 10-fold higher rate of false positives (FPs). We further find that haploid calling on the sex chromosomes in XY samples reduces the number of FPs compared to diploid calling but does not decrease the number of false negatives. Improving the accuracy of variant calling results in the detection of variants that could be relevant to studies of health and disease, including variants we recovered in genes implicated in cardiomyopathy, immunodeficiency, and Alzheimer disease. We recommend that future genomic analyses implement the following best practices for detecting variants: aligning samples to versions of the human reference genome informed by the SCC of the sample and using accurate ploidy when calling variants.
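The two recommended practices can be illustrated with a minimal sketch that maps a sample's SCC and a genomic region to a reference version and a calling ploidy. This is an illustration under stated assumptions, not the pipeline used in the study: the region labels, the reference labels `ref_Y_masked` and `ref_YPARs_masked`, and the function `calling_plan` are hypothetical names, and the masking scheme (Y fully masked for XX samples, only the Y PARs masked for XY samples) is assumed here rather than stated in the abstract.

```python
from dataclasses import dataclass

# Region labels used in this sketch (hypothetical names).
AUTOSOME = "autosome"
PAR = "PAR"            # pseudoautosomal regions
XTR = "XTR"            # X-transposed region
X_NONPAR = "X_nonPAR"  # remainder of chromosome X
Y_NONPAR = "Y_nonPAR"  # remainder of chromosome Y


@dataclass(frozen=True)
class CallingPlan:
    reference: str  # hypothetical label of the reference version to align to
    ploidy: int     # ploidy to give the variant caller; 0 means "do not call"


def calling_plan(scc: str, region: str) -> CallingPlan:
    """Pick a reference version and ploidy from the sample's SCC."""
    if scc == "XX":
        # Assumption: Y is masked entirely, so no reads can mismap to it.
        reference = "ref_Y_masked"
        ploidy = 0 if region == Y_NONPAR else 2
    elif scc == "XY":
        # Assumption: only the Y PARs are masked; the XTR is left unmasked,
        # since masking it in XY samples increased false positives.
        reference = "ref_YPARs_masked"
        # PARs and autosomes are diploid; the rest of X and Y is haploid.
        ploidy = 2 if region in (AUTOSOME, PAR) else 1
    else:
        raise ValueError(f"unsupported sex chromosome complement: {scc!r}")
    return CallingPlan(reference, ploidy)


if __name__ == "__main__":
    for scc in ("XX", "XY"):
        for region in (AUTOSOME, PAR, XTR, X_NONPAR, Y_NONPAR):
            print(scc, region, calling_plan(scc, region))
```

In practice, the ploidy returned here would be passed to whichever variant caller is in use, and the reference label would correspond to a prepared FASTA file; both mappings are left out because they depend on the specific tools chosen.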