Abstract
Variant calling in segmental duplications is challenging for short-read sequencing because read origins are ambiguous. We present SDrecall, a method for sensitive variant detection in these regions. After constructing a network of homologous sequences, SDrecall realigns to each segmental duplication the reads from its homologous counterparts. The realigned reads are phased and assembled into haplotypes using graph-based algorithms, and integer linear programming then retains the two most plausible haplotypes. When tested against long-read benchmarks, SDrecall achieved 95% sensitivity while maintaining a manageable number of false positives for short variants. SDrecall thus offers significant value for molecular diagnosis by enabling the detection of causal mutations within homologous regions.
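
The homology network can be illustrated with a minimal sketch. This is not the SDrecall implementation: the region labels (`SD_A`, `SD_B`, ...) and both function names are hypothetical, and the sketch assumes that the counterparts of a region are its direct neighbors in the network, which the abstract does not specify. It only shows how a list of homologous region pairs yields, for each target duplication, the set of regions whose reads would be candidates for realignment.

```python
from collections import defaultdict


def build_homology_network(homologous_pairs):
    """Return an undirected adjacency map: region -> set of homologous regions."""
    network = defaultdict(set)
    for a, b in homologous_pairs:
        if a == b:
            continue  # a region is not treated as its own counterpart
        network[a].add(b)
        network[b].add(a)
    return dict(network)


def realignment_sources(network, target):
    """List regions whose reads would be candidates for realignment to `target`.

    Assumption: only direct neighbors in the network count as counterparts.
    """
    return sorted(network.get(target, set()))


# Hypothetical region labels, for illustration only.
pairs = [("SD_A", "SD_B"), ("SD_B", "SD_C"), ("SD_D", "SD_E")]
network = build_homology_network(pairs)
print(realignment_sources(network, "SD_B"))  # ['SD_A', 'SD_C']
```

In this toy example, reads mapped to `SD_A` and `SD_C` would be the candidates for realignment to `SD_B`, while `SD_D` and `SD_E` form a separate group.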