Abstract
Haplotype phasing is the determination of the haplotype sequences inherited from each parent in a diploid organism. It is a critical step for various downstream analyses, and numerous haplotype phasing methods for genomic single nucleotide polymorphisms (SNPs) have been developed. Allele-specific (AS) expression and alternative splicing play key roles in diverse biological processes. AS studies usually focus on exonic SNPs, and multiple phased SNPs need to be combined to obtain better inferences. In this paper, we introduce HPTAS, an alignment-free algorithm for haplotype phasing in AS studies. Instead of using sequence alignment to count the reads covering SNPs, HPTAS constructs a mapping structure from transcriptome annotations and SNPs and employs a k-mer-based approach to derive phasing counts from RNA-seq data. Using both next-generation sequencing (NGS) and third-generation sequencing (TGS) RNA-seq data from NA12878, and comparing HPTAS with the most advanced algorithm in the field, we demonstrate that HPTAS achieves high phasing accuracy and performance and that transcriptome data indeed facilitate the phasing of exonic SNPs. With the continued advancement of sequencing technology and improvement in transcriptome annotations, HPTAS may serve as a foundation for future haplotype phasing methods.
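To make the idea of alignment-free phasing counts concrete, the following is a minimal Python sketch for a single pair of exonic SNPs on one transcript. It is an illustration under stated assumptions, not the HPTAS implementation: the function names (`build_kmer_map`, `phasing_counts`, `phase_relation`), the `(position, ref, alt)` SNP tuples, and the toy sequences are all hypothetical, and the sketch ignores reverse-complement reads, sequencing errors, and k-mers that also occur elsewhere in the transcriptome.

```python
from collections import Counter
from itertools import product


def build_kmer_map(transcript, snp_a, snp_b, k):
    """Map every k-mer spanning both SNPs to an allele pair (0 = ref, 1 = alt).

    snp_a and snp_b are hypothetical (position, ref, alt) tuples with 0-based
    transcript positions and snp_a[0] < snp_b[0].
    """
    pos_a, pos_b = snp_a[0], snp_b[0]
    if pos_b - pos_a >= k:
        return {}  # no single k-mer can cover both SNPs
    kmer_map, ambiguous = {}, set()
    for pair in product((0, 1), repeat=2):
        seq = list(transcript)
        seq[pos_a] = snp_a[1 + pair[0]]
        seq[pos_b] = snp_b[1 + pair[1]]
        seq = "".join(seq)
        # Every window of length k that contains both SNP positions.
        first = max(0, pos_b - k + 1)
        last = min(pos_a, len(seq) - k)
        for start in range(first, last + 1):
            kmer = seq[start:start + k]
            if kmer_map.setdefault(kmer, pair) != pair:
                ambiguous.add(kmer)  # same k-mer implies two allele pairs
    for kmer in ambiguous:
        del kmer_map[kmer]
    return kmer_map


def phasing_counts(reads, kmer_map, k):
    """Count reads per allele pair by k-mer lookup, without any alignment."""
    counts = Counter()
    for read in reads:
        pairs = {
            kmer_map[read[i:i + k]]
            for i in range(len(read) - k + 1)
            if read[i:i + k] in kmer_map
        }
        if len(pairs) == 1:  # skip reads with no hit or conflicting hits
            counts[pairs.pop()] += 1
    return counts


def phase_relation(counts):
    """Return 'cis' if ref/ref and alt/alt dominate, 'trans' if mixed pairs do."""
    cis = counts[(0, 0)] + counts[(1, 1)]
    trans = counts[(0, 1)] + counts[(1, 0)]
    if cis == trans:
        return None  # not enough evidence to phase this SNP pair
    return "cis" if cis > trans else "trans"


# Toy data (assumed for illustration): two SNPs six bases apart.
transcript = "ACGTACGTTAGCCGATAGGCTTAACG"
snp_a = (8, "T", "C")
snp_b = (14, "A", "G")
k = 11

hap_ref = transcript
hap_alt = transcript[:8] + "C" + transcript[9:14] + "G" + transcript[15:]
reads = [
    hap_ref[2:20],   # covers both SNPs, ref/ref
    hap_ref[4:22],   # covers both SNPs, ref/ref
    hap_alt[3:21],   # covers both SNPs, alt/alt
    hap_alt[0:16],   # covers both SNPs, alt/alt
    hap_ref[10:26],  # covers only the second SNP, so it is not counted
]

kmer_map = build_kmer_map(transcript, snp_a, snp_b, k)
counts = phasing_counts(reads, kmer_map, k)
relation = phase_relation(counts)
print(relation)  # cis
```

In this toy setting, only k-mers that span both SNP positions are informative, so the choice of k bounds how far apart two SNPs can be and still be phased by a single k-mer; longer TGS reads can contribute evidence for SNP pairs that short NGS reads rarely cover together.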