Abstract
Hepatitis B virus (HBV) infection causes one million deaths annually and remains a major driver of hepatocellular carcinoma. Despite its compact 3.2 kb genome, HBV exhibits extensive alternative splicing. HBV splice variants contribute to immune evasion and reduce the likelihood of achieving a functional cure. Here, we show that HBV splicing efficiency, quantified from 279 RNA-sequencing libraries of HBV-associated liver biopsies and cultured cells, correlates more strongly with disease progression than the overall proportion of spliced HBV RNA, which has been proposed as an emerging biomarker. All HBV splice sites are embedded within protein-coding regions, forming a gene structure distinct from that of typical host genes. To decode the sequence determinants of HBV splicing, we apply SpliceBERT and OpenSpliceAI to 4,706 HBV genomes. These models reveal that HBV splice donor sites share features with host splice donor sites, whereas HBV splice acceptor sites are more cryptic. These patterns likely reflect constraints imposed by HBV's compact genome, which must accommodate overlapping protein-coding regions. Motif conservation and splicing propensity analyses across HBV genomes reveal context- and genotype-specific splicing patterns, indicating regulation by sequence context. HBV genotypes may have coevolved with their human hosts to exploit suboptimal but spliceable host-like motifs without disrupting their gene structure, supporting mechanisms of viral persistence and immune evasion. This study demonstrates the utility of artificial intelligence in decoding viral splicing patterns and provides a framework for investigating co-transcriptional processes in other clinically important viruses.