Abstract
Epstein-Barr virus (EBV) ubiquitously infects humans, establishing lifelong persistence in B cells. In vitro, EBV-infected B cells can establish a lymphoblastoid cell line (LCL). EBV's transcripts in LCLs (latency III) produce six nuclear proteins [EBV nuclear antigens (EBNAs)], two latent membrane proteins (LMPs) and various microRNAs and putative long non-coding RNAs [BamHI A rightward transcripts (BARTs)]. The BART and EBNA transcription units are characterized by extensive alternative splicing. We generated LCLs with B95-8 EBV-BACs, including one engineered with 'barcodes' in the first and last repeat of internal repeat 1 (IR1), and analysed their EBV transcriptomes using long-read nanopore direct RNA-seq. Our pipeline ensures appropriate mapping of the W promoter (Wp) 5' exon and corrects W1-W2 exon counts that misalign to IR1. The resulting data suggest that splicing across IR1 largely includes all W exons and that Wp-derived transcripts more frequently encode the EBNA-LP start codon than Cp transcripts. Analysis identified a short variant of exon W2 and a novel polyadenylation site before EBNA2, provided insights into BHRF1 miRNA processing and suggested co-ordination between polyadenylation and splice site usage, although improved read depth and integrity are required to confirm this. The BAC region disrupts the integrity of BART transcripts through premature polyadenylation and cryptic splice sites in the hygromycin expression cassette. Finally, a few transcripts extended across established gene boundaries, running from EBNA to BART to LMP2 gene regions, sometimes including novel exons between EBNA1 and the BART promoter. We have produced an EBV annotation based on these findings to help others better characterize EBV transcriptomes in the future.