Abstract
Leishmania spp. regulate gene expression largely post-transcriptionally, yet their untranslated regions (UTRs) remain poorly delineated. We generated high-quality genome and transcriptome datasets for Leishmania donovani strain 1S2D (Ld1S) by combining PacBio HiFi de novo assembly with Oxford Nanopore direct RNA sequencing of promastigotes and axenic amastigotes. The genome assembly consists of 65 scaffolds totaling ~33.3 Mb. Structural comparisons to LdBPK282A1 revealed numerous rearrangements, some of which reshuffle genes among polycistronic transcription units and were validated by polycistronic reads from RNA sequencing. Promastigote and amastigote RNA sequencing produced 469,010 and 46,729 monocistronic reads, respectively, containing both a spliced-leader sequence and a polyA tail, defining 8,479 transcripts and supporting 7,415 of the 7,969 annotated protein-coding genes, as well as 604 putative long non-coding RNAs. We annotated UTRs for 4,921 genes and observed that putative RNA G-quadruplexes were markedly enriched in UTRs. We also noted that 31.9% and 11.5% of genes were expressed as multiple isoforms in promastigotes and amastigotes, respectively. Collectively, these data provide a comprehensive annotation of L. donovani genes and their UTRs, reveal widespread and stage-specific UTR length polymorphisms, and, overall, point to an important role for 3' UTRs in post-transcriptional regulation in L. donovani.