Abstract
Promoter-proximal pausing by RNA polymerase II is critical for regulating gene expression in multicellular eukaryotes. How nucleic acid sequence and protein factors contribute to pausing remains incompletely understood. We developed Gene-specific Analysis of Transcriptional Output (GATO)-seq, an assay that for the first time enables massively parallel, temporally resolved, reconstituted transcription and uses direct RNA sequencing to map the 3' ends of nascent transcripts from a library of human genes. GATO-seq identified a "super pause" sequence that potently induces RNA polymerase II pausing and is not relieved by the rescue factor transcription factor IIS (TFIIS). Cryo-electron microscopy (cryo-EM) structures of RNA polymerase II on the super pause sequence reveal a previously unobserved, reversible single-nucleotide backtracked state ("sidetracked"), stabilized by a threonine-lined pocket that limits further backtracking. GATO-seq thus provides a powerful in vitro technique for studying transcriptional regulation. Using it, we show that nucleic acid sequence encodes pausing propensity and traps RNA polymerase II in sequence-specific offline states, linking sequence to pausing control.