Abstract
Peptides and proteins produced by programmed ribosomal frameshifting (PRF) are well known in viruses. Until recently, only a few examples of such chimeric sequences had been documented in non-viral systems. Three studies in eukaryotes have since shown that chimeric peptides are numerous and diverse. In ciliates, such peptides are associated with stop codons. In humans, their discovery was made possible by focusing on sequences with naturally repeated codons; in this way, many candidate sequences whose translation is supported by mass spectrometry (MS) proteomics have been identified. In a plant study, our group discovered MS-validated chimeric peptides using a unique modeling algorithm, MosaicProt, which is described and made available here. Our pipeline enables the identification of chimeric peptides in any organism for which transcript sequences and MS proteomic data are available. By design, our approach does not require prior knowledge of sequence similarity to already characterized PRF sites, and it can detect forward and backward frameshifts of 1 and 2 nucleotides. Thus, our pipeline opens a path to uncovering previously unknown PRF events across various transcript types, potentially broadening our understanding of proteome diversity. The pipeline was designed primarily for studies on mosaic translation, hence the name MosaicProt. However, it is applicable to research on PRF in many different contexts.
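To make the notion of a chimeric peptide concrete, the following minimal Python sketch models the product of a single frameshift of 1 or 2 nucleotides, forward or backward, at a chosen codon boundary. It is an illustration only and not the MosaicProt implementation: the function names (`translate`, `chimeric_peptide`), the toy sequence, and the convention that the ribosome resumes reading at `3 * codons_before_shift + shift` are assumptions made for this example.

```python
# Illustrative sketch only; NOT the MosaicProt algorithm.
# Standard genetic code, built in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def translate(seq: str) -> str:
    """Translate complete codons from the start of seq up to the first stop."""
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE.get(seq[i:i + 3], "X")  # X = unknown codon
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


def chimeric_peptide(orf: str, codons_before_shift: int, shift: int) -> str:
    """Model one frameshift (hypothetical helper).

    The first `codons_before_shift` codons are read in frame 0; reading then
    resumes `shift` nucleotides forward (+1, +2) or backward (-1, -2).
    """
    if shift not in (-2, -1, 1, 2):
        raise ValueError("shift must be one of -2, -1, +1, +2")
    boundary = 3 * codons_before_shift
    head = translate(orf[:boundary])
    if len(head) < codons_before_shift:
        raise ValueError("stop codon or sequence end before the shift site")
    return head + translate(orf[boundary + shift:])


if __name__ == "__main__":
    orf = "ATGGCCAAATTTGGGTAA"  # toy sequence: M A K F G stop
    print("in-frame:", translate(orf))
    for shift in (-2, -1, 1, 2):
        print(f"shift {shift:+d}:", chimeric_peptide(orf, 2, shift))
```

For the toy sequence, a shift after the second codon yields `MAPNLG` (-2), `MAQIWV` (-1), `MANLG` (+1), and `MAIWV` (+2), compared with the in-frame product `MAKFG`. Enumerating such candidates over every codon boundary of a transcript would give the kind of search space against which MS spectra can be matched, although how MosaicProt actually constructs and filters its candidates is not specified in this abstract.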