Abstract
Sequence database search engines used for the routine analysis of proteomics MS/MS datasets can identify only peptides whose sequences are contained in the input sequence database. Although most detectable sequences for well-characterized species are present in public search databases, there are inevitably some sample-specific sequence variants that these databases lack. To achieve a more in-depth analysis, an emerging workflow pairs RNA-seq analysis of a sample with a corresponding proteomic analysis. RNA-seq-assisted proteomics analysis offers four potential benefits over conventional analysis of proteomics data alone: sample-specific single amino acid variants, additional transcript-level sequences, transcript abundance information for limiting the proteomic search space, and transcript presence information to aid in resolving protein ambiguity. A protocol for processing RNA-seq data into a proteomics-ready sequence database is presented, along with the custom and standard tools it uses. The use of these tools to produce the sequence databases for the ABRF iPRG 2013 study is also described.
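As a rough illustration of two of the benefits listed above (limiting the search space by transcript abundance and adding sample-specific single amino acid variants), the following minimal Python sketch builds a small FASTA-style database. It is not the protocol or tooling described in this work. The function names, the variant header convention, the abundance threshold, and the toy sequences are all assumptions made for illustration, and abundance is assumed to have been mapped from transcripts to protein accessions already.

```python
from typing import Dict, List, Tuple

# A variant is (1-based position, reference residue, alternate residue).
Variant = Tuple[int, str, str]


def apply_sav(protein: str, position: int, ref: str, alt: str) -> str:
    """Return the protein with a single amino acid substitution applied."""
    if not 1 <= position <= len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    if protein[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}")
    return protein[: position - 1] + alt + protein[position:]


def build_database(
    reference: Dict[str, str],
    abundance: Dict[str, float],
    variants: Dict[str, List[Variant]],
    min_abundance: float = 1.0,  # hypothetical cutoff, units depend on the RNA-seq pipeline
) -> List[Tuple[str, str]]:
    """Keep proteins with transcript support and append their variant forms."""
    entries: List[Tuple[str, str]] = []
    for accession, sequence in reference.items():
        # Limit the search space: skip proteins with little transcript evidence.
        if abundance.get(accession, 0.0) < min_abundance:
            continue
        entries.append((accession, sequence))
        # Add one extra entry per sample-specific single amino acid variant.
        for position, ref, alt in variants.get(accession, []):
            header = f"{accession}_{ref}{position}{alt}"  # assumed naming scheme
            entries.append((header, apply_sav(sequence, position, ref, alt)))
    return entries


def to_fasta(entries: List[Tuple[str, str]]) -> str:
    """Serialize (header, sequence) pairs as FASTA text."""
    return "".join(f">{header}\n{sequence}\n" for header, sequence in entries)


if __name__ == "__main__":
    # Toy data, invented purely for this example.
    reference = {"P1": "MKTAYIAK", "P2": "MSLLTEVE"}
    abundance = {"P1": 5.0, "P2": 0.1}
    variants = {"P1": [(3, "T", "A")]}
    print(to_fasta(build_database(reference, abundance, variants)))
```

In this sketch each variant is written as a separate full-length entry, which keeps the search engine input simple at the cost of redundant sequence. A real pipeline might instead emit only the variant-containing peptides.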