Abstract
Multibatch isobaric labeling experiments are frequently used in clinical and pharmaceutical studies of large sample cohorts. To tackle the critical issue of missing values in such studies, we introduce the ProSIMSIt pipeline. It combines the advantages of tandem mass spectrum clustering via SIMSI-Transfer with those of data-driven rescoring via Prosit and Oktoberfest. We demonstrate that these two approaches are complementary and mutually beneficial. On large-scale cancer cohort data, ProSIMSIt increased the number of peptide-spectrum matches (PSMs) by 40% on both global and phosphoproteome data sets. Furthermore, on data from proteome-wide drug-response profiling of post-translational modifications (decryptM), our pipeline substantially increased the number of drug-PTM relations and revealed previously unseen downstream effects of drug target inhibition. ProSIMSIt is available as an open-source Python package with a simple command line interface that can be applied directly to MaxQuant result files.