Abstract
Post-translational modifications (PTMs) are pivotal in cellular regulation, and their crosstalk is linked to various diseases, including cancer. Because PTM crosstalk frequently occurs between residues in close proximity, identifying peptides that carry multiple PTMs is essential. However, this task is an NP-hard combinatorial problem with exponential complexity, posing significant challenges for existing analysis methods. Here, we introduce PIPI-C (PTM-Invariant Peptide Identification with a Combinatorial model), a novel search engine that addresses this challenge with a mixed integer linear programming (MILP) model, thereby overcoming the limitations of existing approaches that struggle with high-order PTM combinations. Rigorous validation across diverse datasets confirms PIPI-C's superior performance in detecting PTM combinations. When applied to over 72 million mass spectra from three human cancers, namely lung squamous cell carcinoma (LSCC), colorectal adenocarcinoma (COAD), and glioblastoma (GBM), PIPI-C reveals significantly upregulated PTM combinations. In LSCC, 50% of the 860 unique PTM site patterns (UPSPs) upregulated in cancer relative to normal samples carried at least two PTMs, including literature-supported crosstalk such as di-methylation with trifluoroleucine substitution and amidation with proline-to-valine substitution. Similar findings in COAD and GBM highlight PIPI-C's utility in uncovering cancer-relevant PTM combination landscapes. Overall, PIPI-C provides a robust mathematical framework for decoding complex PTM patterns, advancing our understanding of PTM-driven cellular processes in disease.
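To make the exponential complexity mentioned above concrete, the following toy sketch enumerates every PTM assignment over a set of candidate sites and keeps those consistent with an observed mass shift. It is an illustration of the brute-force search space only, not the MILP formulation used by PIPI-C. The function names, the small PTM table, and the simplifying assumption that every PTM may occur at every site are hypothetical; the mass shifts are standard monoisotopic values.

```python
from itertools import product

# Monoisotopic mass shifts in Da (standard values).
# Assumption for illustration: every PTM is allowed at every candidate site.
PTM_MASS = {
    "none": 0.0,
    "methyl": 14.01565,
    "dimethyl": 28.03130,
    "acetyl": 42.01057,
    "phospho": 79.96633,
}


def search_space_size(n_sites):
    """Number of PTM assignments a brute-force search must consider."""
    return len(PTM_MASS) ** n_sites


def enumerate_patterns(n_sites, target_shift, tol=0.01):
    """Return every assignment whose summed mass shift is within tol of target_shift."""
    hits = []
    for combo in product(PTM_MASS, repeat=n_sites):
        total = sum(PTM_MASS[name] for name in combo)
        if abs(total - target_shift) <= tol:
            hits.append(combo)
    return hits


if __name__ == "__main__":
    # Two sites and a shift of about 28.03 Da: one di-methylation on either
    # site and two mono-methylations are indistinguishable by mass alone.
    for pattern in enumerate_patterns(2, 28.0313):
        print(pattern)
    # The candidate count grows as 5**n for this five-entry table.
    for n in (2, 5, 10):
        print(n, search_space_size(n))
```

Even with only five options per site, ten candidate sites already yield nearly ten million assignments, and mass alone does not separate patterns such as one di-methylation from two mono-methylations. This is the kind of search that exhaustive enumeration cannot sustain for high-order PTM combinations.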