Abstract
Pro-inflammatory peptides (PIPs) play a pivotal role in the initiation, progression, and maintenance of inflammation. In-depth analysis of PIPs requires their precise identification, a task for which computational methods have proven cost-effective and accurate. In this study, we introduce Cat-PIPpred, a PIP predictor that combines CatBoost with cross-modal feature integration. We developed an optimized model by systematically comparing feature extraction techniques, feature refinement protocols, and classifier architectures under both cross-validation and independent testing. Integrating ESM-2 structural embeddings with Dipeptide Deviation from Expected Mean (DDE) evolutionary features provides a rich representation of each sequence. Feature refinement reduces memory consumption and improves computational efficiency. The final Cat-PIPpred model outperforms both existing PIP-specific predictors and general peptide classifiers. These findings confirm the benefit of integrating multiple feature sets with advanced ensemble learning algorithms. Beyond delivering reliable PIP predictions, the proposed framework offers useful insights for predicting the functions of other specialized peptides.
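To make the DDE component and the feature-integration step more concrete, here is a minimal sketch. It assumes the standard DDE definition: observed dipeptide composition normalized against a theoretical mean and variance derived from the number of codons encoding each residue. It also assumes simple concatenation as the fusion step. The function names `dde` and `fuse_features` are hypothetical and not taken from Cat-PIPpred. The ESM-2 embedding is replaced by a hypothetical zero vector (`placeholder_esm`), because a real pipeline would compute it with an ESM-2 model whose output width depends on the chosen variant.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Number of codons encoding each amino acid in the standard genetic code
# (these sum to the 61 sense codons).
CODON_COUNTS = {
    "A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3,
    "K": 2, "L": 6, "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6,
    "T": 4, "V": 4, "W": 1, "Y": 2,
}
TOTAL_SENSE_CODONS = 61
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dde(sequence: str) -> list[float]:
    """Return the 400-dimensional DDE vector (assumed standard definition)."""
    seq = sequence.upper()
    if len(seq) < 2:
        raise ValueError("DDE needs at least two residues")
    if any(res not in CODON_COUNTS for res in seq):
        raise ValueError("only the 20 standard amino acids are supported")

    n_pairs = len(seq) - 1  # number of overlapping dipeptides
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(n_pairs):
        counts[seq[i:i + 2]] += 1

    features = []
    for pair in DIPEPTIDES:
        dc = counts[pair] / n_pairs  # observed dipeptide composition
        # Theoretical mean from the codon counts of both residues.
        tm = (CODON_COUNTS[pair[0]] / TOTAL_SENSE_CODONS) * (
            CODON_COUNTS[pair[1]] / TOTAL_SENSE_CODONS
        )
        tv = tm * (1 - tm) / n_pairs  # theoretical variance
        features.append((dc - tm) / math.sqrt(tv))
    return features


def fuse_features(esm_embedding, dde_vector):
    """Hypothetical early fusion: concatenate embedding and DDE features."""
    return list(esm_embedding) + list(dde_vector)


# Hypothetical placeholder: 1280 is the width of the esm2_t33_650M_UR50D model,
# but real values must come from running an ESM-2 model on the sequence.
placeholder_esm = [0.0] * 1280
fused = fuse_features(placeholder_esm, dde("GLFDIVKKVVGALGSL"))
print(len(fused))  # 1680
```

Concatenation is only the simplest cross-modal fusion strategy. The feature refinement and CatBoost training described above are not reproduced in this sketch.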