Abstract
Understanding intratumoral heterogeneity is essential for elucidating tumor biology. Compared with RNA expression, omics-level characterization of cell-type-specific protein expression remains technically challenging. Bulk mass spectrometry (MS) has generated abundant proteomics resources from which cell-type specificity can be inferred by data deconvolution. However, it is unclear which proteomic quantification formats are optimal for this purpose, because they differ from the data types for which most deconvolution methods were designed. Here, leveraging recently generated large-cohort proteogenomics data, we systematically evaluated different MS proteomics quantification formats and preprocessing strategies for resolving cell-type-specific protein expression. Our results indicate that label-free spectral counts can be used directly, whereas TMT MS1 intensities and MS2 ratios are less suitable and require appropriate data transformation. We demonstrate that a 'min-score' transformation significantly improves MS1 intensity-based deconvolution and provides useful insights for subtyping pancreatic cancer. Moreover, we identified the coefficient of variation (CV) as a robust statistical indicator of deconvolution suitability. Finally, we developed "ProTransDeconv", an R package that integrates data transformation, deconvolution, and quality checks for the major MS proteomics data formats. This work provides practical guidance for deconvolving bulk proteomics data to study cell-type-specific protein-level dysregulation.