Abstract
MOTIVATION: The cellular composition of a solid tissue can be assessed either through the physical dissociation of the tissue followed by single-cell analysis techniques or by computational deconvolution of bulk gene expression profiles. However, both approaches are prone to significant biases. Tissue dissociation often results in disproportionate cell loss, while deconvolution is hindered by biological and technological inconsistencies between the datasets it relies on.

RESULTS: Using calibration datasets that include both experimentally measured and deconvolution-based cell compositions, we present a new method, Harp, which reconciles these approaches to produce more reliable deconvolution results in applications where only gene expression data is available. On both simulated and real data, harmonizing cell reference profiles proved advantageous over competing state-of-the-art deconvolution tools, overcoming technological and biological batch effects.

AVAILABILITY AND IMPLEMENTATION: R package available at https://github.com/spang-lab/harp (archived as 10.5281/zenodo.16851930). Code and data for reproducing the results of this paper are available at https://github.com/spang-lab/harplication (archived as 10.5281/zenodo.16851705) and https://doi.org/10.5281/zenodo.15650057, respectively.