Abstract
MOTIVATION: Current spatial proteomics data analysis workflows have limited efficiency and scalability when applied to gigapixel-sized datasets. Moreover, they often lack extensive quality control tools and offer limited interoperability with existing spatial omics analysis ecosystems.
RESULTS: We introduce Harpy, a new Python workflow for accelerated processing of large spatial proteomics datasets. We demonstrate the utility of Harpy on four datasets and show that it rapidly applies state-of-the-art segmentation and feature extraction through parallel processing. Each analysis step is accompanied by appropriate quality control. Scalable clustering of cells and pixels enables cell type identification up to 27 times faster than previously reported. Processing and visualization can be performed locally or on high-performance computing servers. Additionally, Harpy integrates well with existing spatial single-cell analysis tools in the Python and R software ecosystems.
AVAILABILITY AND IMPLEMENTATION: Harpy is available on GitHub at https://github.com/saeyslab/harpy and archived on Zenodo at https://doi.org/10.5281/zenodo.15546703.