Abstract
Biological function is mediated by the hierarchical organization of cell types and states within tissue ecosystems. Identifying interpretable composite marker sets that both define and distinguish hierarchical cell identities is essential for decoding biological complexity, yet remains a major challenge. Here, we present RECOMBINE, an algorithm that identifies recurrent composite marker sets to define hierarchical cell identities. Validation using both simulated and biological datasets demonstrates that RECOMBINE identifies discriminative markers more accurately than existing approaches, including differential gene expression analysis. When applied to single-cell data and validated with spatial transcriptomics data from the mouse visual cortex, RECOMBINE identified key cell type markers and generated a robust gene panel for targeted spatial profiling. It also uncovered markers of CD8+ T cell states, including GZMK+HAVCR2- effector memory cells associated with anti-PD-1 therapy response, and revealed a rare intestinal subpopulation with composite markers in mice. Finally, using data from the Tabula Sapiens project, RECOMBINE identified composite marker sets across a broad range of human tissues. Together, these results highlight RECOMBINE as a robust, data-driven framework for optimized marker selection, enabling the discovery and validation of hierarchical cell identities across diverse tissue contexts.