Abstract
Single-cell RNA-sequencing captures a temporal slice, or snapshot, of a cell differentiation process. A major bioinformatics challenge is the inference of differentiation trajectories from a single snapshot, and methods that account for outlier cells unrelated to the differentiation process have yet to be established. We present MultistageOT (https://github.com/dahlinlab/MultistageOT), a generalized optimal transport-based framework that models cell differentiation in a single snapshot as a series of intermediate cell transitions. MultistageOT employs multiple transport stages to establish temporal progression within the snapshot, overcoming limitations of the classic bimarginal formulation of optimal transport. Moreover, our multistage framework uses global information across all cells and differentiation stages to infer coherent trajectories from initial to terminal states. This allows MultistageOT to identify individual outlier cells that are unrelated to the analyzed differentiation process, an essential mechanism for preventing the inference of spurious or biologically implausible trajectories. We benchmark MultistageOT on snapshot data of cell differentiation, showing significantly improved fate prediction accuracy over state-of-the-art bimarginal optimal transport and demonstrating MultistageOT's unique ability to detect outlier cells.