Abstract
Deriving the sequence of transitions between cell types, or differentiation events, that occur during organismal development is one of the fundamental challenges in developmental biology. Single-cell and spatial sequencing of samples from different developmental timepoints provide data for investigating differentiation, but inferring a sequence of differentiation events requires (1) finding trajectories, or ancestor-descendant relationships, between cells from consecutive timepoints; and (2) coarse-graining these trajectories into a differentiation map, that is, a collection of transitions between cell types rather than between individual cells. We introduce Hidden-Markov Optimal Transport (HM-OT), an algorithm that simultaneously groups cells into cell types and learns transitions between these cell types from developmental transcriptomics time series. HM-OT uses low-rank optimal transport to jointly align the samples in a time series and learn a sequence of clusterings and a differentiation map with minimal total transport cost. We assume that the law governing cell-type trajectories is characterized by the joint laws on consecutive timepoints, which is tantamount to a Markov assumption on these latent trajectories. HM-OT can learn these clusterings in a fully unsupervised manner, or it can generate the least-cost cell-type differentiation map consistent with a given set of cell-type labels. We validate the unsupervised clusters and the cell-type differentiation map output by HM-OT on a Stereo-seq dataset of zebrafish development, and we demonstrate the scalability of HM-OT on a massive Stereo-seq dataset of mouse embryonic development.
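
The Markov assumption stated above can be made concrete with a short worked equation. The notation here is an illustrative assumption, not taken from the source: $Z_t$ denotes the latent cell type of a trajectory at timepoint $t$, for $t = 1, \dots, N$, and every cell type that appears is assumed to have positive marginal probability. Under a first-order Markov assumption, the law of a whole cell-type trajectory factorizes as

```latex
\[
\mathbb{P}(Z_1 = z_1, \dots, Z_N = z_N)
  = \mathbb{P}(Z_1 = z_1) \prod_{t=1}^{N-1} \mathbb{P}(Z_{t+1} = z_{t+1} \mid Z_t = z_t),
\qquad
\mathbb{P}(Z_{t+1} = z' \mid Z_t = z)
  = \frac{\mathbb{P}(Z_t = z,\, Z_{t+1} = z')}{\mathbb{P}(Z_t = z)} .
\]
```

Each factor on the right depends only on the joint law of $(Z_t, Z_{t+1})$, so in this sketch the pairwise joint laws on consecutive timepoints determine the law of the full trajectory.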