Abstract
Single-cell RNA sequencing now profiles whole transcriptomes for hundreds of thousands of cells, yet existing trajectory-inference tools rarely pinpoint where and when fate decisions are made. We present single-cell reinforcement learning (scRL), an actor-critic framework that recasts differentiation as a sequential decision process on an interpretable latent manifold derived with Latent Dirichlet Allocation. The critic learns state-value functions that quantify fate intensity for each cell, while the actor traces optimal developmental routes across the manifold. Benchmarks on datasets spanning hematopoiesis, mouse endocrinogenesis, acute myeloid leukemia, gene knockout, and irradiation show that scRL surpasses fifteen state-of-the-art methods across five independent evaluation dimensions, recovering early decision states that precede overt lineage commitment and revealing regulators such as Dapp1. Beyond fate decisions, the same framework produces competitive measures of lineage-contribution intensity without requiring ground-truth probabilities, providing a unified and extensible approach for decoding developmental logic from single-cell data.