Abstract
The Ks distribution, the distribution of synonymous substitutions per synonymous site across gene pairs, has been widely used to estimate species divergence times from orthologous genes. However, conventional approaches often ignore an underlying bias: species divergence lags average gene divergence by 2Ne generations, where Ne is the ancestral effective population size. This bias usually goes uncorrected because scalable methods for inferring Ne are lacking. Here, we demonstrate through simulations that the variance of the Ks distribution correlates with Ne, enabling direct estimation of ancestral population parameters from standard Ks data. Leveraging this relationship, we present Tspecies, a framework that corrects divergence time estimates using only substitution rates and Ks distributions, without requiring additional genomic data. Applying Tspecies to Liriodendron, we infer a divergence time between the North American and East Asian lineages (1.44 Ma) that aligns with early Pleistocene glaciation, and a large ancestral Ne (∼5.29 × 10⁴) that is consistent with fossil evidence. Our findings reveal the correlation between the variance of the Ks distribution and Ne, and provide a computational framework that resolves the bias in Ks-based dating by incorporating a readily estimated Ne.
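The 2Ne correction described above can be illustrated with a minimal moment-based sketch. This is not the Tspecies implementation, whose details are not given here. It assumes a textbook two-lineage coalescent, in which the coalescence time in the ancestral population is exponentially distributed with mean and standard deviation both equal to 2Ne generations, and it ignores rate variation among genes and sampling noise in Ks. The function name `correct_divergence` and its parameters are hypothetical, and `mu` is assumed to be a per-site, per-generation synonymous substitution rate.

```python
def correct_divergence(ks_mean, ks_sd, mu):
    """Illustrative moment-based correction (an assumption, not Tspecies).

    ks_mean, ks_sd: mean and standard deviation of Ks across ortholog pairs.
    mu: assumed synonymous substitution rate per site per generation.
    Returns (t_gene, ne, t_species), with times in generations.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    # Ks accumulates along both lineages, hence the factor of 2.
    t_gene = ks_mean / (2.0 * mu)    # average gene divergence
    # Assumption: all spread in Ks comes from ancestral coalescence,
    # whose standard deviation equals its mean (2Ne generations).
    two_ne = ks_sd / (2.0 * mu)
    ne = two_ne / 2.0                # ancestral effective population size
    t_species = t_gene - two_ne      # species divergence lags by 2Ne
    return t_gene, ne, t_species


if __name__ == "__main__":
    # Made-up inputs for illustration only, not values from the study.
    t_gene, ne, t_species = correct_divergence(0.02, 0.004, 1e-8)
    print(t_gene, ne, t_species)
```

Because real Ks variance also reflects sampling error and among-gene rate heterogeneity, this naive estimator would tend to overestimate Ne. Multiplying the returned times by an assumed generation time converts them to years.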