Abstract
Gene regulatory networks (GRNs) explain how the genome controls cellular behaviour and tissue morphogenesis, connecting molecular mechanism to functional output. Single-cell technologies now describe these networks in unprecedented detail, but this advance has also revealed gene regulatory systems that are too complex for our existing conceptual frameworks. GRNs, which should provide mechanistic explanations, are increasingly reduced to statistical correlations: 'hairballs' that fail to capture molecular causation. Here, we explore why this dilemma exists and propose a path forward. We argue that methods from 'representation learning' can be used to model GRNs without needing to capture every molecular detail. For this framework, we advocate three linked principles: models must be inherently mechanistic, with structures grounded in cellular and evolutionary biology; molecular principles and constraints must be used to reduce the solution space for learning GRN models; and more sophisticated forms of experimental perturbation and synthetic biological engineering are needed to train models and test predictions. By reimagining GRNs through these principles, we can bridge the gap between data abundance and new conceptual understanding.