Abstract
In this study, we applied machine learning to model the spatiotemporal dynamics of gene expression during early Drosophila embryogenesis. By optimizing model architecture, feature selection, and spatial grid resolution, we developed a predictive pipeline that accurately classifies active nuclei and forecasts their spatial distribution at later time points. We evaluated the model on two reporter constructs for the short gastrulation (sog) gene, sogD and sogD_∆Su(H), which allowed us to assess its performance across distinct genetic contexts. The model achieved high accuracy on the wild-type sogD dataset, particularly along the dorsal-ventral (DV) axis during nuclear cycle 14 (NC14), and accurately predicted expression in the central region for both the wild-type (sogD) and Suppressor of Hairless (Su(H)) mutant (sogD_∆Su(H)) enhancers. Bootstrap analysis confirmed that prediction accuracy was higher in the central region than at the edges. Our previous work showed that Su(H) can act both as a repressor at the borders and as a stabilizer of transcriptional bursts in the center of the sog expression domain. This dual function is not unique to Su(H); other broadly expressed transcription factors also exhibit context-dependent regulatory roles, functioning as activators in some regions and repressors in others. These results highlight the importance of spatial context in transcriptional regulation and demonstrate that machine learning can capture such context-dependent behavior. Looking ahead, incorporating mechanistic features such as transcriptional bursting parameters into predictive models could enable simulations that forecast not just where genes are expressed but also how their dynamics unfold over time. This form of in silico enhancer mutagenesis would make it possible to predict the effects of specific binding site changes on both spatial expression patterns and underlying transcriptional activity, offering a powerful framework for studying cis-regulatory logic and modeling early developmental processes across diverse genetic backgrounds.