Abstract
Understanding how transcription factors (TFs) recognize DNA motifs is central to deciphering gene regulation. However, integrating multi-omics data remains a significant challenge, particularly for DNA methylation, which can variably influence TF binding. To address this, we developed BayesPI-Feature Learning Yard (BayesPI-FLY), a Bayesian neural network for de novo motif discovery that integrates DNA sequence information with DNA methylation status. Building upon the classical biophysical model of TF-DNA interactions, BayesPI-FLY employs a two-layer inference architecture to jointly estimate model parameters and hyperparameters within a Bayesian framework. The core algorithms are implemented in C and parallelized through Python for computational efficiency. BayesPI-FLY quantitatively characterizes methylation effects at both the single-nucleotide and motif levels, and generates position weight matrices and sequence logos to facilitate motif interpretation. Validation using synthetic and high-throughput sequencing datasets, including whole-genome bisulfite sequencing data, demonstrates that BayesPI-FLY can recapitulate known methylation-associated TF-binding patterns and infer strand-specific associations within its modeling framework. Collectively, BayesPI-FLY offers a versatile and extensible computational platform for characterizing methylation-related TF-DNA binding patterns across complex epigenetic contexts.