Abstract
BACKGROUND: Promoters are essential cis-regulatory elements in prokaryotes. They govern gene expression by mediating RNA polymerase binding through core motifs and long-range regulatory interactions, and they play a pivotal role in cell metabolism and environmental adaptation. Accurate identification of prokaryotic promoters is therefore vital for understanding their biological functions. However, existing tools for predicting prokaryotic promoters focus mainly on individual model organisms, and their prediction accuracy leaves room for improvement. To address these gaps, we developed iPro-MP, a transformer-based prokaryotic promoter prediction framework, and systematically evaluated it across 23 phylogenetically diverse species, including both model and non-model organisms.

RESULTS: iPro-MP uses a multi-head attention mechanism to capture textual information in DNA sequences and effectively learn their hidden patterns. Cross-species prediction demonstrates the necessity of constructing species-specific models. Across a series of experiments, iPro-MP performs strongly, with an AUC exceeding 0.9 in 18 of the 23 species.

CONCLUSIONS: iPro-MP, our novel approach to predicting prokaryotic promoters, is superior to other existing tools, especially for non-model organisms. For the convenience of other researchers, the source code and datasets of iPro-MP are freely available at https://github.com/Jackie-Suv/iPro-MP.
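For readers less familiar with the AUC metric reported above, the following minimal sketch shows one standard way to compute it for a binary promoter/non-promoter classifier: as the fraction of (promoter, non-promoter) pairs in which the promoter receives the higher score, with ties counted as one half. The function name `roc_auc` and the labels and scores are hypothetical illustrations only; they are not taken from iPro-MP or its datasets, and this is not necessarily how the authors computed their values.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    labels: iterable of 1 (promoter) or 0 (non-promoter).
    scores: predicted promoter scores, higher means more promoter-like.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value above 0.9 means that a randomly chosen promoter outranks a randomly chosen non-promoter more than 90% of the time.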