Abstract
To understand, detect, and treat sepsis efficiently and accurately, we need detailed knowledge of how its underlying biological systems work. Conventional methods struggle to provide a detailed view of both the functional and regulatory elements orchestrating a system-wide response to pathogenic infection. We addressed this with a streamlined two-step process. First, we used multi-omics integration with sparse Partial Least Squares Discriminant Analysis (sPLSDA) to create a functional-omics signature at the single-molecule level. Second, we used genome Natural Language Processing (genomeNLP) to identify correlated features and discover regulatory motifs in bacterial promoters. This novel method utilised specialised language models trained on DNA sequences. Unlike traditional methods, our approach did not rely on predefined biomarkers and motifs, nor did it focus solely on short-range context. As a result, we (a) pinpointed multi-omics signatures with single-molecule-level precision and (b) uncovered regulatory patterns for both novel and established regulatory motifs governing the omics signature. Ultimately, we presented a comprehensive systems-level view of five sepsis-causing Staphylococcus aureus strains, revealing hierarchical gene regulation in genome structure, ncRNA control, metabolism, and antibiotic resistance. Our approach is also agnostic to organism type and can be applied to systems other than sepsis.