Abstract
Phage therapy has attracted considerable attention as a promising antimicrobial treatment, and predicting phage-bacterium interactions (PBIs), a core task in this field, is crucial for understanding infection mechanisms and optimizing therapeutic strategies. However, existing computational methods mainly operate at the species level or higher taxonomic levels and rarely exploit deep embedding representations, which limits their ability to capture the complex biological patterns inherent in sequences and, in turn, restricts the clinical application of phage therapy. To address these limitations, we propose a novel deep learning framework, called PBIP, for strain-level PBI prediction. In PBIP, we first identify strain-level interactions through biological infection experiments and sequencing of Klebsiella pneumoniae isolated from the clinical environment of Xiangya Hospital. Then, we use a pretrained unified representation model to convert the protein sequences of phages and bacteria into deep embeddings. Next, we apply the synthetic minority oversampling technique (SMOTE) to synthesize additional positive interactions in the embedding space, mitigating the class imbalance. Subsequently, we design a deep neural network that uses a convolutional neural network to extract local features, a bidirectional gated recurrent unit to capture global features, and an attention module to highlight the most informative features. Finally, a fully connected layer integrates this information for PBI prediction. Experimental results show that PBIP outperforms state-of-the-art methods in predicting PBIs. The code and datasets are available at https://github.com/a1678019300/PBIP.
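The oversampling step can be illustrated with a minimal sketch of the standard SMOTE interpolation rule: a synthetic positive sample is placed at a random point on the segment between a minority-class embedding and one of its k nearest minority-class neighbours. This is not the authors' implementation, which may differ in detail; the function name smote_oversample, its parameters, and the toy two-dimensional vectors are hypothetical and chosen only for illustration, and the sketch assumes that each positive interaction is already represented as a fixed-length embedding vector.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic vectors interpolated between minority samples.

    minority: list of equal-length float lists, e.g. embeddings of
    positive phage-bacterium pairs (hypothetical input format).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest minority neighbours of the base sample, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        # New point on the segment from the base sample to the chosen neighbour.
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for positive interaction vectors.
    positives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for vector in smote_oversample(positives, n_new=3, k=2, seed=1):
        print(vector)
```

Because every synthetic vector lies between two real positive embeddings, the new samples stay within the region already occupied by the minority class. In practice, oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into evaluation data.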