Abstract
Accurate and rapid identification of bacterial species is essential for public health, clinical diagnostics, and environmental monitoring. Raman spectroscopy offers a powerful, non-invasive means of identification, but models that rely on spectral data alone often fail to distinguish species with highly similar signatures, particularly when the discriminating features are subtle. This difficulty is frequently compounded by a lack of integrated biological prior knowledge, which can hinder model performance. To address these challenges, we introduce BactoRamanBioNet, a novel multimodal neural network that combines a ResNet-Transformer branch, which captures complex spectral patterns, with a CLIP text encoder, which incorporates descriptive biological information. Empirically, BactoRamanBioNet achieves a classification accuracy of 98.2% and an F1-score of 98.0%, surpassing the current state-of-the-art deep learning model, ResNet-1D, by 2.4% in accuracy and 2.0% in F1-score. It also outperforms traditional classifiers, namely Support Vector Machine (SVM) and Random Forest (RF), by 9.8% and 7.9% in accuracy, respectively, with significant improvements in precision and recall. By establishing a new benchmark in performance and robustness, BactoRamanBioNet offers a reliable framework for automated microbiological analysis, paving the way for next-generation diagnostic systems.
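
The abstract does not state how the spectral and text branches are combined, so the following is only a minimal sketch of one common option: late fusion by concatenating the two embeddings and applying a linear softmax classifier. The function names, embedding sizes, and random weights are hypothetical placeholders. In the actual model, the embeddings would come from the ResNet-Transformer and the CLIP text encoder, and the fusion step may differ.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(spectral_emb, text_emb, weights, bias):
    """Assumed fusion: concatenate normalized embeddings, then linear + softmax."""
    fused = l2_normalize(spectral_emb) + l2_normalize(text_emb)
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Toy stand-ins for the two encoder outputs (hypothetical sizes: 8 and 4).
rng = random.Random(0)
spectral_emb = [rng.gauss(0, 1) for _ in range(8)]
text_emb = [rng.gauss(0, 1) for _ in range(4)]

# Untrained placeholder classifier over 3 hypothetical species.
num_classes = 3
weights = [[rng.gauss(0, 0.1) for _ in range(12)] for _ in range(num_classes)]
bias = [0.0] * num_classes

probs = fuse_and_classify(spectral_emb, text_emb, weights, bias)
predicted = max(range(num_classes), key=probs.__getitem__)
print(predicted, probs)
```

Normalizing each embedding before concatenation keeps either modality from dominating the fused vector purely through scale. Whether BactoRamanBioNet does this is not specified in the abstract.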