Abstract
MOTIVATION: Pinpointing the subcellular location of proteins is essential for studying protein function and related diseases. Advances in spatial proteomics have shown that automatic recognition of protein subcellular localization from images could greatly facilitate protein translocation analysis and biomarker discovery, but existing machine-learning work has mostly been limited to 2D images. By contrast, 3D images have higher spatial resolution and allow researchers to observe cellular structures in their natural context. Even so, only a few studies have applied 3D image processing to protein distribution analysis, owing to the lack of data and the complexity of modeling.

RESULTS: We developed a knowledge-enhanced protein subcellular localization model, KE3DLoc, which recognizes distribution patterns in 3D fluorescence microscope images using deep learning methods. The model includes an image feature extraction module that incorporates information from both 3D cells and their 2D projections, and it applies asymmetric loss and confidence weights to address data imbalance and weak cell annotations. In addition, because the biological knowledge in the Gene Ontology (GO) database can provide valuable support for understanding protein location, KE3DLoc incorporates a novel knowledge enhancement module that refines the protein representation using knowledge graphs derived from GO. Since the image module and the knowledge module compute features at different levels, KE3DLoc uses protein ID aggregation to enhance the consistency of protein features across different cells. Experimental results on three public datasets demonstrate that KE3DLoc significantly outperforms existing methods and provides valuable insights for spatial proteomics research.

AVAILABILITY AND IMPLEMENTATION: All datasets and code used in this study are available on GitHub: https://github.com/PRBioimages/KE3DLoc.