Abstract
INTRODUCTION: The chloroplast, a living relic of an ancient endosymbiosis between a eukaryotic host cell and a cyanobacterium and the principal subcellular organelle responsible for biological CO₂ assimilation, is emerging as a key target for efforts to raise photosynthetic efficiency beyond its current limits. Because accurate protein localization is a prerequisite for both in-depth investigation and practical application of this membrane-compartmentalized photosynthetic organelle, numerous computational prediction tools have been proposed, yet their accuracy remains unsatisfactory.
METHODS: To address this limitation, we present Chlamy_ChloroPred, a deep learning framework built from multi-layered artificial neural networks and designed for binary classification of proteins (chloroplast versus non-chloroplast) in the model photosynthetic microorganism Chlamydomonas reinhardtii. The model captures locality-aware features of the determinant amino acid residues in the chloroplast transit peptide (cTP), generally located within the ~50-amino-acid N-terminal region of chloroplast precursor proteins, by integrating ProtBERT-BFD embeddings, stacked bidirectional long short-term memory (BiLSTM) networks, and an attentive pooling layer.
RESULTS AND DISCUSSION: Our model achieved an accuracy of 0.8462 on the C. reinhardtii proteome, outperforming widely used localization predictors, including TargetP 1.1 (0.4970), TargetP 2.0 (0.7396), and PredAlgo (0.7738), under a binary classification scheme. Comparative analyses further showed that Chlamy_ChloroPred performs competitively with the current state-of-the-art model, PB-Chlamy (0.8521), under identical evaluation conditions.
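To make the "binary classification scheme" concrete, the following is a minimal sketch of how calls from a multi-class localization predictor could be collapsed to chloroplast versus non-chloroplast before accuracy is computed. The label strings, the function names, and the toy data are all hypothetical illustrations, not the actual output codes or evaluation procedure used in this study.

```python
# Hypothetical set of labels that count as "chloroplast"; real predictors
# use their own output codes, which should be substituted here.
CHLOROPLAST_LABELS = {"cTP", "C"}


def to_binary(label):
    """Collapse a multi-class localization call to 1 (chloroplast) or 0 (other)."""
    return 1 if label in CHLOROPLAST_LABELS else 0


def binary_accuracy(predicted_labels, true_is_chloroplast):
    """Fraction of proteins whose collapsed prediction matches the 0/1 ground truth."""
    if len(predicted_labels) != len(true_is_chloroplast):
        raise ValueError("predictions and ground truth must have the same length")
    if not predicted_labels:
        raise ValueError("at least one protein is required")
    correct = sum(
        to_binary(pred) == int(truth)
        for pred, truth in zip(predicted_labels, true_is_chloroplast)
    )
    return correct / len(predicted_labels)


# Toy data (invented for illustration only).
toy_predictions = ["cTP", "mTP", "SP", "cTP", "noTP", "mTP"]
toy_truth = [1, 0, 0, 0, 1, 0]
toy_accuracy = binary_accuracy(toy_predictions, toy_truth)
print(f"binary accuracy on toy data: {toy_accuracy:.4f}")
```

Under such a scheme, every non-chloroplast class (for example mitochondrial or secretory calls) is scored identically, so the accuracies of different predictors are comparable only with respect to the chloroplast decision.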
Notably, despite being trained solely on the algal proteome, Chlamy_ChloroPred generalized across species when applied to the proteome of the terrestrial plant Arabidopsis thaliana, achieving an accuracy of 0.7316, a 12.6% improvement over TargetP 2.0, a predictor with previously demonstrated cross-proteome versatility. This likely reflects the model's ability to capture features of chloroplast proteins that are conserved across diverse photosynthetic lineages.
CONCLUSION: We developed Chlamy_ChloroPred, a deep learning framework that integrates carefully designed neural layers with low computational complexity and achieves high predictive accuracy and interpretability. We believe that Chlamy_ChloroPred is a compelling alternative to existing predictors, especially when accurate inference of chloroplast proteins is required.
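The attentive pooling layer named in the Methods can be illustrated with a small, dependency-free sketch. It assumes the simplest common formulation: each per-residue hidden state is scored by a dot product with a single query vector, the scores are normalized with a softmax, and the hidden states are averaged with those weights. The scoring function, the names `attentive_pool` and `toy_query`, and the toy vectors are assumptions made for illustration; the layer actually used in Chlamy_ChloroPred may differ, and in a trained model the query would be learned rather than fixed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pool(hidden_states, query):
    """Pool L per-residue vectors (each of dimension d) into one d-vector.

    hidden_states: list of L lists of floats, standing in for BiLSTM outputs.
    query: list of d floats, standing in for a learned attention vector.
    Returns (pooled_vector, attention_weights).
    """
    # One scalar relevance score per residue (dot product with the query).
    scores = [sum(h_k * q_k for h_k, q_k in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(query)
    # Weighted sum of residue vectors, computed one dimension at a time.
    pooled = [
        sum(w * h[k] for w, h in zip(weights, hidden_states)) for k in range(dim)
    ]
    return pooled, weights


# Toy input: three "residues" with two-dimensional hidden states.
toy_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_query = [2.0, 0.0]
pooled, weights = attentive_pool(toy_states, toy_query)
print("attention weights:", [round(w, 4) for w in weights])
print("pooled vector:", [round(v, 4) for v in pooled])
```

Because the weights are non-negative and sum to one across residues, they can be read as a per-position importance profile, which is one plausible route to the interpretability mentioned in the Conclusion.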