Abstract
In synthetic metabolic pathways, the intracellular abundance of each enzyme is a critical determinant of pathway efficiency; short-lived enzymes, which accumulate to low levels, can therefore create bottlenecks and limit overall metabolic productivity. However, because protein half-life has been studied only to a limited extent in bacteria, predicting it accurately remains a significant challenge. To address this, we developed ProHL, a machine learning model that classifies proteins as short-lived or long-lived. ProHL employs a multimodal strategy, integrating ProteinBERT encodings (at both the residue and sequence levels) with physicochemical encodings of the protein sequence. This integration captures both local and global sequence features, enabling accurate half-life classification. When evaluated on an independent test dataset of E. coli proteins, ProHL achieved an accuracy of 0.818 and a Matthews correlation coefficient of 0.624. To demonstrate its practical utility in metabolic engineering, we classified the lycopene biosynthesis enzymes CrtE, CrtB, and CrtI, and identified only CrtB as short-lived. Consistent with this prediction, additional expression of CrtB in a lycopene-producing E. coli base strain increased lycopene production by up to 25%. By predicting enzyme half-life in silico, ProHL identifies short-lived, rate-limiting enzymes and thus provides a viable strategy for alleviating metabolic bottlenecks and enhancing metabolic productivity.
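
For readers less familiar with the two reported metrics, the following is a minimal sketch of how accuracy and the Matthews correlation coefficient are computed for a binary classifier. The label encoding (1 = short-lived, 0 = long-lived), the function names, and the toy labels are assumptions made for illustration only; they are not taken from the ProHL code or its test dataset.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = short-lived, assumed)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels, purely illustrative (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("MCC:", matthews_cc(*counts))
```

Unlike accuracy, the Matthews correlation coefficient uses all four confusion-matrix entries, so it remains informative when the short-lived and long-lived classes are imbalanced.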