Abstract
Background: Accurately predicting protein druggability is crucial for drug development, because it reduces the time and resources needed to identify viable drug targets. However, existing methods often trade off accuracy, efficiency, and interpretability. This study introduces a lightweight framework designed to address all three. Methods: The framework embeds proteins into four biologically informed, non-Euclidean metric spaces derived from amino acid sequences, predicted secondary structures, and curated post-translational modification (PTM) annotations. These representations capture hydrophobicity profiles, PTM densities, spatial patterns, and secondary structure composition, providing interpretable proxies for structure-related determinants of druggability. Results: Evaluated on an aspirin-binding protein dataset using leave-one-out cross-validation (LOOCV), our distance-based ensemble achieves 92.25% accuracy (AUC = 0.9358) in the whole-protein setting, outperforming common sequence-only baselines from the literature while remaining computationally efficient. On a refined single-chain subset, it performs comparably to established feature engineering pipelines. Conclusions: These results suggest that biologically grounded, non-Euclidean embeddings offer an effective and interpretable alternative to resource-intensive 3D pipelines for target assessment, and could streamline target identification and validation in drug discovery.
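
As an illustration of the evaluation protocol named above, the following minimal Python sketch runs leave-one-out cross-validation over an ensemble of precomputed distance matrices. It is not the authors' implementation: the function names (`knn_vote`, `loocv_distance_ensemble`), the k-nearest-neighbour voting rule, the soft-vote averaging across spaces, and the toy one-dimensional data are all assumptions made for this example, and the absolute-difference distances merely stand in for the four metric spaces, whose definitions are not given here.

```python
import numpy as np


def knn_vote(dist_row, labels, held_out, k):
    """Fraction of positive labels among the k nearest neighbours.

    The held-out sample is excluded so it can never vote for itself.
    Assumes k is at most len(labels) - 1.
    """
    d = np.asarray(dist_row, dtype=float).copy()
    d[held_out] = np.inf
    nearest = np.argsort(d, kind="stable")[:k]
    return labels[nearest].mean()


def loocv_distance_ensemble(dist_mats, labels, k=1):
    """Leave-one-out evaluation of a soft-voting ensemble.

    dist_mats: list of (n, n) pairwise distance matrices, one per
               (hypothetical) metric space.
    labels:    binary labels, 1 = druggable, 0 = not druggable.
    """
    labels = np.asarray(labels)
    n = len(labels)
    scores = np.empty(n)
    for i in range(n):
        # Average the per-space positive-vote fractions for sample i.
        scores[i] = np.mean([knn_vote(D[i], labels, i, k) for D in dist_mats])
    preds = (scores >= 0.5).astype(int)
    accuracy = float((preds == labels).mean())
    return accuracy, scores, preds


# Toy data: two illustrative 1-D "spaces" for six proteins (assumed values).
A = np.array([0.0, 0.1, 0.2, 1.0, 1.1, 1.2])
B = np.array([0.0, 0.2, 0.1, 0.9, 1.0, 1.3])
labels = np.array([0, 0, 0, 1, 1, 1])

# Pairwise absolute differences serve as placeholder distance matrices.
dist_mats = [np.abs(x[:, None] - x[None, :]) for x in (A, B)]

accuracy, scores, preds = loocv_distance_ensemble(dist_mats, labels, k=1)
print(accuracy)  # 1.0 on this cleanly separated toy set
```

Because the ensemble consumes only pairwise distances, any metric (Euclidean or not) can be substituted by replacing the entries of `dist_mats`. The per-sample `scores` are the kind of continuous output from which an AUC could be computed.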