Abstract
Precise control of protein adsorption on polymer surfaces is essential in materials science and biomaterial design, with applications in antifouling materials, biosensors, cell culture, and drug delivery systems. However, the complex interactions between polymers and proteins and the limited availability of high-quality interaction data remain major challenges in polymer informatics. Current approaches often lack the generalizability needed to model diverse polymer-protein systems within a single unified framework, and comprehensive predictive models remain scarce. To address these challenges, we introduce BB-EIT (Biointerface BERT Encoder for Interaction Translation), a novel generalized model designed to accurately predict the adsorbed amount of diverse proteins on polymer brushes. BB-EIT builds on the pretrained ChemBERTa large language model (LLM), using SMILES strings for robust chemical representation and SMILES enumeration for convenient data augmentation. With the pretrained model adapted through an extended layer that integrates a comprehensive set of physicochemical and biochemical features, including polymer thickness, water contact angle, and surface charge as well as protein isoelectric point (pI) and size, BB-EIT showed state-of-the-art performance and strong generalizability. The model accurately predicted adsorption behavior in previously unseen polymer and protein systems. This work represents an important step toward the data-driven design of biomaterials with tailored protein adsorption properties.
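
The feature-integration step described above can be illustrated with a minimal sketch. It is not the BB-EIT implementation. It assumes that the extended layer concatenates a pooled SMILES embedding with standardized descriptors and passes the result to a linear regression head. The names `Descriptors`, `standardize`, `fuse`, and `linear_head`, the units, and all numeric values are hypothetical, and the short embedding list stands in for a real ChemBERTa output.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Descriptors:
    """Descriptors named in the abstract; the units are assumptions."""
    thickness_nm: float        # polymer brush thickness
    contact_angle_deg: float   # water contact angle
    surface_charge_mv: float   # surface charge (unit assumed)
    protein_pi: float          # protein isoelectric point
    protein_size_kda: float    # protein size (unit assumed)

    def as_list(self) -> List[float]:
        return [self.thickness_nm, self.contact_angle_deg,
                self.surface_charge_mv, self.protein_pi,
                self.protein_size_kda]


def standardize(values: Sequence[float], means: Sequence[float],
                stds: Sequence[float]) -> List[float]:
    """Z-score each descriptor so it is on a scale comparable to the embedding."""
    return [(v - m) / s for v, m, s in zip(values, means, stds)]


def fuse(embedding: Sequence[float], descriptors: Sequence[float]) -> List[float]:
    """Concatenate the pooled SMILES embedding with the descriptor vector."""
    return list(embedding) + list(descriptors)


def linear_head(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Single linear regression layer mapping the fused vector to one value."""
    if len(x) != len(weights):
        raise ValueError("weights must match the fused vector length")
    return sum(xi * wi for xi, wi in zip(x, weights)) + bias


# Made-up inputs for illustration only; none of these values come from the paper.
embedding = [0.5, -1.0, 2.0]  # stand-in for a pooled ChemBERTa embedding
desc = Descriptors(20.0, 60.0, -10.0, 5.0, 70.0)
means = [10.0, 50.0, 0.0, 7.0, 50.0]   # hypothetical training-set means
stds = [10.0, 10.0, 10.0, 1.0, 20.0]   # hypothetical training-set std devs

fused = fuse(embedding, standardize(desc.as_list(), means, stds))
prediction = linear_head(fused, [0.1] * len(fused), bias=0.0)
print(len(fused), prediction)
```

In a trained model the weights would be learned jointly with the fine-tuned encoder. Concatenation before the head is only one plausible fusion choice and is assumed here for simplicity.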