Abstract
Drug repositioning offers an efficient route to discovering new therapeutic indications for existing drugs. However, current computational drug repositioning models often suffer from data scarcity and heterogeneity, which limit their generalizability. To address these limitations, this study introduces DMAPLM, a multimodal pretrained framework that predicts drug-disease associations to support subsequent drug repositioning screening. DMAPLM uses a lightweight dual-encoder architecture: ChemBERTa-2 encodes drug SMILES strings, and BioBERT encodes multi-field disease texts. The framework explicitly aligns drug and disease representations through contrastive learning and employs attention-weighted pooling to emphasize informative molecular substructures. Finally, a Random Forest classifier predicts associations from the enhanced multimodal features. We compile a new, comprehensive benchmark dataset for performance evaluation. Extensive experiments demonstrate that DMAPLM significantly outperforms six state-of-the-art baseline models, achieving an AUROC of 0.8919 and an AUPR of 0.9116 under five-fold cross-validation, an improvement of up to 9% over these baselines. Furthermore, DMAPLM performs robustly in challenging cold-start scenarios, highlighting its practical utility for identifying novel drug-disease relationships. Case studies and molecular docking analysis confirm the interpretability and biological relevance of the predictions. Our study provides a powerful and interpretable approach for computational drug repositioning.
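
To make the two representation-learning components named above concrete, the following is a minimal, dependency-free sketch of attention-weighted pooling and a contrastive alignment objective. It is an illustration under stated assumptions, not the DMAPLM implementation: the abstract does not specify the attention scoring function or the loss, so the dot-product scoring against a query vector, the symmetric InfoNCE form, the in-batch negatives, the temperature value, and all function names (`attention_pool`, `info_nce`) are hypothetical choices. Plain lists stand in for the token embeddings that ChemBERTa-2 and BioBERT would produce.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(token_vecs, query):
    """Pool token vectors into a single vector.

    Each token is weighted by softmax(dot(token, query)).
    `query` stands in for a learned attention vector (an assumption;
    the abstract does not describe the scoring function).
    """
    scores = [sum(t * q for t, q in zip(tok, query)) for tok in token_vecs]
    weights = softmax(scores)
    dim = len(token_vecs[0])
    pooled = [sum(w * tok[d] for w, tok in zip(weights, token_vecs))
              for d in range(dim)]
    return pooled, weights


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(drug_vecs, disease_vecs, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    The i-th drug and i-th disease form a positive pair; every other
    pairing in the batch is treated as a negative (an assumption).
    """
    drugs = [l2_normalize(v) for v in drug_vecs]
    diseases = [l2_normalize(v) for v in disease_vecs]
    n = len(drugs)
    # Cosine similarity of every drug with every disease, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(drugs[i], diseases[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Drug-to-disease direction: softmax over each row.
    loss_rows = -sum(math.log(softmax(logits[i])[i]) for i in range(n)) / n
    # Disease-to-drug direction: softmax over each column.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_cols = -sum(math.log(softmax(cols[j])[j]) for j in range(n)) / n
    return 0.5 * (loss_rows + loss_cols)
```

In this sketch, tokens whose embeddings score highly against the query dominate the pooled drug vector, and the loss decreases as each drug embedding becomes more similar to its paired disease embedding than to the other diseases in the batch. A practical implementation would use a tensor library and learn the query vector jointly with the encoders.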