Abstract
The rapidly expanding biomedical literature contains a wealth of information on the pharmacological effects, mechanisms of action, adverse reactions, and repurposing potential of small-molecule therapeutics. Nevertheless, the systematic extraction and integration of this knowledge remain challenging. In this study, we propose an integrated text-mining framework for the automated extraction and structured representation of information on the biological activities of low-molecular-weight compounds, using angiotensin-converting enzyme (ACE) inhibitors as a representative pharmacological class. We assembled a corpus of over 20,000 PubMed titles and abstracts reporting in vitro, in vivo, and clinical investigations of ACE inhibitors. Chemical compounds, proteins/genes, and diseases were recognized using a previously developed named entity recognition model based on conditional random fields. Associations between entities were extracted at the sentence level with a rule-based approach employing manually curated pattern phrases, and were then normalized via automated queries to PubChem, UniProt, and the Human Disease Ontology. This methodology yielded approximately 22,000 unique, normalized associations covering drug-target, drug-disease, and drug-drug relationships. In addition to confirming well-established therapeutic effects and clinically recognized drug combinations, the analysis identified underexplored pharmacological activities of ACE inhibitors, including antineoplastic, antifibrotic, and neuropsychiatric properties, along with mechanistic associations involving matrix metalloproteinases and neurotrophic signaling pathways. Collectively, these findings underscore the potential of automated literature mining to advance systematic knowledge integration and data-driven hypothesis generation in drug repurposing and safety evaluation.
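To make the sentence-level, rule-based extraction step concrete, the following minimal Python sketch illustrates the general idea under stated assumptions. It is not the implementation used in this study: the `Entity` type, the `tag` helper, the two entries in `PATTERNS`, and the rule that a pattern phrase must occur between a chemical and a later entity are all hypothetical, and entity recognition is assumed to have been performed beforehand.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    text: str
    kind: str   # "chemical", "protein", or "disease"
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention


# Hypothetical pattern phrases for illustration only; the manually
# curated list used in the study is not reproduced here.
PATTERNS = {
    "inhibits": re.compile(r"\b(?:inhibit|suppress)\w*", re.IGNORECASE),
    "treats": re.compile(r"\b(?:treat|attenuat|ameliorat)\w*", re.IGNORECASE),
}


def tag(sentence: str, text: str, kind: str) -> Entity:
    """Hypothetical stand-in for the NER step: locate a known mention."""
    start = sentence.index(text)
    return Entity(text, kind, start, start + len(text))


def extract_associations(sentence: str, entities: list) -> list:
    """Return (chemical, relation, entity) triples found in one sentence.

    Assumed rule: a chemical is linked to any entity that follows it
    when a pattern phrase occurs in the text between the two mentions.
    """
    found = []
    for subj in entities:
        if subj.kind != "chemical":
            continue
        for obj in entities:
            if obj.start < subj.end:
                continue  # skip the subject itself and earlier mentions
            between = sentence[subj.end:obj.start]
            for relation, pattern in PATTERNS.items():
                if pattern.search(between):
                    found.append((subj.text, relation, obj.text))
    return found


sentence = "Captopril inhibited MMP-9 activity in rats."
entities = [
    tag(sentence, "Captopril", "chemical"),
    tag(sentence, "MMP-9", "protein"),
]
print(extract_associations(sentence, entities))
# [('Captopril', 'inhibits', 'MMP-9')]
```

In a full pipeline, each extracted mention would subsequently be mapped to a database identifier (for example in PubChem, UniProt, or the Human Disease Ontology) so that synonymous mentions collapse into a single normalized association; that step is omitted from this sketch.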