Abstract
BACKGROUND: Drug-induced liver injury (DILI) is a significant clinical problem, and current detection is often delayed. Real-time analysis of electronic medical records (EMRs) using a large language model (LLM) could enable earlier surveillance. OBJECTIVE: To evaluate the technical feasibility of an LLM-powered system for real-time DILI risk assessment that extracts medication information from unstructured clinical notes. METHODS: We developed an LLM-based system to extract medication lists from clinical text and refined the prompts iteratively to improve performance. We integrated DILI risk data from DILIrank and LiverTox, using LLM-based and algorithmic matching to link extracted medications to database entries. To verify extraction accuracy, we used the RxNorm database together with manually mistyped medication names, and the NHANES database as a source of structured medication lists. RESULTS: Using 30 entries each from NHANES, RxNorm, and real-world cases, the LLM-based medication extraction achieved a precision of 0.96, a recall of 0.97, and an F1-score of 0.97. No errors were found for the NHANES data. On the real-world cases and the mistyped dataset, the extraction achieved F1-scores of 0.94 and 0.97, respectively. Most errors involved trade names and combination medication names. CONCLUSION: This study demonstrates the potential of LLMs for accurate medication extraction from clinical notes, a crucial step towards real-time DILI risk assessment. However, the system requires further development and clinical validation before implementation. Future work will focus on matching methods, clinical validation, EMR integration, and development of an agentic AI to triage future DILI risk.
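As an illustration of how the reported precision, recall, and F1-score can be computed for medication extraction, the following is a minimal sketch and not the evaluation code used in the study. It assumes micro-averaging over notes and case-insensitive exact matching of names; the abstract does not specify either choice. The function names `normalize` and `extraction_scores` and the sample medication lists are hypothetical.

```python
from typing import Iterable, Tuple


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def extraction_scores(
    predicted: Iterable[Iterable[str]],
    gold: Iterable[Iterable[str]],
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over paired medication lists.

    Assumption: a predicted name counts as correct only if it matches a
    reference name exactly after normalization.
    """
    tp = fp = fn = 0
    for pred_list, gold_list in zip(predicted, gold):
        pred_set = {normalize(m) for m in pred_list}
        gold_set = {normalize(m) for m in gold_list}
        tp += len(pred_set & gold_set)   # extracted and in the reference
        fp += len(pred_set - gold_set)   # extracted but not in the reference
        fn += len(gold_set - pred_set)   # in the reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Hypothetical example: a trade name ("Augmentin") is returned instead of the
# reference generic name, and one medication is missed.
predicted = [["Metformin", "atorvastatin"], ["Augmentin"]]
gold = [["metformin", "atorvastatin", "lisinopril"], ["amoxicillin-clavulanate"]]
p, r, f1 = extraction_scores(predicted, gold)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.667 0.5 0.571
```

Under exact matching, an unmapped trade name counts as both a false positive and a false negative, which is one way the trade-name and combination-product errors noted in the results could lower the scores.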