Abstract
IMPORTANCE: Optimizing order sets is vital to enhance clinical decision support and improve patient care. Manual review is resource intensive and cannot identify potential improvements in order sets in a timely manner.
OBJECTIVE: To develop a large language model (LLM)-powered multiagent system for optimizing order sets and to evaluate its utility.
DESIGN, SETTING, AND PARTICIPANTS: A multiagent system comprising agents for content critique, dynamic search, knowledge retrieval, medication verification, and suggestion summarization was developed and evaluated between January 1, 2024, and December 31, 2024. The study was performed at Vanderbilt University Medical Center (VUMC), where 3 physicians evaluated a total of 735 generated suggestions for 71 order sets. In experiment 1, the suggestions were assessed by 3 physicians for 9 order sets (evaluation 1) and by 1 physician for 62 order sets (evaluation 2). Experiment 2 implemented an LLM-as-a-judge approach to align suggestion usefulness scores with expert ratings and developed a filter based on the aligned scores to further refine the system's performance.
MAIN OUTCOMES AND MEASURES: Ratings of accuracy, usefulness, feasibility, and impact; interrater agreement; and alignment with historical ordering data.
RESULTS: In evaluation 1 of experiment 1, the median number of suggestions per order set scoring 4 or higher was 5 (IQR, 5-6) for accuracy, 2 (IQR, 1-4) for usefulness, 1 (IQR, 0-3) for feasibility, and 1 (IQR, 0-2) for impact. Of 96 suggestions, 44 (46%; 95% CI, 36%-56%) aligned with historical ordering patterns. In evaluation 2 of experiment 1, 639 suggestions were generated for 62 order sets; 52 order sets had at least 1 useful suggestion, with a median of 2 (IQR, 1-3) useful suggestions. Overall, 122 suggestions (19%; 95% CI, 16%-22%) were rated as useful. In experiment 2, after expert alignment, Cohen κ improved from 0.06 to 0.41. Filtering using the aligned scores reduced total suggestions by 29% while retaining 92% of useful suggestions.
CONCLUSIONS AND RELEVANCE: In this cohort study, an LLM-powered multiagent system provided a scalable approach to optimizing order sets. Alignment with a small set of expert ratings substantially improved the LLM-based evaluation. Future research could refine the system's reasoning capabilities and integrate useful suggestions into electronic health records, while engaging end users as artificial intelligence-supported reviewers.
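The proportions and 95% CIs reported in the results can be checked with a short calculation. The sketch below is illustrative only: the abstract does not state which interval method was used, so a normal-approximation (Wald) interval is assumed, and the helper name `wald_ci` is hypothetical.

```python
import math


def wald_ci(successes, total, z=1.96):
    """Return (proportion, lower, upper) using a normal-approximation interval.

    Assumption: the abstract does not name its CI method; the Wald interval
    is used here only as a plausible check. Bounds are clipped to [0, 1].
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Counts taken directly from the results reported in the abstract.
reported = [
    ("aligned with historical ordering patterns", 44, 96),
    ("rated as useful", 122, 639),
]

for label, k, n in reported:
    p, lo, hi = wald_ci(k, n)
    print(f"{label}: {k}/{n} = {p:.0%} (95% CI, {lo:.0%}-{hi:.0%})")
```

Under this assumption, the rounded output matches the reported values: 46% (36%-56%) for alignment with historical ordering patterns and 19% (16%-22%) for suggestions rated as useful.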