Abstract
BACKGROUND: In the current clinical workflow of radiation oncology, therapists manually summarize the physician-issued Computed Tomography (CT) simulation orders to prepare patients' CT simulations. This process increases the workload, introduces variability in documentation quality, and is prone to human errors. PURPOSE: This study aims to address these challenges using a large language model (LLM) to automate the generation of summaries from the CT simulation orders and evaluate its performance. METHODS: A total of 607 CT simulation orders were collected from the Aria database at our institution. A locally hosted LLaMa 3.1 405B model, accessed via the Application Programming Interface (API) service, was used to extract keywords from the CT simulation orders and generate summaries. The downloaded CT simulation orders were categorized into seven groups based on treatment modalities and disease sites. For each group, a customized instruction prompt was developed collaboratively with therapists to guide the LLaMa 3.1 405B model in generating summaries. The ground truth for the corresponding summaries was manually derived by carefully reviewing each CT simulation order and subsequently verified by therapists. The accuracy of the LLM-generated summaries was evaluated by therapists using the ground truth. RESULTS: Over 98% of the LLM-generated summaries aligned with the ground truth in terms of accuracy. Our evaluations showed an improved consistency in format and enhanced readability from the LLM-generated summaries compared to the corresponding therapist-generated summaries. This automated approach demonstrated consistent performance across all groups, regardless of treatment modality or disease site. CONCLUSIONS: This study demonstrated the high precision and consistency of the LLaMa 3.1 405B model in extracting keywords and summarizing CT simulation orders, suggesting that LLMs have great potential to assist in this task, reduce the workload of CT simulation therapists and improve radiation oncology workflow efficiency.