Abstract
OBJECTIVE: This feasibility study aimed to assess the potential of freely available large language models (LLMs) to support clinical decision-making in obstetrics. METHODS: Five fictional obstetric patient cases covering a range of clinical presentations (preeclampsia, fetal growth restriction, preterm premature rupture of membranes, vaginal bleeding, and abdominal trauma) were presented to three LLMs: ChatGPT (OpenAI), Gemini (Google), and DeepSeek. The LLMs were asked to evaluate the patient information, suggest potential diagnoses, and outline appropriate management strategies. The responses were first assessed qualitatively; four expert obstetricians then rated the LLMs' recommendations using the Global Quality Score (GQS). RESULTS: The LLMs were able to process complex obstetric scenarios and generate diagnostic and management considerations that often aligned with established clinical principles. In the preeclampsia and preterm premature rupture of membranes cases, the LLMs accurately identified key issues and proposed relevant management steps. For fetal growth restriction, vaginal bleeding, and abdominal trauma, they outlined appropriate evaluation frameworks and differential diagnoses. The responses varied in their level of detail and directness. DeepSeek received the highest overall GQS across the five cases, whereas Gemini was outperformed by the other two LLMs in the vaginal bleeding and abdominal trauma cases. CONCLUSION: This preliminary feasibility assessment suggests that freely available LLMs can generate plausible-sounding responses to obstetric vignettes. Further rigorous evaluation using quantitative methods and real-world data, together with exploration of integration strategies, is warranted to fully understand their role in enhancing clinical decision-making and improving patient care in obstetric practice.