Abstract
OBJECTIVE: This study aimed to design and develop a virtual patient program using generative artificial intelligence (AI) technology, giving medical students opportunities to practice history-taking with a chatbot. We evaluated the feasibility of this approach by analyzing the quality of the chatbot's responses. RESULTS: Five expert reviewers participated in a pilot test, interacting with the chatbot to take the history of a virtual patient presenting with a urinary problem on the Korean AI platform Naver HyperCLOVA X®. They evaluated the AI responses using a five-item questionnaire rated on a five-point Likert scale. The chatbot generated 96 question-and-answer pairs, totaling 1,325 words in 177 sentences. Discourse analysis of the transcripts revealed that 34 (2.6%) of the words generated by the chatbot were implausible; these were categorized as inarticulate answers, hallucinations, or omissions of important information. Participants rated the AI answers as relevant (M = 4.50 ± 0.32), valid (M = 4.20 ± 0.40), accurate (M = 4.10 ± 0.20), and succinct (M = 3.80 ± 0.51), but were neutral about their fluency (M = 3.20 ± 0.60). CONCLUSIONS: Using generative AI for history-taking with virtual patients is feasible, but improvements are needed to produce more articulate and natural responses.