Abstract
BACKGROUND: Artificial intelligence (AI) holds promise for enhancing medical education, particularly in complex fields such as cardiology. We assessed the ability of a large language model (LLM) to generate educational material comparable to that created by human experts, and to answer the resulting questions itself.

METHODS: We trained an AI model on cardiology-specific content using 80 lectures from the St. Michael's Hospital Virtual Echo Rounds. The AI generated 10 multiple-choice questions (MCQs), and experienced cardiologists crafted an additional 10 MCQs. Eleven postgraduate year 4-6 cardiology trainees answered all 20 questions and attempted to identify the source (AI or human) of each question. The AI also answered the same set of questions. We compared trainee scores on the 2 question sets using the Wilcoxon signed-rank test and tested their source-identification accuracy against chance.

RESULTS: Trainees scored similarly on AI-generated and human-generated questions (median 8/10 vs 8/10; P > 0.05). Their ability to identify the source of questions did not exceed chance (median correct identifications: 10/20; P > 0.05). The AI achieved 95% accuracy on AI-generated questions and 100% on human-generated questions.

CONCLUSIONS: AI-generated educational content was comparable in quality to that produced by human experts, and trainees could not reliably distinguish between the 2 sources. Our findings suggest that AI could meaningfully augment cardiology education by providing high-quality, scalable learning resources.
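As an aside on the statistical approach described in the Methods, the sketch below shows how such a paired comparison might be run in Python with SciPy. It is illustrative only: the per-trainee scores are hypothetical placeholders (the abstract reports only the medians), and the binomial test is one plausible way to compare source identification against chance, not necessarily the authors' exact procedure.

```python
# Minimal sketch under stated assumptions; the scores are hypothetical,
# not the study's actual data.
from scipy.stats import wilcoxon, binomtest

# Hypothetical per-trainee scores out of 10 for the 11 trainees
# (the abstract reports only the medians, 8/10 vs 8/10).
ai_scores    = [8, 7, 9, 8, 8, 6, 9, 8, 7, 8, 9]
human_scores = [8, 8, 9, 7, 8, 7, 8, 8, 8, 7, 9]

# Paired comparison of scores on AI- vs human-generated questions.
stat, p = wilcoxon(ai_scores, human_scores)
print(f"Wilcoxon signed-rank: statistic={stat}, P={p:.3f}")

# One plausible test of source identification against chance:
# 10 of 20 correct under a null success probability of 0.5.
print(f"Binomial test vs chance: P={binomtest(10, 20, 0.5).pvalue:.3f}")
```

With the reported median identification rate of 10/20, any such test against a null of 0.5 would, as the abstract states, show no better-than-chance discrimination.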