Abstract
INTRODUCTION: As Artificial Intelligence (AI) chatbots become increasingly accessible, the accuracy and clarity of the medical information they provide require rigorous assessment. Urologic telesurgery is a complex concept that patients are likely to investigate using AI. We compared ChatGPT and Google Gemini in providing patient-facing information on urologic telesurgical procedures.

METHODS: Nineteen questions related to urologic telesurgery were generated using general information from the American Urological Association (AUA) and the European Robotic Urology Section (ERUS). Questions were organized into four categories (Prospective, Technical, Recovery, Other) and entered directly into ChatGPT 4o and Google Gemini 2.5 (free versions). A new chat was started for each question to prevent carry-over between responses. Three reviewers independently assessed the responses using two validated healthcare tools: DISCERN (quality) and the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P; understandability and actionability).

RESULTS: Mean total DISCERN scores (out of 80) were higher for Gemini than for ChatGPT in all domains except "Other" (Gemini versus ChatGPT: prospective, 49.2 versus 39.1; technical, 52.3 versus 44.3; recovery, 53.7 versus 45.4; other, 54.3 versus 56.5; overall, 52.4 versus 45.8; Fig. 1). PEMAT-P understandability exceeded 70% for both platforms across all categories (prospective, 80.0% versus 71.7%; technical, 80.1% versus 79.8%; recovery, 79.2% versus 80.1%; other, 79.2% versus 81.3%; overall, 79.7% versus 78.1%; Fig. 2). Actionability was uniformly low; only Gemini met the 70% threshold, and only in the prospective domain (Fig. 3).

Fig. 1 (A) Mean DISCERN scores (out of 5), with standard deviations, for each response. (B) Mean total DISCERN scores (out of 80) for each question category and overall. Graphed values are rounded to the nearest whole number for readability (e.g., 56.5 shown as 57); the plotted values are exact.

Fig. 2 (A) Mean PEMAT-P understandability scores, with standard deviations, for each response. (B) Mean PEMAT-P understandability scores for each question category and overall (70% is the minimum threshold for a response to be deemed "understandable"). Graphed values are rounded to the nearest whole number for readability (e.g., 71.7% shown as 72%); the plotted values are exact.

Fig. 3 (A) Mean PEMAT-P actionability scores, with standard errors, for each response. (B) Mean PEMAT-P actionability scores for each question category and overall (70% is the minimum threshold for a response to be deemed "actionable"). Graphed values are rounded to the nearest whole number for readability (e.g., 65.4% shown as 65%); the plotted values are exact.

CONCLUSION: ChatGPT and Gemini deliver relevant and understandable information on urologic telesurgery, with Gemini more consistently providing sources. However, neither chatbot reliably offers actionable responses, limiting their utility as a standalone resource for patient decision-making.