Abstract
AIMS: To assess the utility of the artificial intelligence (AI) chatbot ChatGPT (openly available version 3.5) in responding to real-world pharmacotherapeutic queries from healthcare professionals.

METHODS: Three independent, blinded evaluators with different levels of medical expertise and professional experience (beginner, advanced, and expert) compared AI chatbot- and physician-generated responses to 70 real-world pharmacotherapeutic queries submitted to the clinical-pharmacological drug information centre of Hannover Medical School between June and October 2023. Responses were rated with regard to quality of information, answer preference, answer correctness, and quality of language. Inter-rater reliability was assessed with Krippendorff's alpha. Two separate investigators, not otherwise involved in the conduct or analysis of the study, selected the top three clinically relevant errors in chatbot- and physician-generated responses.

RESULTS: All three evaluators rated the quality of information of physician-generated responses higher than that of AI chatbot-generated responses and, accordingly, preferred the physician-generated responses (answer preference). All evaluators detected factually incorrect information more frequently in chatbot-generated responses than in physician-generated responses. The beginner and expert evaluators rated the quality of language of physician-generated responses higher than that of chatbot-generated responses, whereas the advanced evaluator found no significant difference.

CONCLUSIONS: ChatGPT's responses to real-world pharmacotherapeutic queries were substantially inferior to conventional physician-generated responses with regard to quality of information and factual correctness. Our study suggests that, to date, the use of ChatGPT in pharmacotherapy counselling must be strongly cautioned against.