Abstract
OBJECTIVE: To compare the performance of ChatGPT-5.0, DeepSeek-R1, and Gemini-2.5 Pro in real-world outpatient prescription counseling and to evaluate their applicability across clinical contexts.

METHODS: Fifty authentic prescriptions from four departments were submitted to the three models using standardized Chinese prompts. Responses were independently rated by three associate chief pharmacists across five dimensions (accuracy, relevance, clarity, practicality, and completeness) on a 5-point Likert scale. Rank-based non-parametric tests were applied for overall and subgroup analyses.

RESULTS: Significant inter-model differences were observed in most dimensions (P < 0.05). DeepSeek excelled in clarity and practicality, and ChatGPT achieved the highest accuracy and completeness, whereas Gemini consistently scored lower. Department-specific analyses revealed distinct contextual advantages. All models exhibited high response stability.

CONCLUSIONS: LLMs demonstrate promising yet heterogeneous performance in outpatient medication counseling. DeepSeek and ChatGPT showed superior overall quality, supporting their potential as assistive "AI pharmacists" under professional supervision. However, several limitations should be acknowledged, including a modest sample size, reliance on expert evaluation rather than patient feedback, and context-specific findings that may limit generalizability.
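The abstract names rank-based non-parametric tests but not the specific procedure. A minimal sketch, assuming a Friedman test on related samples (the same prescriptions rated under each model) with hypothetical Likert scores, illustrates the kind of analysis described; the data, variable names, and test choice are illustrative only:

```python
# Hypothetical Likert ratings (1-5) on one dimension for six
# prescriptions, each counseled by all three models. The Friedman
# test is a rank-based non-parametric test for repeated measures.
from scipy.stats import friedmanchisquare

chatgpt = [5, 4, 5, 4, 5, 4]  # illustrative scores, not study data
deepseek = [4, 5, 4, 5, 4, 5]
gemini = [3, 3, 4, 3, 3, 4]

# Each argument is one model's ratings over the same prescriptions.
stat, p = friedmanchisquare(chatgpt, deepseek, gemini)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")
```

In a design like this, a significant Friedman result would typically be followed by pairwise post-hoc comparisons (e.g., Wilcoxon signed-rank tests with a multiplicity correction) to locate which models differ.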