Abstract
OBJECTIVE: Large language models (LLMs) are now abundant and diverse, yet clinicians lack clarity on which perform best, and it remains uncertain how much expertise general-purpose LLMs have in musculoskeletal rehabilitation. This study aims to investigate the potential and correctness of LLMs in clinical application, and to evaluate whether LLMs could help primary rehabilitation therapists prepare for rehabilitation examinations. METHOD: Eight primary doctors and therapists tested 10 LLMs in the first test, five senior doctors and therapists assessed the answers in the second test, and five primary therapists acted as examinees in the third test. We assessed the quality of case analysis on six dimensions: Case Understanding, Clinical Reasoning, Primary Diagnosis, Differential Diagnosis, Treatment Plan Accuracy and Safety, and Guidelines & Consensus. RESULTS: In the first test, only ERNIE Bot X1 Turbo and Doubao 1.5 pro achieved accuracy rates above 90%, and Chinese LLMs answered a significantly smaller proportion of questions incorrectly than English LLMs (9.6% vs. 14.8%, P < 0.001). In the second test, Doubao 1.5 pro achieved relatively high scores in both cases, and LLMs scored highly in "Case Understanding", "Clinical Reasoning" and "Diagnosis". In the third test, primary therapists achieved a mean accuracy rate of 76.9%, and Doubao 1.5 pro improved its accuracy rate to 85.8%. CONCLUSIONS: Doubao 1.5 pro showed competent ability and promising application prospects, and was assessed as the best LLM for answering musculoskeletal rehabilitation questions. We also demonstrated that local-language LLMs gave significantly better responses than English LLMs when answering questions posed in the local language.