Abstract
PURPOSE: To assess the ability of ChatGPT 3.5 to aid in the treatment planning process for first-time anteroinferior shoulder dislocation.
METHODS: Forty fictional patient cases were created that varied in 15 characteristics, the distribution of which was randomized. Six orthopaedic surgeons (3 residents and 3 specialists in shoulder surgery) were then asked to determine the best treatment option for each case. Their answers were compared with the treatment recommendations that ChatGPT proposed in 2 separate sessions on the basis of preselected literature. To account for the wide dispersion of responses, tendencies toward nonoperative, open surgical, or arthroscopic treatment were subsequently defined. The results were then analyzed descriptively.
RESULTS: The mean age of the fictional patients was 44 years (range, 13-80 years), and 57.5% of the patients were female. The agreement between the ChatGPT responses in the 2 sessions was 70.0%. In contrast, the 3 residents agreed with each other in 35.0% of all cases, and the 3 specialists agreed with each other in 32.5% of all cases. An exact match between the ChatGPT responses and all human assessments occurred in 12.5% of cases. In 65.0% of all cases, the physicians showed similar tendencies in their choice of therapy, resulting in a 55.0% match between ChatGPT and the surgeons.
CONCLUSIONS: There was no clear consensus regarding the treatment of first-time anteroinferior shoulder dislocation, either among physicians or between physicians and ChatGPT 3.5. However, ChatGPT 3.5 and the physicians showed similar treatment tendencies in over half of the cases. Because of its inconsistent responses, ChatGPT 3.5 cannot yet be considered a reliable tool for therapy planning.
CLINICAL RELEVANCE: ChatGPT 3.5, which is widely available and free of charge, is increasingly used in clinical settings. However, it is crucial to highlight its limitations in treatment planning, especially for pathologies on which there is no clear consensus even among experienced surgeons.