Comparison of large language models in management advice for melanoma: Google's AI BARD, BingAI and ChatGPT

Authors: Mu, Xin; Lim, Bryan; Seth, Ishith; Xie, Yi; Cevik, Jevan; Sofiadellis, Foti; Hunter-Smith, David J; Rozen, Warren M

Journal:	Skin Health and Disease	Impact factor:	0.000
Year:	2024	Citation:	2024 Feb;4(1):e313
doi:	10.1002/ski2.313	Disease type:	Melanoma

Abstract

Large language models (LLMs) are an emerging artificial intelligence (AI) technology that is reshaping research and healthcare, with numerous recent applications in medicine. One area where LLMs have shown particular promise is the provision of medical information and guidance to practitioners. This study assesses three prominent LLMs (Google's AI BARD, BingAI and ChatGPT-4) in providing management advice for melanoma by comparing their responses with current clinical guidelines and existing literature. Five questions on melanoma pathology were posed to the three LLMs. A panel of three experienced Board-certified plastic surgeons evaluated the responses for readability (Flesch Reading Ease Score, Flesch-Kincaid Grade Level and Coleman-Liau Index) and suitability (modified DISCERN score), and compared them with existing guidelines. A t-test was performed to assess differences in mean readability and reliability scores between LLMs, and a p value <0.05 was considered statistically significant. The mean readability scores of the three LLMs were similar. ChatGPT exhibited nominal superiority, with a Flesch Reading Ease Score of 35.42 (±21.02), a Flesch-Kincaid Grade Level of 11.98 (±4.49) and a Coleman-Liau Index of 12.00 (±5.10); however, none of these differences was statistically significant (p > 0.05). In terms of suitability, as measured by the DISCERN score, ChatGPT at 58 (±6.44) significantly (p = 0.04) outperformed BARD at 36.2 (±34.06), and did not differ significantly from BingAI at 49.8 (±22.28). This study demonstrates that ChatGPT marginally outperforms BARD and BingAI in providing reliable, evidence-based clinical advice, but all three still face limitations in depth and specificity. Future research should improve LLM performance by integrating specialized databases and expert knowledge to support patient-centred care.
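For readers unfamiliar with the three readability indices named in the abstract, the sketch below shows how each is conventionally computed from simple text counts. It uses the standard published formulas; the function names and the example counts are illustrative assumptions, and the abstract does not say which tool or counting rules the authors actually used.

```python
# Standard readability formulas computed from pre-counted text statistics.
# Function names and the sample counts below are illustrative assumptions;
# they are not taken from the study.

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Higher scores indicate easier text (roughly 0-100)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Approximate US school grade needed to understand the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def coleman_liau_index(letters: int, words: int, sentences: int) -> float:
    """Grade-level estimate based on letters and sentences per 100 words."""
    letters_per_100 = letters / words * 100
    sentences_per_100 = sentences / words * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


if __name__ == "__main__":
    # Hypothetical passage: 100 words, 5 sentences, 150 syllables, 450 letters.
    print(round(flesch_reading_ease(100, 5, 150), 2))
    print(round(flesch_kincaid_grade(100, 5, 150), 2))
    print(round(coleman_liau_index(450, 100, 5), 2))
```

Note that the scales run in opposite directions: a higher Flesch Reading Ease Score means easier text, whereas higher Flesch-Kincaid and Coleman-Liau values mean a higher reading grade level.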
