Clinics in Gastroenterology, vol 9 No. 1: Virus Hepatitis

Authors: Ekingen, Evren; Ucdal, Mete; Farthing, M; Thompson, R P H; Coleman, J C

Journal:	Journal of Clinical Medicine	Impact factor:	2.900
Date:	1980	Volume/issue/pages:	1980 Sep;73(9):686
doi:	10.3390/jcm15051730	Research area:	Microbiology, toxicology research
Disease type:	Hepatitis

Abstract

Background/Objectives: Large language models (LLMs) have shown promising results in medical decision support; however, their effectiveness in managing acute cholecystitis and other gallbladder diseases remains insufficiently examined. This study evaluated the performance of a neuro-symbolic LLM system that integrates multiple AI agents with neural-symbolic reasoning for acute cholecystitis management and compared its diagnostic accuracy with that of human expert physicians across three clinical specialties.

Methods: This multi-center cross-sectional study included 30 case-based questions covering acute cholecystitis and gallbladder diseases, stratified across eight predefined disease categories: acute calculous cholecystitis (n = 6), acute acalculous cholecystitis (n = 2), complicated cholecystitis including gangrenous, emphysematous, and perforated variants (n = 5), chronic cholecystitis and biliary colic (n = 4), gallbladder polyps and adenomyomatosis (n = 3), Mirizzi syndrome (n = 2), gallbladder carcinoma (n = 4), and post-cholecystectomy complications (n = 4). Questions were categorized into diagnosis (n = 10), treatment (n = 10), and complications/prognosis (n = 10).

Gold standard answers were established through consensus by an expert panel of two senior general surgery clinicians and one senior emergency medicine clinician, each with more than 20 years of clinical experience, using the Tokyo Guidelines 2018 (TG18) as the reference standard for diagnostic criteria, severity grading, and management recommendations. The expert panel achieved unanimous consensus on all 30 gold standard answers. All responses were cross-referenced against the primary TG18 publications to ensure guideline-based rather than solely opinion-based reference standards. This consensus-based, guideline-anchored approach is consistent with established methodologies for gold standard establishment in AI diagnostic accuracy studies.

Performance of a neuro-symbolic LLM system orchestrated via LangGraph v1.0 was compared against 10 general surgery specialists, 10 emergency medicine physicians, and 10 gastroenterology specialists from four tertiary centers in Turkey. The neuro-symbolic system incorporated TG18 as its symbolic knowledge base for diagnostic criteria, severity grading, and management algorithms.

Results: The neuro-symbolic system attained the highest overall accuracy of 96.7% (29/30), markedly surpassing general surgery specialists (mean 82.3% ± 6.8%), emergency medicine physicians (mean 71.0% ± 8.2%), and gastroenterology specialists (mean 78.7% ± 7.4%). The neuro-symbolic system also exhibited superior performance across all clinical categories. Among human participants, general surgeons showed the highest accuracy in treatment decisions (88.0%), while gastroenterologists excelled in diagnostic questions (82.0%). Emergency medicine physicians showed comparable performance to other specialties in acute presentation scenarios. ROC analysis revealed excellent discrimination for the neuro-symbolic system (AUC = 0.983) compared to general surgery (AUC = 0.856), gastroenterology (AUC = 0.821), and emergency medicine (AUC = 0.764).

Conclusions: The neuro-symbolic LLM system exhibited superior performance in standardized guideline-concordant case-based assessment of acute cholecystitis management compared to all human expert groups, reflecting its consistent application of encoded guideline criteria. These findings support its potential role as a clinical decision-support tool that augments, rather than replaces, physician expertise, particularly in settings where specialist expertise is limited. However, these results should be interpreted within the constraints of a structured case-based evaluation and do not imply global clinical superiority over human experts.
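
The counts and percentages reported in the abstract can be checked with a few lines of arithmetic. The following Python sketch is only an illustration built from the figures quoted above; the variable and function names (`category_counts`, `accuracy_percent`, and so on) are hypothetical and are not taken from the study's own code.

```python
# Question counts per disease category, as listed in the Methods.
category_counts = {
    "acute calculous cholecystitis": 6,
    "acute acalculous cholecystitis": 2,
    "complicated cholecystitis": 5,
    "chronic cholecystitis and biliary colic": 4,
    "gallbladder polyps and adenomyomatosis": 3,
    "Mirizzi syndrome": 2,
    "gallbladder carcinoma": 4,
    "post-cholecystectomy complications": 4,
}

# Question counts per question type, as listed in the Methods.
question_type_counts = {
    "diagnosis": 10,
    "treatment": 10,
    "complications/prognosis": 10,
}

TOTAL_QUESTIONS = 30


def accuracy_percent(correct: int, total: int) -> float:
    """Return accuracy as a percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)


# Overall accuracy (percent): the system's value is recomputed from 29/30,
# the human group values are the means quoted in the Results.
reported_accuracy = {
    "neuro-symbolic system": accuracy_percent(29, TOTAL_QUESTIONS),
    "general surgery": 82.3,
    "gastroenterology": 78.7,
    "emergency medicine": 71.0,
}

# AUC values quoted in the Results.
reported_auc = {
    "neuro-symbolic system": 0.983,
    "general surgery": 0.856,
    "gastroenterology": 0.821,
    "emergency medicine": 0.764,
}

# Rank the groups from best to worst under each metric.
ranking_by_accuracy = sorted(reported_accuracy, key=reported_accuracy.get, reverse=True)
ranking_by_auc = sorted(reported_auc, key=reported_auc.get, reverse=True)

if __name__ == "__main__":
    print("categories sum to", sum(category_counts.values()))
    print("question types sum to", sum(question_type_counts.values()))
    print("system accuracy (%):", reported_accuracy["neuro-symbolic system"])
    print("same ranking under both metrics:", ranking_by_accuracy == ranking_by_auc)
```

Both stratifications add up to the 30 questions, 29/30 rounds to the reported 96.7%, and the groups fall in the same order by accuracy and by AUC.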
