ChatGPT and low back pain - Evaluating AI-driven patient education in the context of interventional pain medicine


Abstract

BACKGROUND: ChatGPT and other large language models (LLMs) are not only being integrated into healthcare more readily but are also being used more frequently by patients to answer health-related questions. Given this increased utilization, it is essential to evaluate the consistency and reliability of artificial intelligence (AI) responses. Low back pain (LBP) remains one of the most common chief complaints in primary care and interventional pain management offices.

OBJECTIVE: This study assesses the readability, accuracy, and overall utility of ChatGPT's responses to patients' questions concerning low back pain. Our aim is to use clinician feedback to analyze ChatGPT's answers to these common low back pain-related questions, since AI will undoubtedly play a role in triaging patients before they see a physician.

METHODS: To assess AI responses, we generated a standardized list of 25 questions concerning low back pain, split into five categories: diagnosis, seeking a medical professional, treatment, self-treatment, and physical therapy. We explored the influence of prompt wording by phrasing each question at reading levels ranging from 4th grade to college/reference level. One board-certified interventional pain specialist, one interventional pain fellow, and one emergency medicine resident reviewed ChatGPT's generated answers for accuracy and clinical utility. Readability and comprehensibility were evaluated using the Flesch-Kincaid Grade Level scale. Statistical analysis was performed to assess differences in readability scores, word count, and response complexity.

RESULTS: How a question is phrased influences accuracy in statistically significant ways. Over-simplifying queries (e.g., to a 4th-grade level) degrades ChatGPT's ability to return clinically complete responses, whereas reference-level and neutral queries preserve accuracy without additional prompt engineering. Regardless of how a question is phrased, ChatGPT's default register trends toward technical language, and readability remains substantially misaligned with health literacy standards. Verbosity correlates with prompt type but not necessarily with accuracy: word count is an unreliable proxy for informational completeness or clinical correctness in AI outputs. Most errors stem from omission, not commission; importantly, ChatGPT does not frequently generate false claims.

CONCLUSION: This analysis complicates the assumption that "simpler is better" when prompting LLMs for clinical education. Whereas earlier work in structured conditions suggested that plain-language prompts improved accuracy, our findings indicate that a moderate reading level, not maximal simplicity, yields the most reliable outputs in complex domains such as pain. This study further supports the integration of LLMs into clinical workflows, possibly through electronic health record (EHR) software.
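The Flesch-Kincaid Grade Level used in the methods above is a closed-form function of sentence length and word length. The following is a minimal sketch of how such a score can be computed; the syllable-counting heuristic (vowel-group counting) is a simplification introduced here for illustration, not the syllabifier used in the study, and production readability tools typically rely on dictionary-based counts.

```python
import re

def count_syllables(word: str) -> int:
    """Approximate syllable count by counting vowel groups.

    This is a crude heuristic assumed for illustration; real readability
    tooling would use a pronunciation dictionary for accuracy.
    """
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    count = len(groups)
    # Drop a commonly silent trailing "e" (e.g. "scale"), but keep "-le"/"-ee".
    if word.endswith("e") and count > 1 and not word.endswith(("le", "ee")):
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

Because the formula weights syllables per word heavily (11.8) relative to sentence length (0.39), polysyllabic clinical vocabulary drives grade level up quickly, which is consistent with the finding that ChatGPT's technical register stays misaligned with health literacy targets.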
