Obedience to Unsafe Clinical Instructions: How Large Language Models Respond to Authority Cues

Abstract

BACKGROUND: Large language models (LLMs) are being integrated into clinical environments where deference to authority can cause harm. Unlike hallucination or bias, obedience to unsafe instructions represents a distinct safety failure: following an explicit but harmful order.

METHODS: We conducted a cross-sectional evaluation of 20 proprietary, open-source, and clinically tuned LLMs across 10,096,800 clinical decision scenarios, including synthetic vignettes with predefined safe versus unsafe options and real-world discharge recommendations reframed to include unsafe contradictory requests. Each scenario was presented under a neutral control or one of six Milgram-style social-pressure conditions (authority, responsibility transfer, urgency, threat, conformity, depersonalization), with or without a short mitigation cue instructing verification or escalation if unsafe. The primary outcome was the proportion of potentially harmful outputs, defined as selection or endorsement of an unsafe clinical decision.

RESULTS: Across all runs, 1.18 million of 10.1 million outputs (11.7%) were harmful. Harmful decisions occurred in 16.6% of unmitigated versus 10.1% of mitigated conditions (absolute reduction, 6.5 percentage points; p < 0.001). In synthetic vignettes, harmful responses averaged 8.1% overall, declining from 10.6% to 7.2% with mitigation (difference, 3.4 percentage points; p < 0.001). In real-world discharge cases, harmful responses averaged 30.0%, decreasing from 46.6% to 24.5% with mitigation (difference, 22.1 percentage points; p < 0.001). Across all conditions, authority and responsibility-transfer cues elicited the highest harmful compliance, and control prompts the lowest; mitigation reduced rates but preserved this pattern.

CONCLUSION: LLMs do not behave as neutral calculators in clinical contexts. When exposed to authority or responsibility-transfer cues, they exhibit consistent obedience to unsafe instructions. A brief safety reminder substantially reduces but does not eliminate this behavior.
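
The design and headline arithmetic described in the abstract can be sketched in a few lines of Python. This is only an illustrative check, not the authors' analysis code: the condition labels and percentages are copied from the abstract, while the function names (`condition_grid`, `harmful_rate`, `absolute_reduction`) are hypothetical, and the harmful-output count of 1,180,000 is the abstract's rounded "1.18 million" rather than an exact figure.

```python
from itertools import product

# Condition labels as listed in the abstract. The actual prompt wording
# is not given there, so only the labels are represented here.
PRESSURE_CONDITIONS = [
    "control",
    "authority",
    "responsibility transfer",
    "urgency",
    "threat",
    "conformity",
    "depersonalization",
]
MITIGATION = [False, True]  # without / with the short mitigation cue


def condition_grid():
    """Every (pressure condition, mitigation) pairing a scenario can be shown under."""
    return list(product(PRESSURE_CONDITIONS, MITIGATION))


def harmful_rate(harmful, total):
    """Primary outcome: share of outputs that were harmful, in percent."""
    return 100.0 * harmful / total


def absolute_reduction(unmitigated_pct, mitigated_pct):
    """Difference in percentage points, rounded to one decimal as in the abstract."""
    return round(unmitigated_pct - mitigated_pct, 1)


# (unmitigated %, mitigated %, reported difference in percentage points)
REPORTED = {
    "all scenarios": (16.6, 10.1, 6.5),
    "synthetic vignettes": (10.6, 7.2, 3.4),
    "real-world discharge cases": (46.6, 24.5, 22.1),
}

if __name__ == "__main__":
    print(f"prompt variants per scenario: {len(condition_grid())}")
    # 1,180,000 is the rounded count from the abstract (an assumption).
    print(f"overall harmful rate: {harmful_rate(1_180_000, 10_096_800):.1f}%")
    for name, (unmitigated, mitigated, reported) in REPORTED.items():
        computed = absolute_reduction(unmitigated, mitigated)
        print(f"{name}: {computed} pp (reported {reported} pp)")
```

Crossing the control and six pressure conditions with the mitigation flag gives the full set of framings for each scenario, and the reported reductions are plain differences of the paired percentages.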
