Abstract
BACKGROUND: Persona prompting is widely used to steer large language models (LLMs), but its effects on safety-critical clinical reasoning are not well characterized.
METHODS: We performed a two-by-two factorial in silico experiment crossing time-pressure framing (high versus low) with optimization target (safety-first versus lean-efficiency), giving four personas. We used 28 Japanese-language synthetic emergency department vignettes covering chest pain, abdominal pain, headache, and dyspnea, four of which were trap cases containing prespecified contraindication or sequencing rules. Each persona evaluated each vignette twice, yielding 224 independent runs (28 vignettes × 4 personas × 2 replicates). Outputs followed a fixed JavaScript Object Notation (JSON) schema and were scored for the number of proposed tests, diagnostic breadth (entropy of the probability distribution across the top five differential diagnoses), discharge decisions, safety-net specificity, and contraindication or sequencing violations, with severity grading.
RESULTS: High time-pressure framing reduced the number of proposed tests (beta = -1.05, p < 0.001) and diagnostic breadth (beta = -0.246, p < 0.001). Safety-first prompting increased the number of proposed tests (beta = 1.32, p < 0.001) and diagnostic breadth (beta = 0.247, p < 0.001), with no significant interaction between the two factors. Among runs that produced a discharge plan (36 of 224), safety-net specificity was higher under safety-first than under lean-efficiency prompting (mean 4.5 versus 2.6 on a five-point scale). Contraindication or sequencing violations occurred only in the high time-pressure, lean-efficiency (high/lean) condition (eight of 56 runs, 14.3%); within trap cases, violations occurred in eight of eight high/lean runs and in zero of 24 runs in the other three conditions.
CONCLUSIONS: Persona components predictably shifted simulated clinical reasoning. Time-pressure framing narrowed diagnostic search and reduced proposed testing, whereas safety-first prompting broadened testing and improved safety-netting. Trap-case violations were confined to the high/lean condition and were absent whenever either safety-first prompting or low time-pressure framing was used. Prompt-aware stress testing may help identify unsafe prompt configurations before clinical deployment.
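The run counts and the diagnostic-breadth metric described above can be illustrated with a minimal Python sketch. It is not the study's code. The factor labels, the function names `enumerate_runs` and `shannon_entropy`, the choice of base-2 logarithm, the renormalization step, and the convention that trap cases occupy the first four vignette indices are all assumptions made for illustration; the abstract specifies none of them.

```python
import itertools
import math

# Factor levels named in the abstract; the identifiers are illustrative.
TIME_PRESSURE = ("high", "low")
TARGET = ("safety_first", "lean_efficiency")
N_VIGNETTES = 28   # synthetic emergency department vignettes
N_TRAP = 4         # trap cases (assumed here to be vignette ids 0..3)
N_REPLICATES = 2   # each persona evaluates each vignette twice


def enumerate_runs():
    """Return one (time_pressure, target, vignette_id, replicate) tuple per run."""
    return list(
        itertools.product(
            TIME_PRESSURE, TARGET, range(N_VIGNETTES), range(N_REPLICATES)
        )
    )


def shannon_entropy(probs, base=2.0):
    """Shannon entropy of a discrete distribution.

    Probabilities are renormalized to sum to 1 (an assumption; the
    abstract does not say how top-five probabilities were normalized).
    Zero-probability entries contribute nothing to the sum.
    """
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    entropy = 0.0
    for p in probs:
        if p > 0:
            q = p / total
            entropy -= q * math.log(q, base)
    return entropy


runs = enumerate_runs()
high_lean = [r for r in runs if r[0] == "high" and r[1] == "lean_efficiency"]
trap_runs = [r for r in runs if r[2] < N_TRAP]
high_lean_trap = [r for r in high_lean if r[2] < N_TRAP]

print(len(runs))            # 224 runs in total
print(len(high_lean))       # 56 runs per condition
print(len(trap_runs))       # 32 trap-case runs across all conditions
print(len(high_lean_trap))  # 8 trap-case runs in the high/lean condition

# A broad differential has higher entropy than a concentrated one.
broad = shannon_entropy([0.2, 0.2, 0.2, 0.2, 0.2])
narrow = shannon_entropy([0.8, 0.1, 0.05, 0.03, 0.02])
print(broad > narrow)       # True
```

Under these assumptions, the counts reproduce the denominators reported in the results (224, 56, 8, and 32 - 8 = 24), and a uniform top-five differential attains the maximum entropy of log2(5) bits.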