Abstract
OBJECTIVES: Critical incident reporting systems (CIRS) collect narrative reports on medical errors, but emotional signals within these reports, potential indicators of perceived risk and systemic weakness, are rarely examined. This cross-sectional study applied large language model-based sentiment analysis to explore how emotional expression in CIRS data may support artificial intelligence-enhanced patient safety monitoring.
METHODS: We analysed 11 056 anonymised German incident reports submitted between 2005 and 2024 using GPT-4 (Generative Pre-trained Transformer 4, gpt-4-turbo-2024-04-09, zero-shot) to assign sentiment labels and quantify five emotions (fear, frustration, anger, sadness, guilt; scale 0-1). Emotional profiles were clustered (k-means) and thematic patterns extracted via latent Dirichlet allocation. Associations were examined using non-parametric tests.
RESULTS: Negative sentiment dominated (95.6%, 95% CI 94.9% to 96.2%). Fear (mean=0.63, SD=0.21) and frustration (mean=0.59, SD=0.19) were most prevalent. Emergency care settings showed higher fear (p<0.05) and guilt (p<0.001). Reports with strong emotional expression, especially fear, guilt and sadness, were less likely to receive formal feedback (43.1% (95% CI 41.7% to 44.5%) vs 48.1% (95% CI 46.5% to 49.7%); absolute difference=5.0 percentage points (95% CI 2.7 to 7.3); p=0.001).
DISCUSSION: Emotion intensity did not consistently correlate with harm severity but was linked to care context and systemic complexity. Emotion clusters reflected distinct clinical and organisational patterns, from acute emergencies to procedural failures.
CONCLUSION: Emotion-based analysis of incident reports provides insight into perceived burden and care context. Sentiment profiling may improve system interpretability and support emotion-sensitive safety culture and feedback. Leveraging large language models can reduce reviewer workload and enable more targeted triage of emotionally complex reports.
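The feedback comparison reported in RESULTS is an absolute difference between two proportions with a confidence interval. As a minimal sketch of that kind of calculation, the Python snippet below computes a difference in proportions with a simple Wald (normal-approximation) interval. The abstract states neither the group sizes nor the interval method the authors used, so the function name `diff_in_proportions`, the Wald method and all counts here are illustrative assumptions, and the output is not expected to reproduce the published interval.

```python
import math


def diff_in_proportions(x1, n1, x2, n2, z=1.959964):
    """Difference p2 - p1 between two independent proportions.

    Returns (p1, p2, diff, lower, upper), where lower/upper bound a Wald
    (normal-approximation) interval; z=1.959964 gives roughly 95% coverage.
    The Wald method is an assumption made for this sketch only.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("group sizes must be positive")
    p1 = x1 / n1
    p2 = x2 / n2
    diff = p2 - p1
    # Standard error of the difference of two independent proportions.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1, p2, diff, diff - z * se, diff + z * se


# Hypothetical counts, NOT taken from the study: reports that received
# formal feedback out of all reports in each emotion-intensity group.
high_feedback, high_total = 431, 1000  # strong emotional expression
low_feedback, low_total = 481, 1000    # weaker emotional expression

p_high, p_low, diff, lower, upper = diff_in_proportions(
    high_feedback, high_total, low_feedback, low_total
)

print(f"feedback rate, strong emotion: {100 * p_high:.1f}%")
print(f"feedback rate, weaker emotion: {100 * p_low:.1f}%")
print(
    f"absolute difference: {100 * diff:.1f} percentage points "
    f"(95% CI {100 * lower:.1f} to {100 * upper:.1f})"
)
```

With real data, the counts would come from cross-tabulating the feedback flag against the emotion-intensity grouping. An interval method with better small-sample behaviour (for example Newcombe's score-based interval) may be preferable to the Wald approximation used here.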