Abstract
INTRODUCTION: This study evaluates the diagnostic performance of a large language model (LLM) in determining causes of death when given each of three information sources.

METHODS: A total of 150 consecutive adult in-hospital cadavers underwent postmortem CT and pathological autopsy (2009-2013). The accuracy of Claude 3.5 Sonnet (Anthropic, San Francisco, California) in determining both the underlying and the immediate cause of death was evaluated using three information sources: clinical history alone, postmortem CT findings alone (as documented by radiologists in their reports), and the two combined. For each case, the LLM provided a primary diagnosis and two differential diagnoses. The autopsy result served as the reference standard.

RESULTS: For underlying causes, the primary diagnosis was significantly more accurate with the integrated sources (78.0%) than with clinical history alone (69.3%) or CT findings alone (42.0%) (p<0.001). When a correct answer among either the primary or the differential diagnoses was counted, accuracy reached 84.7% with integrated sources, 78.0% with clinical history alone, and 58.7% with CT findings alone. For immediate causes, the primary diagnosis was also more accurate with the integrated sources (61.3%) than with clinical history alone (52.0%) or CT findings alone (46.7%) (p<0.001). Disease-specific analyses revealed marked variation in diagnostic accuracy, with hematologic malignancies showing the largest differences among information sources (clinical history alone: 78.9%, CT findings alone: 36.8%, integrated analysis: 85.7%; p=0.003).

CONCLUSION: Integrating postmortem CT findings with clinical history improves the accuracy of LLM-based cause-of-death determination, demonstrating the value of multiple information sources while highlighting opportunities for disease-specific diagnostic optimization.
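
As a reading aid, the overall percentages reported above can be converted back into case counts. The sketch below is not part of the study's methods. It assumes that every overall accuracy figure uses all 150 cases as its denominator, which the abstract does not state explicitly, and the names `N_CASES`, `reported`, and `implied_count` are illustrative only. The disease-specific figures are left out because their subgroup sizes are not given.

```python
# Assumption: each overall accuracy figure is computed over all 150 cases.
N_CASES = 150

# Percentages as reported in the abstract, grouped by analysis and source.
reported = {
    "underlying, primary": {
        "integrated": 78.0, "clinical history": 69.3, "CT findings": 42.0},
    "underlying, primary or differential": {
        "integrated": 84.7, "clinical history": 78.0, "CT findings": 58.7},
    "immediate, primary": {
        "integrated": 61.3, "clinical history": 52.0, "CT findings": 46.7},
}


def implied_count(percent, n=N_CASES):
    """Return the whole number of correct cases whose rate rounds to `percent`."""
    count = round(percent * n / 100)
    # Guard: the recovered count must reproduce the reported one-decimal figure.
    if round(100 * count / n, 1) != percent:
        raise ValueError(f"{percent}% does not match a whole count out of {n}")
    return count


implied = {
    analysis: {source: implied_count(p) for source, p in by_source.items()}
    for analysis, by_source in reported.items()
}

for analysis, by_source in implied.items():
    for source, count in by_source.items():
        print(f"{analysis:36s} {source:17s} {count:3d}/{N_CASES}")
```

Under that assumption, the primary-diagnosis accuracies for the underlying cause correspond to 117, 104, and 63 correct cases out of 150 for the integrated, clinical-history, and CT-only inputs respectively.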