Abstract
BACKGROUND: Computed tomography (CT) reports for patients with bone metastases contain key clinical details, but they are usually written as free text. Because reporting styles vary among radiologists, extracting this information consistently is difficult. Large language models (LLMs) may automate the extraction of key details, but sending reports to externally hosted models raises data privacy and data management concerns for hospitals. This study evaluated whether proprietary LLMs and locally deployed open-source LLMs can extract key diagnostic information and generate structured reports in privacy-sensitive settings. METHODS: We analyzed 300 CT reports of metastatic bone tumors and evaluated seven LLMs, including both proprietary and open-source systems. The three tasks were identification of pathological fractures, extraction of fracture sites, and extraction of metastatic sites. Expert annotations served as the reference standard. Model performance was assessed using accuracy, recall, precision, F1 score, area under the curve (AUC), and Cohen’s kappa. RESULTS: Across all three extraction tasks, accuracies ranged from 82% to 98%. Both proprietary and open-source models achieved high performance. Locally deployable models, including Qwen2-72B and WiNGPT2-9B, performed in a range similar to that of GPT-4o. CONCLUSION: These results indicate that open-source models can perform structured information extraction from bone metastasis CT reports. Small, medically fine-tuned open-source models maintained stable performance and are better suited to on-site deployment within healthcare institutions, which can help reduce data privacy risks. In the future, such models may support clinical decision-making within health systems. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12913-026-14350-3.
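The evaluation metrics named in METHODS follow standard definitions. The sketch below shows how they could be computed for a binary task such as pathological fracture identification. It assumes per-report labels (1 = fracture present, 0 = absent) compared against the expert reference. The function name `binary_metrics` and the toy labels are hypothetical illustrations, not the study's code or data. The AUC line also assumes hard 0/1 model outputs rather than probability scores, so the study may have computed AUC differently.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float
    auc: float
    kappa: float


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero
    # (e.g. the model never predicts a fracture).
    return num / den if den else 0.0


def binary_metrics(reference: list[int], predicted: list[int]) -> BinaryMetrics:
    """Hypothetical helper: compare model labels with expert labels (1 = fracture present)."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("reference and predicted must be non-empty and equal in length")

    # Confusion-matrix counts against the expert reference standard.
    tp = sum(1 for r, p in zip(reference, predicted) if r == 1 and p == 1)
    tn = sum(1 for r, p in zip(reference, predicted) if r == 0 and p == 0)
    fp = sum(1 for r, p in zip(reference, predicted) if r == 0 and p == 1)
    fn = sum(1 for r, p in zip(reference, predicted) if r == 1 and p == 0)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)  # sensitivity
    specificity = _safe_div(tn, tn + fp)
    f1 = _safe_div(2 * precision * recall, precision + recall)

    # Assumption: with hard 0/1 predictions the ROC curve has a single operating
    # point, so its trapezoidal area reduces to (sensitivity + specificity) / 2.
    auc = (recall + specificity) / 2

    # Cohen's kappa: observed agreement corrected for chance agreement,
    # where chance agreement comes from the marginal label frequencies.
    p_o = accuracy
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = _safe_div(p_o - p_e, 1 - p_e)  # 0.0 when kappa is undefined (p_e == 1)

    return BinaryMetrics(accuracy, precision, recall, f1, auc, kappa)


if __name__ == "__main__":
    # Toy labels for ten reports (not study data).
    expert = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
    model = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]
    print(binary_metrics(expert, model))
```

On the toy labels, the model misses one fracture and flags one false fracture. This gives accuracy 0.8, precision and recall of 0.75, and a kappa of about 0.583. Kappa falls below accuracy because roughly half of the agreement (expected agreement 0.52) would occur by chance given the label frequencies.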