Enhancing LLM-based medical decision-making by test-time knowledge acquisition

Abstract

PURPOSE: Medical decision-making (MDM) is a complex clinical reasoning process that requires systematic integration of multidisciplinary knowledge and evidence. Current approaches based on large language models (LLMs) rely on static training corpora and often adapt poorly to specific domains, which can compromise diagnostic accuracy and reliability. This study aims to overcome these limitations with a framework that lets LLMs dynamically acquire and refine knowledge at test time, thereby improving the robustness and precision of MDM systems.
METHODS: We propose a test-time optimization framework that refines a frozen LLM's diagnostic reasoning through test-time knowledge acquisition and integration. For each medical query, the model generates multiple reasoning trajectories, which are synthesized into a pseudo reference answer. The self-consistency score of that answer separates confident cases from unconfident ones. Confident cases undergo reward-guided reflection to extract reliable diagnostic heuristics, while unconfident cases undergo unsupervised reflection to reveal reasoning gaps and uncertainty patterns. The extracted knowledge is continually incorporated into an evolving, capacity-controlled knowledge base through operations that add, modify, or merge knowledge entries. The updated knowledge base then guides subsequent inference, allowing the model to adapt its reasoning strategy at test time without updating any parameters.
RESULTS: Experimental evaluations on three public medical decision-making benchmarks (MedQA, NEJMQA, and MMLU-Pro-Health) show that the proposed framework consistently improves the performance of the state-of-the-art LLM, DeepSeekv3.2 Exp 671B. For example, on the MMLU-Pro-Health dataset, our method achieved an average accuracy of 79.22%, surpassing DeepSeekv3.2 Exp 671B by 1.84 percentage points, which demonstrates the effectiveness of the framework in enhancing diagnostic decision-making.
CONCLUSION: By leveraging inference-time self-evaluation and experience accumulation, this work introduces a new paradigm for building reliable, adaptive, and context-aware medical AI systems. It underscores the critical role of continual knowledge evolution in advancing trustworthy artificial intelligence for clinical decision support and lays the foundation for future developments in dynamic and responsive medical reasoning tools.
