Extracting Knowledge from Scientific Texts on Patient-Derived Cancer Models Using Large Language Models: Algorithm Development and Validation

Abstract

Patient-derived cancer models (PDCMs) have emerged as indispensable tools in both cancer research and preclinical studies, and the number of publications on PDCMs has increased significantly over the last decade. Advances in artificial intelligence (AI), particularly large language models (LLMs), hold promise for extracting knowledge from scientific texts at scale. This study investigates the use of LLM-based systems to automatically extract PDCM-related entities from scientific texts. We evaluated two approaches: direct prompting and soft prompting. For direct prompting, we manually created prompts that guide the LLMs to output PDCM-related entities from texts; each prompt consists of an instruction, definitions of the entity types, gold examples, and a query. For soft prompting, a novel line of research in this domain, we automatically trained soft prompts as continuous vectors using machine learning approaches. Our experiments used state-of-the-art LLMs: the proprietary GPT-4o and a series of open models from the LLaMA3 family. GPT-4o with direct prompts achieved competitive results. Our results also demonstrate that soft prompting can effectively enhance the capabilities of smaller open LLMs, achieving results comparable to those of proprietary models. These findings highlight the potential of LLMs for domain-specific text extraction tasks and underline the importance of tailoring the approach to the task and to the model's characteristics.
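The two approaches described above differ in where the guidance comes from: direct prompting assembles four hand-written parts (instruction, entity-type definitions, gold examples, query) into one text input, while soft prompting places trainable vectors in front of the model's input embeddings. The following Python sketch illustrates both under stated assumptions. The template wording, the `model_type` entity type, the example sentences and the function names `build_direct_prompt` and `prepend_soft_prompt` are hypothetical and not taken from the study; the soft-prompt function shows only the concatenation step of standard prompt tuning, not the training loop or the study's actual configuration.

```python
from dataclasses import dataclass


@dataclass
class EntityType:
    """An entity type and its natural-language definition (hypothetical schema)."""
    name: str
    definition: str


@dataclass
class GoldExample:
    """An annotated demonstration: a text and the (type, span) entities it contains."""
    text: str
    entities: list[tuple[str, str]]


def build_direct_prompt(instruction: str, entity_types: list[EntityType],
                        examples: list[GoldExample], query: str) -> str:
    """Concatenate instruction, entity definitions, gold examples and query into one prompt."""
    lines = [instruction, "", "Entity types:"]
    lines += [f"- {et.name}: {et.definition}" for et in entity_types]
    lines.append("")
    for i, ex in enumerate(examples, start=1):
        rendered = "; ".join(f"{etype}: {span}" for etype, span in ex.entities)
        lines += [f"Example {i}:", f"Text: {ex.text}", f"Entities: {rendered}", ""]
    # The query comes last, with an open "Entities:" slot for the LLM to complete.
    lines += [f"Text: {query}", "Entities:"]
    return "\n".join(lines)


def prepend_soft_prompt(soft_prompt: list[list[float]],
                        token_embeddings: list[list[float]]) -> list[list[float]]:
    """Prepend k trainable soft-prompt vectors to the frozen input token embeddings.

    In prompt tuning only the soft-prompt vectors are updated by gradient descent;
    the LLM's weights stay fixed. This function shows only the forward-pass step.
    """
    if soft_prompt and token_embeddings:
        dim = len(token_embeddings[0])
        if any(len(v) != dim for v in soft_prompt):
            raise ValueError("soft-prompt vectors must match the embedding dimension")
    return [list(v) for v in soft_prompt] + [list(v) for v in token_embeddings]


# Hypothetical usage: the entity type, definition and texts are illustrative only.
example_prompt = build_direct_prompt(
    instruction="Extract all PDCM-related entities from the text.",
    entity_types=[EntityType("model_type",
                             "The kind of patient-derived model, e.g. PDX or organoid.")],
    examples=[GoldExample("We established PDX models from colorectal tumours.",
                          [("model_type", "PDX")])],
    query="Organoids were derived from pancreatic cancer biopsies.",
)
print(example_prompt)
```

Keeping the prompt parts as structured data rather than one fixed string makes it easy to vary the number of gold examples or swap entity definitions between runs. In a real soft-prompting setup the vectors would be framework tensors (for example a trainable `torch.nn.Parameter`) rather than Python lists.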
