A dual-mode large language model assistant for on-surface reactions via fine-tuning and retrieval-augmented generation



Abstract

Surface reactions underpin catalysis, nanomaterials, energy conversion, and molecular-scale fabrication, yet knowledge in the field is fragmented and dispersed across unstructured literature, which hinders systematic analysis and data-driven discovery. Existing chemical databases and language models inadequately capture the domain-specific semantics and experimental parameters unique to on-surface reactions. Here, we present an integrated framework that transforms the dispersed surface-chemistry literature into structured, machine-readable knowledge and leverages it to develop a domain-specialized large language model (LLM) assistant for on-surface reactions. We curated and semantically screened hundreds of thousands of publications to construct a surface-chemistry corpus, from which we extracted 44 predefined reaction attributes from more than 44 000 studies of surface reactions. These structured records were used to build both a high-quality reaction database and a domain-specific question-answering dataset. On this basis, we developed a dual-mode LLM system that combines a parameter-efficient fine-tuned reasoning model with a dual-source retrieval-augmented generation (RAG) framework, enabling both deep inference and verifiable retrieval of experimental parameters. Evaluations demonstrate that the fine-tuned LLM outperforms existing chemistry-oriented language models on surface-chemistry question answering, achieving a BERT-F1 score exceeding 0.8. Incorporating the RAG framework further improves factual accuracy, completeness, and reasoning consistency by grounding responses in the retrieved literature and structured reaction data. Latent-space analyses reveal that domain-specific fine-tuning reorganizes internal representations toward task-oriented coherence. This work establishes a scalable pathway for converting fragmented surface-chemistry knowledge into an intelligent platform, paving the way toward data-driven prediction, experimental planning, and automated reasoning in on-surface reactions.
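
The dual-source retrieval idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the token-overlap scorer stands in for whatever retriever the paper actually uses, and every name here (`Passage`, `retrieve`, `build_prompt`, `LITERATURE`, `RECORDS`) as well as the toy passages and records are hypothetical placeholders chosen for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # "literature" or "database" (hypothetical labels)
    text: str


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query, text):
    """Fraction of query tokens that also occur in the text (toy relevance score)."""
    q = tokenize(query)
    return len(q & tokenize(text)) / len(q) if q else 0.0


def retrieve(query, literature, records, k=1):
    """Return the top-k matching items from each of the two sources."""
    def top(source, texts):
        ranked = sorted(texts, key=lambda t: score(query, t), reverse=True)
        return [Passage(source, t) for t in ranked[:k] if score(query, t) > 0]

    return top("literature", literature) + top("database", records)


def build_prompt(query, passages):
    """Assemble a grounded prompt that tags each context line with its source."""
    context = "\n".join(f"[{p.source}] {p.text}" for p in passages)
    return (
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer using only the context above."
    )


# Toy stand-ins for the two sources (illustrative only, not data from the paper).
LITERATURE = [
    "Glaser coupling of terminal alkynes has been studied on Ag(111) surfaces.",
    "Ullmann coupling of aryl halides has been studied on Au(111) surfaces.",
]
RECORDS = [
    "reaction=Glaser coupling; substrate=Ag(111); precursor=terminal alkyne",
    "reaction=Ullmann coupling; substrate=Au(111); precursor=aryl halide",
]

if __name__ == "__main__":
    question = "Which substrate is used for Ullmann coupling?"
    print(build_prompt(question, retrieve(question, LITERATURE, RECORDS)))
```

Keeping the source tag on every context line is one simple way to make a generated answer traceable to either the literature or the structured record, which is the kind of verifiability the abstract attributes to the RAG mode.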
