Language Models Enable Data-Augmented Synthesis Planning for Inorganic Materials


Abstract

Inorganic synthesis planning currently relies primarily on heuristic approaches or machine learning models trained on limited datasets, which constrains their generality. We demonstrate that language models (LMs), without task-specific fine-tuning, can recall synthesis conditions reported in the scientific literature. Off-the-shelf models such as GPT-4.1, Gemini 2.0 Flash, and Llama 4 Maverick achieve a Top-1 precursor-prediction accuracy of up to 53.8% and a Top-5 accuracy of 66.8% on a held-out set of 1,000 reactions. They also predict calcination and sintering temperatures with mean absolute errors below 126 °C, matching or surpassing specialized regression models. Ensembling these LMs further enhances predictive accuracy and reduces inference cost per prediction by up to 70%. Given the broad, cross-domain knowledge of LMs, we evaluate whether they enable knowledge transfer by training a transformer, SyntMTE, on 28,548 LM-generated reaction recipes. A model trained solely on LM-generated data performs competitively, only 6% worse than a model trained on literature-reported data, while a model trained on both LM-generated and literature-reported data improves performance by up to 4%. In a case study on Li7La3Zr2O12 solid-state electrolytes, we demonstrate that SyntMTE reproduces the experimentally observed dopant-dependent sintering trends. Our hybrid workflow enables scalable and data-efficient inorganic synthesis planning.
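The ensembling and Top-k evaluation described above can be sketched as follows. This is a minimal illustration, assuming a simple Borda-style rank aggregation over each model's ranked precursor candidates; the paper's actual ensembling scheme, data format, and function names are not specified here, and all identifiers below are hypothetical.

```python
from collections import Counter

def ensemble_precursors(model_rankings, top_k=5):
    """Aggregate ranked precursor-set candidates from several LMs.

    Uses a Borda-style count (an illustrative assumption, not the
    paper's method): a candidate ranked higher by a model earns more
    points, and candidates are re-ranked by total points.
    """
    scores = Counter()
    for ranking in model_rankings:
        for rank, candidate in enumerate(ranking):
            scores[candidate] += len(ranking) - rank  # higher rank -> more points
    return [candidate for candidate, _ in scores.most_common(top_k)]

def top_k_accuracy(ranked_predictions, targets, k):
    """Fraction of reactions whose true precursor set appears in the top-k list."""
    hits = sum(target in ranked[:k]
               for ranked, target in zip(ranked_predictions, targets))
    return hits / len(targets)
```

A usage example: three hypothetical model rankings for a BaTiO3 target, where the carbonate route is favored by two of the three models and therefore wins the aggregated vote.

```python
rankings = [
    ["BaCO3 + TiO2", "BaO + TiO2", "BaCl2 + TiO2"],
    ["BaO + TiO2", "BaCO3 + TiO2", "Ba(NO3)2 + TiO2"],
    ["BaCO3 + TiO2", "Ba(NO3)2 + TiO2", "BaO + TiO2"],
]
print(ensemble_precursors(rankings, top_k=2))
# -> ['BaCO3 + TiO2', 'BaO + TiO2']
```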
