Abstract
Inorganic synthesis planning currently relies primarily on heuristic approaches or machine learning models trained on limited datasets, which constrains their generality. We demonstrate that language models (LMs), without task-specific fine-tuning, can recall synthesis conditions reported in the scientific literature. Off-the-shelf models such as GPT-4.1, Gemini 2.0 Flash, and Llama 4 Maverick achieve a Top-1 precursor prediction accuracy of up to 53.8% and a Top-5 accuracy of 66.8% on a held-out set of 1,000 reactions. They also predict calcination and sintering temperatures with mean absolute errors below 126 °C, matching or surpassing specialized regression models. Ensembling these LMs further enhances predictive accuracy and reduces inference cost per prediction by up to 70%. Given the broad, cross-domain knowledge of LMs, we evaluate whether they enable knowledge transfer by training a transformer, SyntMTE, on 28,548 LM-generated reaction recipes. A model trained solely on LM-generated data performs within 6% of a model trained on literature-reported data, and a model trained on both data sources improves performance by up to 4%. In a case study on Li₇La₃Zr₂O₁₂ solid-state electrolytes, we demonstrate that SyntMTE reproduces the experimentally observed dopant-dependent sintering trends. Our hybrid workflow enables scalable and data-efficient inorganic synthesis planning.