Abstract
INTRODUCTION AND AIMS: Clinical fractures remain the primary cause of failure in dental all-ceramic restorations, highlighting the need to improve the mechanical performance and durability of ceramic materials. This study aimed to develop a large language model (LLM)-based framework to automatically construct a structured database of dental ceramics and to integrate it with machine learning (ML) to predict material properties and accelerate material design. METHODS: LLMs (Llama, Qwen, and DeepSeek) were employed to perform literature mining tasks, including text classification, information extraction from abstracts, and tabular data extraction. These tasks were integrated into an automated pipeline to systematically extract and structure compositional and performance data from dental research articles. Ten ML algorithms were then trained on the curated database to establish predictive models of ceramic performance. RESULTS: In the classification task, a few-shot learning model with simple label prompts achieved an F1 score of 0.89. Fine-tuned LLMs achieved F1 scores exceeding 0.89 across various entity categories. ML models were developed to classify flexural strength; the Extra Trees model performed best (F1 = 0.928), and external validation yielded F1 = 0.88. SHAP analysis identified ZrO₂ and SiO₂ as key contributors, and an exhaustive search identified optimal compositional ranges. CONCLUSIONS: This study demonstrates an AI-based pipeline combining LLM-driven data extraction and ML modelling, offering a scalable and accurate approach for accelerating the discovery and optimization of dental ceramics and other dental materials. CLINICAL RELEVANCE: The findings underscore the potential of advanced LLMs and ML models in restorative dentistry and materials research.
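Because every result above is reported as an F1 score, a minimal sketch of how that metric is computed may help readers interpret the numbers. It assumes a binary labelling with one positive class; the abstract does not state which averaging scheme (binary, macro, or weighted) the authors used, and the function name `f1_score` and the toy labels are purely illustrative, not taken from the study.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the `positive` label.

    Assumption: a single positive class. The study's averaging scheme is not
    given in the abstract, so this is an illustration, not the authors' code.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # With no true positives, precision and recall are zero or undefined;
    # return 0.0 by convention.
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical labels: 1 = "high flexural strength", 0 = "low".
    y_true = [1, 1, 1, 0, 0]
    y_pred = [1, 1, 0, 1, 0]
    print(f"F1 = {f1_score(y_true, y_pred):.3f}")
```

Since F1 balances precision against recall, a score such as 0.928 cannot be reached by a model that simply favours the majority class, which is why it is a common choice for imbalanced classification tasks.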